Real Analysis. Armin Rainer


Fakultät für Mathematik, Universität Wien, Oskar-Morgenstern-Platz 1, A-1090 Wien, Austria

Preface

These are lecture notes for the course Reelle Analysis held in Vienna in Spring 2014 and 2016 (two semester hours). The main sources are [1], [3], [5], [6], [8], [10], [11], [12], [13], and [14].

Contents

Preface

Chapter 1. Basic measure theory
1.1. σ-algebras and measures
1.2. Monotone class theorem and uniqueness of measures
1.3. Outer measures and Carathéodory's construction
1.4. Complete measures

Chapter 2. Lebesgue measure on $\mathbb R^n$
2.1. Construction of the Lebesgue measure
2.2. Radon measures on $\mathbb R^n$
2.3. Properties of the Lebesgue measure
2.4. Non-measurable sets

Chapter 3. Integration
3.1. Measurable functions
3.2. Approximation by simple functions
3.3. Integration on a measure space
3.4. Fubini's theorem
3.5. Transformation of measures and integrals
3.6. Integrals depending on parameters
3.7. Relation to the Riemann integral
3.8. Hausdorff measure

Chapter 4. $L^p$-spaces
4.1. Definition of $L^p$-spaces
4.2. Inequalities
4.3. Completeness
4.4. Convolution and approximation by smooth functions
4.5. Modes of convergence
4.6. The distribution function

Chapter 5. Absolute continuity of measures
5.1. Complex measures
5.2. Absolute continuity and decomposition of measures

Chapter 6. Differentiation and integration
6.1. The Lebesgue differentiation theorem
6.2. Derivatives of measures
6.3. The fundamental theorem of calculus
6.4. Rademacher's theorem

Chapter 7. The dual of $L^p$
7.1. The dual of $L^p$
7.2. Weak convergence
7.3. Interpolation theorems

Chapter 8. The Fourier transform
8.1. The Fourier transform on $L^1$
8.2. The Fourier transform on $L^2$
8.3. Paley–Wiener theorems

Appendix A. Appendix
A.1. Basic set-theoretic operations
A.2. Banach spaces
A.3. Hilbert spaces
A.4. Fréchet spaces

Bibliography
Index

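Chapter 1. Basic measure theory

1.1. σ-algebras and measures

Let $X$ be a set. A collection $\mathcal S \subseteq \mathcal P(X)$ of subsets of $X$ is called a σ-algebra if the following are satisfied:
- If $A \in \mathcal S$, then $A^c = X \setminus A \in \mathcal S$.
- If $\{A_i\}_{i=1}^\infty$ is a countable family of sets in $\mathcal S$, then $\bigcup_{i=1}^\infty A_i \in \mathcal S$.
- $X \in \mathcal S$.

It is immediate from this definition that:
- $\emptyset \in \mathcal S$.
- If $\{A_i\}_{i=1}^\infty$ is a countable family of sets in $\mathcal S$, then $\bigcap_{i=1}^\infty A_i \in \mathcal S$.
- If $A_1, A_2 \in \mathcal S$, then $A_1 \setminus A_2 \in \mathcal S$.

Evidently, for any set $X$, the collections $\{\emptyset, X\}$ and $\mathcal P(X)$ are σ-algebras. Given any family of subsets $\mathcal A \subseteq \mathcal P(X)$, the intersection of all σ-algebras containing $\mathcal A$ is a σ-algebra. It is the smallest σ-algebra containing $\mathcal A$ and is called the σ-algebra generated by $\mathcal A$.

Let $X$ be a topological space. The σ-algebra $\mathcal B(X)$ generated by all open subsets of $X$ is called the σ-algebra of Borel sets in $X$, or Borel σ-algebra. The Borel σ-algebra $\mathcal B(\mathbb R^n)$ is generated by the open balls in $\mathbb R^n$. It contains all closed sets, but not all subsets of $\mathbb R^n$.

A (positive) measure $\mu$ on a σ-algebra $\mathcal S$ is a mapping $\mu : \mathcal S \to [0, \infty]$ with the following properties:
- $\mu(\emptyset) = 0$.
- $\mu$ is σ-additive, i.e., if $\{A_i\}_{i=1}^\infty$ is a countable family of disjoint sets in $\mathcal S$, then
\[ \mu\Big(\bigcup_{i=1}^\infty A_i\Big) = \sum_{i=1}^\infty \mu(A_i). \]

Lemma 1.1. Let $\mu$ be a measure on a σ-algebra $\mathcal S$, and let $A_i \in \mathcal S$. Then:
(1) $\mu$ is finitely additive, i.e., for finite families of disjoint sets $\{A_i\}_{i=1}^m$,
\[ \mu\Big(\bigcup_{i=1}^m A_i\Big) = \sum_{i=1}^m \mu(A_i). \]
(2) $\mu$ is monotone, i.e., $\mu(A_1) \le \mu(A_2)$ if $A_1 \subseteq A_2$.
(3) If $A_1 \subseteq A_2 \subseteq \cdots$, then
\[ \lim_{j \to \infty} \mu(A_j) = \mu\Big(\bigcup_{i=1}^\infty A_i\Big). \]
(4) If $A_1 \supseteq A_2 \supseteq \cdots$ and $\mu(A_1) < \infty$, then
\[ \lim_{j \to \infty} \mu(A_j) = \mu\Big(\bigcap_{i=1}^\infty A_i\Big). \]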

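Proof. (1) follows immediately from the definition of measure (take $A_i = \emptyset$ for $i > m$).

(2) We have $A_2 = A_1 \cup (A_2 \setminus A_1)$ and so $\mu(A_2) = \mu(A_1) + \mu(A_2 \setminus A_1) \ge \mu(A_1)$.

(3) Setting $B_i := A_i \setminus A_{i-1}$, $i \ge 2$, and $B_1 := A_1$, we obtain a sequence of disjoint sets $B_i \in \mathcal S$ so that $\bigcup_{i=1}^m A_i = \bigcup_{j=1}^m B_j$ for all $m \in \mathbb N \cup \{\infty\}$. Thus
\[ \mu\Big(\bigcup_{i=1}^\infty A_i\Big) = \mu\Big(\bigcup_{j=1}^\infty B_j\Big) = \sum_{j=1}^\infty \mu(B_j) = \lim_{m \to \infty} \sum_{j=1}^m \mu(B_j) = \lim_{m \to \infty} \mu\Big(\bigcup_{j=1}^m B_j\Big) = \lim_{m \to \infty} \mu(A_m). \]

(4) We have $\bigcap_{i=1}^\infty A_i = A_1 \setminus \bigcup_{j=1}^\infty (A_1 \setminus A_j)$, and thus, by (3),
\[ \mu\Big(\bigcap_{i=1}^\infty A_i\Big) = \mu(A_1) - \mu\Big(\bigcup_{j=1}^\infty (A_1 \setminus A_j)\Big) = \mu(A_1) - \lim_{i \to \infty} \mu(A_1 \setminus A_i) = \lim_{i \to \infty} \mu(A_i). \]

A measure space is a triple $(X, \mathcal S, \mu)$ consisting of a set $X$, a σ-algebra $\mathcal S$ on $X$, and a measure $\mu$ on $\mathcal S$. The elements of $\mathcal S$ are called ($\mu$-)measurable sets. If $Y \in \mathcal S$, then we may define the measure subspace $(Y, \mathcal S_Y, \mu_Y)$, where $\mathcal S_Y := \{A : A \in \mathcal S \text{ and } A \subseteq Y\} = \{A \cap Y : A \in \mathcal S\}$ and $\mu_Y := \mu|_{\mathcal S_Y}$.

A measure $\mu$ is called finite if $\mu(X) < \infty$, and a probability measure if $\mu(X) = 1$. It is called σ-finite if there exists a sequence $X_i \in \mathcal S$ such that $\mu(X_i) < \infty$ for all $i$ and $X = \bigcup_{i=1}^\infty X_i$; note that the $X_i$ can be chosen disjoint by replacing $X_i$ with $X_i' := X_i \setminus \bigcup_{k=1}^{i-1} X_k$. We say that $\mu$ has the finite subset property if for each $A \in \mathcal S$ with $\mu(A) > 0$ there is $B \in \mathcal S$ with $B \subseteq A$ and $0 < \mu(B) < \infty$. A σ-finite measure has the finite subset property; if $A \in \mathcal S$ with $\mu(A) > 0$, then for some $i$ we have $0 < \mu(A \cap X_i) < \infty$.

Example 1.2. (1) For any set $X$ we may take the σ-algebra $\mathcal P(X)$ of all subsets and consider the counting measure
\[ \mu(A) = \begin{cases} |A| & \text{if } A \text{ is finite}, \\ \infty & \text{if } A \text{ is infinite}. \end{cases} \]
(2) If $X$ is a topological space and $\mu$ is a measure on the Borel σ-algebra, then $\mu$ is called a Borel measure.
(3) Fix a point $x \in \mathbb R^n$. Then the Dirac δ-measure $\delta_x$ defined by
\[ \delta_x(A) = \chi_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A \end{cases} \]
is a measure defined on the Borel σ-algebra or even on $\mathcal P(\mathbb R^n)$.

1.2. Monotone class theorem and uniqueness of measures

Let $X$ be a set. A collection $\mathcal A \subseteq \mathcal P(X)$ of subsets of $X$ is called an algebra if $X \in \mathcal A$ and, for every $A, B \in \mathcal A$, also $A^c \in \mathcal A$ and $A \cup B \in \mathcal A$. A collection $\mathcal M \subseteq \mathcal P(X)$ of subsets of $X$ is called a monotone class if, for $A_i \in \mathcal M$, we have:
- If $A_1 \subseteq A_2 \subseteq \cdots$, then $\bigcup_{i=1}^\infty A_i \in \mathcal M$.
- If $A_1 \supseteq A_2 \supseteq \cdots$, then $\bigcap_{i=1}^\infty A_i \in \mathcal M$.

Clearly, $\mathcal P(X)$ is a monotone class.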
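On a finite set the definitions of Section 1.1 can be checked mechanically. The following is a minimal sketch, not part of the original notes: the helper names `generated_sigma_algebra`, `counting_measure` and `dirac` are hypothetical, and the closure loop relies on the assumption that the underlying set is finite, so that closure under complements and pairwise unions already gives a σ-algebra.

```python
from itertools import combinations


def generated_sigma_algebra(universe, generators):
    """Close the generators under complement and pairwise union.

    Assumption: `universe` is finite, so this closure is the
    sigma-algebra generated by `generators`.
    """
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            complement = universe - a
            if complement not in family:
                family.add(complement)
                changed = True
        for a, b in combinations(current, 2):
            union = a | b
            if union not in family:
                family.add(union)
                changed = True
    return family


def counting_measure(a):
    """Counting measure of a finite set (Example 1.2 (1))."""
    return len(a)


def dirac(x):
    """Dirac measure at x (Example 1.2 (3)), returned as a set function."""
    return lambda a: 1 if x in a else 0


X = {1, 2, 3, 4}
S = generated_sigma_algebra(X, [{1}, {1, 2}])
delta_1 = dirac(1)
```

Here the generated σ-algebra has the atoms {1}, {2}, {3, 4}, so it contains {3, 4} but not {3}; both set functions are additive on disjoint members of `S`.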

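Theorem 1.3 (Monotone class theorem). Let $\mathcal A$ be an algebra of subsets of $X$. Then there exists a smallest monotone class $\mathcal M$ that contains $\mathcal A$, and $\mathcal M$ is the σ-algebra generated by $\mathcal A$.

Proof. Let $\mathcal M$ be the intersection of all monotone classes that contain $\mathcal A$. Then $\mathcal M$ is a monotone class that contains $\mathcal A$, and by definition it is the smallest. In order to show that $\mathcal M$ is the σ-algebra generated by $\mathcal A$, it suffices to prove that $\mathcal M$ is closed under complements and finite unions. Indeed, assuming this, we may conclude that, if $A_i \in \mathcal M$, then $B_n := \bigcup_{i=1}^n A_i \in \mathcal M$ and $B_1 \subseteq B_2 \subseteq \cdots$, and hence $\bigcup_{i=1}^\infty A_i = \bigcup_{n=1}^\infty B_n \in \mathcal M$. Thus $\mathcal M$ is a σ-algebra. Since any σ-algebra is a monotone class, $\mathcal M$ is the smallest σ-algebra that contains $\mathcal A$, i.e., the σ-algebra generated by $\mathcal A$.

Let us show that $\mathcal M$ is closed under finite unions. Fix $A \in \mathcal M$ and consider
\[ \mathcal C(A) := \{B \in \mathcal M : A \cup B \in \mathcal M\}. \]
Let $B_i \in \mathcal C(A)$ so that $B_1 \subseteq B_2 \subseteq \cdots$. Then $(A \cup B_1) \subseteq (A \cup B_2) \subseteq \cdots$ is a sequence in $\mathcal M$, hence $A \cup \bigcup_i B_i = \bigcup_i (A \cup B_i) \in \mathcal M$, and so $\bigcup_i B_i \in \mathcal C(A)$. Similarly, the intersection of a decreasing sequence of sets in $\mathcal C(A)$ belongs to $\mathcal C(A)$. Thus $\mathcal C(A)$ is a monotone class. If $A \in \mathcal A$, then $\mathcal A \subseteq \mathcal C(A) \subseteq \mathcal M$, since $\mathcal A$ is an algebra, and thus $\mathcal C(A) = \mathcal M$. If $A \in \mathcal M$ is arbitrary, then $\mathcal A \subseteq \mathcal C(A)$, for if $B \in \mathcal A$ then $\mathcal C(B) = \mathcal M$, by the previous sentence, and hence $A \cup B \in \mathcal M$. Thus $\mathcal C(A) = \mathcal M$ for each $A \in \mathcal M$, which means that $\mathcal M$ is closed under finite unions.

In order to prove that $\mathcal M$ is closed under complements, we consider
\[ \mathcal C := \{B \in \mathcal M : B^c \in \mathcal M\}. \]
Since $\mathcal A$ is an algebra, $\mathcal A \subseteq \mathcal C$. If $B_i \in \mathcal C$ so that $B_1 \subseteq B_2 \subseteq \cdots$, then $B_i^c \in \mathcal M$ and $B_1^c \supseteq B_2^c \supseteq \cdots$, and hence $(\bigcup_i B_i)^c = \bigcap_i B_i^c \in \mathcal M$. Similarly, the intersection of a decreasing sequence of sets in $\mathcal C$ belongs to $\mathcal C$. It follows that $\mathcal C = \mathcal M$. The proof is complete.

Theorem 1.4 (Uniqueness of measures). Let $\mathcal A$ be an algebra of subsets of $X$ and let $\mathcal S$ be the σ-algebra generated by $\mathcal A$. Let $\mu_1$ and $\mu_2$ be measures on $\mathcal S$ that coincide on $\mathcal A$. Suppose that there is a sequence of sets $A_i \in \mathcal A$ so that $\mu_1(A_i) = \mu_2(A_i) < \infty$, $i \ge 1$, and $\bigcup_{i=1}^\infty A_i = X$. Then $\mu_1 = \mu_2$ on $\mathcal S$.

Proof. First we assume that $\mu_1(X) < \infty$. Lemma 1.1 implies that $\mathcal M := \{A \in \mathcal S : \mu_1(A) = \mu_2(A)\}$ is a monotone class:
\[ \mu_1\Big(\bigcup_i A_i\Big) = \lim_{j \to \infty} \mu_1(A_j) = \lim_{j \to \infty} \mu_2(A_j) = \mu_2\Big(\bigcup_i A_i\Big) \quad \text{if } A_i \subseteq A_{i+1}, \]
\[ \mu_1\Big(\bigcap_i A_i\Big) = \lim_{j \to \infty} \mu_1(A_j) = \lim_{j \to \infty} \mu_2(A_j) = \mu_2\Big(\bigcap_i A_i\Big) \quad \text{if } A_i \supseteq A_{i+1}. \]
By Theorem 1.3, we can conclude that $\mathcal M = \mathcal S$, which gives the assertion.

For the case $\mu_1(X) = \infty$, note that, for each $A \in \mathcal A$, $\mathcal S_A$ is the σ-algebra (on $A$) generated by $\mathcal A_A$ (exercise!). Thus $\mu_1(A \cap B) = \mu_2(A \cap B)$ for all $B \in \mathcal S$ if $\mu_1(A) < \infty$, by the finite case. By assumption, $X = \bigcup_{i=1}^\infty A_i$ for sets $A_i \in \mathcal A$ so that $\mu_1(A_i) = \mu_2(A_i) < \infty$. Without loss of generality we may assume that the $A_i$ are disjoint. Then, for $B \in \mathcal S$,
\[ \mu_1(B) = \mu_1\Big(\bigcup_{i=1}^\infty (A_i \cap B)\Big) = \sum_{i=1}^\infty \mu_1(A_i \cap B) = \sum_{i=1}^\infty \mu_2(A_i \cap B) = \mu_2(B). \]

An elementary family $\mathcal E$ is a collection of subsets of $X$ satisfying:
- $\emptyset \in \mathcal E$,
- if $E, F \in \mathcal E$, then $E \cap F \in \mathcal E$,
- if $E \in \mathcal E$, then $E^c$ is a finite disjoint union of elements in $\mathcal E$.

Proposition 1.5. The collection $\mathcal A$ of finite disjoint unions of elements in an elementary family $\mathcal E$ forms an algebra.

Proof. Suppose that $A, B \in \mathcal E$ and $B^c = \bigcup_{i=1}^n C_i$, where $C_i \in \mathcal E$ are disjoint. Then $A \setminus B = \bigcup_{i=1}^n (A \cap C_i) \in \mathcal A$ and $A \cup B = (A \setminus B) \cup B \in \mathcal A$, since these unions are disjoint. By induction, we can conclude that if $A_1, \ldots, A_n \in \mathcal E$, then $\bigcup_{i=1}^n A_i \in \mathcal A$. For, by inductive hypothesis we may assume that $A_1, \ldots, A_{n-1}$ are disjoint, and then $\bigcup_{i=1}^n A_i = A_n \cup \bigcup_{i=1}^{n-1} (A_i \setminus A_n) \in \mathcal A$. Thus if $A, B \in \mathcal A$, then $A \cup B \in \mathcal A$. Let us show that $\mathcal A$ is stable under complements. Let $A_1, \ldots, A_n \in \mathcal E$ and $A_i^c = \bigcup_{j=1}^{m_i} B_{ij}$ with $B_{ij} \in \mathcal E$ disjoint for all $i, j$. Then
\[ \Big(\bigcup_{i=1}^n A_i\Big)^c = \bigcap_{i=1}^n \bigcup_{j=1}^{m_i} B_{ij} = \bigcup_{\substack{1 \le j_i \le m_i \\ 1 \le i \le n}} B_{1 j_1} \cap \cdots \cap B_{n j_n}, \]
which belongs to $\mathcal A$.

1.3. Outer measures and Carathéodory's construction

An outer measure on a set $X$ is a mapping $\mu : \mathcal P(X) \to [0, \infty]$ satisfying:
- $\mu(\emptyset) = 0$.
- $\mu$ is monotone, i.e., $\mu(A) \le \mu(B)$ if $A \subseteq B$.
- $\mu$ is σ-subadditive, i.e., for any countable family $\{A_i\}_{i=1}^\infty$ of sets $A_i \subseteq X$,
\[ \mu\Big(\bigcup_{i=1}^\infty A_i\Big) \le \sum_{i=1}^\infty \mu(A_i). \]

Theorem 1.6 (Carathéodory). Let $\mu$ be an outer measure on $X$. Set
\[ \mathcal S := \{E \in \mathcal P(X) : \mu(A) = \mu(A \cap E) + \mu(A \setminus E) \text{ for every } A \subseteq X\}. \]
Then $\mathcal S$ is a σ-algebra and $(X, \mathcal S, \mu|_{\mathcal S})$ is a measure space.

Proof. Clearly, $X \in \mathcal S$. If $E \in \mathcal S$ then $E^c \in \mathcal S$, since, for every $A \subseteq X$,
\[ \mu(A \cap E^c) + \mu(A \setminus E^c) = \mu(A \setminus E) + \mu(A \cap E) = \mu(A). \]
Next we claim that, for $E, F \in \mathcal S$, also $E \cup F \in \mathcal S$. Indeed, for every $A \subseteq X$,
\begin{align*}
\mu(A \cap (E \cup F)) + \mu(A \setminus (E \cup F))
&= \mu(A \cap (E \cup F) \cap E) + \mu((A \cap (E \cup F)) \setminus E) + \mu(A \setminus (E \cup F)) \\
&= \mu(A \cap E) + \mu((A \setminus E) \cap F) + \mu((A \setminus E) \setminus F) \\
&= \mu(A \cap E) + \mu(A \setminus E) = \mu(A).
\end{align*}
(The first and last equality hold because $E \in \mathcal S$, the third because $F \in \mathcal S$.)

Let $\{E_i\}_{i=1}^\infty$ be a sequence of sets in $\mathcal S$, and set $E := \bigcup_{i=1}^\infty E_i$ and $E^n := \bigcup_{i=1}^n E_i$. By induction on $n$, each $E^n \in \mathcal S$. Set $F_n := E_n \setminus E^{n-1} = E^n \setminus E^{n-1}$, $n \ge 2$, and $F_1 = E_1$. For any $n \ge 2$ and $A \subseteq X$, we have
\[ \mu(A \cap E^n) = \mu(A \cap E^n \cap E^{n-1}) + \mu((A \cap E^n) \setminus E^{n-1}) = \mu(A \cap E^{n-1}) + \mu(A \cap F_n), \]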

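and, by induction, $\mu(A \cap E^n) = \sum_{i=1}^n \mu(A \cap F_i)$ for each $n \ge 1$. This, together with σ-subadditivity, implies
\[ \mu(A \cap E) = \mu\Big(A \cap \bigcup_{i=1}^\infty F_i\Big) \le \sum_{i=1}^\infty \mu(A \cap F_i) = \lim_{n \to \infty} \sum_{i=1}^n \mu(A \cap F_i) = \lim_{n \to \infty} \mu(A \cap E^n). \]
Using monotonicity, we find
\[ \mu(A \setminus E) = \mu\Big(A \setminus \bigcup_{i=1}^\infty E^i\Big) \le \inf_{i \ge 1} \mu(A \setminus E^i) = \lim_{i \to \infty} \mu(A \setminus E^i), \]
since the sequence $\mu(A \setminus E^i)$ is non-increasing and bounded from below by $\mu(A \setminus E)$. Thus,
\[ \mu(A \cap E) + \mu(A \setminus E) \le \lim_{n \to \infty} \big(\mu(A \cap E^n) + \mu(A \setminus E^n)\big) = \mu(A). \]
This shows that $E \in \mathcal S$, since the converse inequality is trivially satisfied by subadditivity. So $\mathcal S$ is a σ-algebra.

In order to see that $(X, \mathcal S, \mu|_{\mathcal S})$ is a measure space, we need to show that $\mu$ is σ-additive on $\mathcal S$. Let $\{E_i\}_{i=1}^\infty$ be a sequence of disjoint sets in $\mathcal S$, and define $E$ and $E^n$ as above. Then
\[ \mu(E^n) = \mu(E^n \cap E_n) + \mu(E^n \setminus E_n) = \mu(E_n) + \mu(E^{n-1}), \]
and, by induction, $\mu(E^n) = \sum_{i=1}^n \mu(E_i)$ for each $n \ge 1$. Thus,
\[ \mu(E) \ge \mu(E^n) = \sum_{i=1}^n \mu(E_i) \quad \text{for all } n, \]
and hence $\mu(E) \ge \sum_{i=1}^\infty \mu(E_i)$, which implies $\mu(E) = \sum_{i=1}^\infty \mu(E_i)$, as $\mu$ is σ-subadditive.

1.4. Complete measures

Let $(X, \mathcal S, \mu)$ be a measure space. Sets $E \in \mathcal S$ with $\mu(E) = 0$ are called $\mu$-null sets. If a statement about points $x \in X$ is true except for $x$ in some null set, we say that it holds $\mu$-almost everywhere, or $\mu$-a.e. The measure $\mu$ is called complete if all subsets of null sets are measurable, i.e., $E \in \mathcal S$, $\mu(E) = 0$, and $F \subseteq E$ implies $F \in \mathcal S$.

Theorem 1.7 (Completion). Let $(X, \mathcal S, \mu)$ be a measure space. Define
\[ \overline{\mathcal S} := \{E \subseteq X : \exists A, B \in \mathcal S,\ A \subseteq E \subseteq B,\ \mu(B \setminus A) = 0\}, \]
and set $\overline\mu(E) := \mu(A)$ in this situation. Then $\overline{\mathcal S}$ is a σ-algebra and $\overline\mu$ is a measure on $\overline{\mathcal S}$. The measure space $(X, \overline{\mathcal S}, \overline\mu)$ is complete. The σ-algebra $\overline{\mathcal S}$ is called the $\mu$-completion of $\mathcal S$.

Proof. Let us check that $\overline{\mathcal S}$ is a σ-algebra. Clearly, $\mathcal S \subseteq \overline{\mathcal S}$. If $E \in \overline{\mathcal S}$, then $A \subseteq E \subseteq B$ and hence $B^c \subseteq E^c \subseteq A^c$, and $A^c \setminus B^c = A^c \cap B = B \setminus A$ has measure $0$, that is, $E^c \in \overline{\mathcal S}$. Suppose that $A_i, B_i \in \mathcal S$ with $A_i \subseteq E_i \subseteq B_i$ and $\mu(B_i \setminus A_i) = 0$ for all $i$. Then $\bigcup_i A_i \subseteq \bigcup_i E_i \subseteq \bigcup_i B_i$ and
\[ \Big(\bigcup_i B_i\Big) \setminus \Big(\bigcup_i A_i\Big) = \bigcup_i \Big(B_i \setminus \bigcup_j A_j\Big) \subseteq \bigcup_i (B_i \setminus A_i) \]
has measure zero. Hence $\bigcup_i E_i \in \overline{\mathcal S}$ and $\overline{\mathcal S}$ is a σ-algebra.

Next we show that $\overline\mu$ is well-defined on $\overline{\mathcal S}$. If $A, B, A', B' \in \mathcal S$ satisfy
\[ A \subseteq E \subseteq B, \quad \mu(B \setminus A) = 0, \qquad A' \subseteq E \subseteq B', \quad \mu(B' \setminus A') = 0, \]
then $A \setminus A' \subseteq E \setminus A' \subseteq B' \setminus A'$ and hence $\mu(A \setminus A') = 0$. Therefore $\mu(A) = \mu(A \cap A')$. Similarly, we find $\mu(A') = \mu(A \cap A')$, and thus $\mu(A) = \mu(A')$.

σ-additivity of $\overline\mu$ on $\overline{\mathcal S}$ follows from σ-additivity on $\mathcal S$; if the sets $E_i$ above are disjoint, then so are the $A_i$.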
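The completion of Theorem 1.7 can be computed by brute force on a small finite example. The sketch below is illustrative only: the function names `powerset` and `completion` and the sample measure `mu` are assumptions made for this example, and the measure is represented as a dictionary whose keys form a σ-algebra on a finite set.

```python
from itertools import combinations


def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def completion(universe, measure):
    """Brute-force completion of a measure on a finite sigma-algebra.

    `measure` maps every set of the sigma-algebra to its measure.  A set E
    is kept if A <= E <= B for some A, B in the sigma-algebra with
    measure(B - A) == 0, and it is then assigned the value measure(A).
    """
    completed = {}
    for e in powerset(universe):
        for a, mu_a in measure.items():
            for b in measure:
                if a <= e <= b and measure[b - a] == 0:
                    # Well-definedness: every admissible pair gives the same value.
                    assert completed.setdefault(e, mu_a) == mu_a
    return completed


X = frozenset({1, 2, 3, 4})
mu = {
    frozenset(): 0,
    frozenset({1, 2}): 1,
    frozenset({3, 4}): 0,   # a null set with non-measurable subsets {3}, {4}
    X: 1,
}
mu_bar = completion(X, mu)
```

In this example the completion adds exactly the sets that differ from a measurable set by a subset of the null set {3, 4}, such as {3} and {1, 2, 4}, while {1} stays non-measurable.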

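Chapter 2. Lebesgue measure on $\mathbb R^n$

2.1. Construction of the Lebesgue measure

A box $I$ in $\mathbb R^n$ is given by the product of $n$ compact intervals
\[ I = [a, b] := [a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_n, b_n], \]
where $a = (a_1, \ldots, a_n)$, $b = (b_1, \ldots, b_n)$, and $a_i \le b_i$, $i = 1, \ldots, n$, are real numbers. The volume $|I|$ of $I$ is defined by
\[ |I| = (b_1 - a_1) \cdots (b_n - a_n). \]
A box is called a cube if all its sides have the same length. A union of boxes is said to be almost disjoint if the interiors of the boxes are disjoint; the interior of a box $I$ is denoted by
\[ \mathring I = (a, b) := (a_1, b_1) \times (a_2, b_2) \times \cdots \times (a_n, b_n). \]
We denote by
\[ \operatorname{dist}(E_1, E_2) := \inf\{|x_1 - x_2| : x_1 \in E_1,\ x_2 \in E_2\} \]
the distance of two subsets $E_1, E_2 \subseteq \mathbb R^n$.

Theorem 2.1 (Lebesgue measure). Let $\lambda^* : \mathcal P(\mathbb R^n) \to [0, \infty]$ be defined by
\[ \lambda^*(E) := \inf\Big\{ \sum_{i=1}^\infty |Q_i| : \{Q_i\}_{i=1}^\infty \text{ is a countable cover of } E \text{ by cubes} \Big\}, \]
and set
\[ \mathcal L(\mathbb R^n) := \{E \in \mathcal P(\mathbb R^n) : \lambda^*(A) = \lambda^*(A \cap E) + \lambda^*(A \setminus E) \text{ for every } A \subseteq \mathbb R^n\}. \]
Then:
(1) $\lambda^*$ is an outer measure, the so-called Lebesgue outer measure.
(2) If $\operatorname{dist}(E_1, E_2) > 0$, then $\lambda^*(E_1 \cup E_2) = \lambda^*(E_1) + \lambda^*(E_2)$.
(3) $\mathcal L(\mathbb R^n)$ is a σ-algebra that contains the Borel σ-algebra $\mathcal B(\mathbb R^n)$.

Proof. (1) Evidently, $\lambda^*(\emptyset) = 0$ and $\lambda^*$ is monotone. In order to show that $\lambda^*$ is σ-subadditive, let $E = \bigcup_{i=1}^\infty E_i$. We may assume that $\lambda^*(E_i) < \infty$ for all $i$; otherwise there is nothing to prove. For given $\epsilon > 0$ and each $j$, there exists a cover $E_j \subseteq \bigcup_{k=1}^\infty Q_{j,k}$ by cubes so that
\[ \sum_{k=1}^\infty |Q_{j,k}| \le \lambda^*(E_j) + \frac{\epsilon}{2^j}. \]
Then $\{Q_{j,k}\}_{j,k=1}^\infty$ is a cover of $E$ by cubes, and hence
\[ \lambda^*(E) \le \sum_{j=1}^\infty \sum_{k=1}^\infty |Q_{j,k}| \le \sum_{j=1}^\infty \lambda^*(E_j) + \epsilon, \]
which implies the assertion as $\epsilon$ was arbitrary.

(2) Choose $\operatorname{dist}(E_1, E_2) > \delta > 0$ and fix $\epsilon > 0$. There exists a cover $\{Q_j\}_{j=1}^\infty$ by cubes of $E := E_1 \cup E_2$ so that
\[ \sum_{j=1}^\infty |Q_j| \le \lambda^*(E) + \epsilon. \]
We may assume that each $Q_j$ has diameter less than $\delta$, after possibly subdividing $Q_j$. Then each $Q_j$ can intersect at most one of $E_1$ or $E_2$, and setting $J_i := \{j : Q_j \cap E_i \ne \emptyset\}$, $i = 1, 2$, we have $J_1 \cap J_2 = \emptyset$ and $E_i \subseteq \bigcup_{j \in J_i} Q_j$, $i = 1, 2$. Thus,
\[ \lambda^*(E_1) + \lambda^*(E_2) \le \sum_{j \in J_1} |Q_j| + \sum_{j \in J_2} |Q_j| \le \sum_{j=1}^\infty |Q_j| \le \lambda^*(E) + \epsilon, \]
which implies (2), as $\epsilon$ was arbitrary; the converse inequality holds by (1).

(3) That $\mathcal L(\mathbb R^n)$ is a σ-algebra follows from Theorem 1.6. In order to show that $\mathcal B(\mathbb R^n) \subseteq \mathcal L(\mathbb R^n)$ it suffices to prove that $\mathcal L(\mathbb R^n)$ contains all closed subsets of $\mathbb R^n$. Let $F \subseteq \mathbb R^n$ be closed, and let $A$ be any subset of $\mathbb R^n$. By (1), it is enough to show that $\lambda^*(A) \ge \lambda^*(A \cap F) + \lambda^*(A \setminus F)$, and so we may assume that $\lambda^*(A) < \infty$. We set
\[ A_0 := \{x \in A : \operatorname{dist}(x, F) \ge 1\}, \qquad A_i := \{x \in A : (i+1)^{-1} \le \operatorname{dist}(x, F) < i^{-1}\}, \quad i \ge 1. \]
Then any two distinct sets $A_{2j}$ and $A_{2k}$ with even indices have positive distance; the same applies to sets $A_{2j+1}$ with odd indices. By (2), for each $m \in \mathbb N$,
\[ \sum_{i=0}^m \lambda^*(A_{2i}) = \lambda^*\Big(\bigcup_{i=0}^m A_{2i}\Big) \le \lambda^*(A), \qquad \sum_{i=0}^m \lambda^*(A_{2i+1}) = \lambda^*\Big(\bigcup_{i=0}^m A_{2i+1}\Big) \le \lambda^*(A), \]
and therefore $\sum_{i=0}^\infty \lambda^*(A_i) < \infty$. Using $A \setminus F = \bigcup_{i=0}^\infty A_i$ and (1), we find
\begin{align*}
\lambda^*(A \cap F) + \lambda^*(A \setminus F)
&\le \lambda^*(A \cap F) + \lambda^*\Big(\bigcup_{i=0}^m A_i\Big) + \sum_{i=m+1}^\infty \lambda^*(A_i) \\
&= \lambda^*\Big((A \cap F) \cup \bigcup_{i=0}^m A_i\Big) + \sum_{i=m+1}^\infty \lambda^*(A_i) \qquad \text{(by (2))} \\
&\le \lambda^*(A) + \sum_{i=m+1}^\infty \lambda^*(A_i),
\end{align*}
which implies the required inequality, since $\sum_{i=m+1}^\infty \lambda^*(A_i) \to 0$ as $m \to \infty$.

Theorems 1.6 and 2.1 imply that the restriction of the Lebesgue outer measure $\lambda^*$ to the σ-algebra $\mathcal L(\mathbb R^n)$ is a measure. We call it the Lebesgue measure, and we denote it by $\lambda$ or by $\lambda^n$, when the dimension $n$ is important. The elements of $\mathcal L(\mathbb R^n)$ are called the (Lebesgue) measurable sets in $\mathbb R^n$.

The Lebesgue measure is complete. Indeed, if $E \subseteq F$ and $\lambda(F) = 0$, then $\lambda^*(E) = 0$, and hence
\[ \lambda^*(A) \le \lambda^*(A \cap E) + \lambda^*(A \setminus E) \le \lambda^*(E) + \lambda^*(A) = \lambda^*(A), \]
for any $A \subseteq \mathbb R^n$. But a Lebesgue null set need not be a Borel set; see Example 3.5. In fact, we shall see in Corollary 2.10 that the Lebesgue measure is the completion of the Borel measure $\lambda|_{\mathcal B(\mathbb R^n)}$.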

Example 2.2. One-point sets are null sets. Indeed, for $x \in \mathbb R^n$,
\[ \lambda^*(\{x\}) \le \Big| \prod_{i=1}^n \big[x_i - \tfrac1k, x_i + \tfrac1k\big] \Big| = \Big(\frac{2}{k}\Big)^n \quad \text{for all } k \ge 1. \]
It follows that finite sets and countable sets are null sets.

Example 2.3 (The Cantor set). Consider the interval $C_0 = [0, 1]$ and let $C_1$ be the set obtained by deleting the middle third open interval from $[0, 1]$, i.e., $C_1 = [0, 1/3] \cup [2/3, 1]$. Next delete the middle third open interval of each subinterval in $C_1$, i.e.,
\[ C_2 = [0, 1/3^2] \cup [2/3^2, 1/3] \cup [2/3, 7/3^2] \cup [8/3^2, 1]. \]
Continuing this procedure we obtain a sequence $C_0 \supseteq C_1 \supseteq \cdots$ of compact sets. The intersection $C := \bigcap_{k=0}^\infty C_k$ is called the Cantor set.

The Cantor set is a null set. Each $C_k$ is a disjoint union of $2^k$ closed intervals, each of length $3^{-k}$. Since $C \subseteq C_k$ for all $k$, $\lambda(C) \le (2/3)^k$ for all $k$, and thus $\lambda(C) = 0$.

The Cantor set is uncountable. To see this observe that
\[ C = \Big\{ x \in [0, 1] : x = \sum_{j=1}^\infty \frac{a_j}{3^j},\ a_j \in \{0, 2\} \Big\} \]
and consider the function $f : C \to [0, 1]$ defined by
\[ x = \sum_{j=1}^\infty \frac{a_j}{3^j} \ \mapsto\ f(x) = \sum_{j=1}^\infty \frac{b_j}{2^j}, \quad \text{where } b_j = \frac{a_j}{2}. \tag{2.1} \]
The function $f$ is clearly surjective and thus $C$ is uncountable.

Proposition 2.4. We have $\lambda([a, b]) = |[a, b]| = (b_1 - a_1) \cdots (b_n - a_n)$. In particular, degenerate boxes (where $a_i = b_i$ for at least one $i$) are null sets.

Proof. Clearly, $\lambda([a, b]) \ge |[a, b]|$. Consider a grid in $\mathbb R^n$ of cubes $Q$ of side length $1/k$. Let $\mathcal C_1$ be the collection of all $Q$ contained in $[a, b]$, and let $\mathcal C_2$ be the collection of all $Q$ intersecting $[a, b]$ as well as $[a, b]^c$. Then the number of cubes in $\mathcal C_2$ is bounded by $k^{n-1}$ times a constant $C$ independent of $k$, and thus $\sum_{Q \in \mathcal C_2} |Q| \le C/k$. Then, as $\sum_{Q \in \mathcal C_1} |Q| \le |[a, b]|$,
\[ \sum_{Q \in \mathcal C_1 \cup \mathcal C_2} |Q| \le |[a, b]| + C/k \]
for all $k$, and therefore $\lambda([a, b]) \le |[a, b]|$.

Lemma 2.5. If $E = \bigcup_{i=1}^\infty Q_i$ is an almost disjoint union of cubes, then $\lambda(E) = \sum_{i=1}^\infty |Q_i|$.

Proof. Let $\epsilon > 0$. For each $Q_i$ choose a cube $\tilde Q_i$ contained in the interior of $Q_i$ and such that $|Q_i| \le |\tilde Q_i| + \epsilon/2^i$. Then the cubes $\tilde Q_i$ are disjoint, and hence
\[ \sum_{i=1}^\infty |Q_i| \ge \lambda(E) \ge \lambda\Big(\bigcup_{i=1}^\infty \tilde Q_i\Big) = \sum_{i=1}^\infty |\tilde Q_i| \ge \sum_{i=1}^\infty |Q_i| - \epsilon. \]
The statement follows, as $\epsilon$ was arbitrary.
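The construction in Example 2.3 is easy to trace with exact rational arithmetic. The following sketch is not from the notes; the names `cantor_step`, `cantor_level`, `total_length` and `cantor_map` are hypothetical, and `cantor_map` evaluates the map (2.1) only on finite digit strings, i.e., on points of $C$ with a terminating ternary expansion.

```python
from fractions import Fraction


def cantor_step(intervals):
    """Remove the open middle third from each closed interval (a, b)."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))
        result.append((b - third, b))
    return result


def cantor_level(k):
    """Return the 2**k closed intervals whose union is C_k."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        intervals = cantor_step(intervals)
    return intervals


def total_length(intervals):
    """Sum of the lengths of disjoint intervals."""
    return sum(b - a for a, b in intervals)


def cantor_map(ternary_digits):
    """Map (2.1) on a finite digit string a_1, ..., a_m with a_j in {0, 2}:
    sum a_j / 3**j is sent to sum (a_j / 2) / 2**j."""
    assert all(a in (0, 2) for a in ternary_digits)
    return sum(Fraction(a // 2, 2 ** j)
               for j, a in enumerate(ternary_digits, start=1))
```

For instance, `cantor_level(2)` reproduces the four intervals of $C_2$ listed above, `total_length(cantor_level(k))` equals $(2/3)^k$, and `cantor_map([0, 2])` sends $x = 2/9$ to $1/4$.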
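Lemma 2.6. Every open set $U \subseteq \mathbb R^n$ is a countable almost disjoint union of cubes.

Proof. Consider the collection $\mathcal C_0$ of cubes of side length $1$ defined by the lattice $\mathbb Z^n$. Set
\[ \mathcal U_0 := \{Q \in \mathcal C_0 : Q \subseteq U\} \quad \text{and} \quad \mathcal V_0 := \{Q \in \mathcal C_0 : Q \cap U \ne \emptyset \text{ and } Q \cap U^c \ne \emptyset\}. \]
Let $\mathcal C_1$ be the collection of cubes that we obtain by subdividing each cube in $\mathcal V_0$ into $2^n$ cubes of side length $1/2$, and set
\[ \mathcal U_1 := \{Q \in \mathcal C_1 : Q \subseteq U\} \quad \text{and} \quad \mathcal V_1 := \{Q \in \mathcal C_1 : Q \cap U \ne \emptyset \text{ and } Q \cap U^c \ne \emptyset\}. \]
Continue this procedure. Then $U = \bigcup_{Q \in \mathcal U} Q$, where $\mathcal U := \bigcup_{i=0}^\infty \mathcal U_i$, is a countable almost disjoint union of cubes.

2.2. Radon measures on $\mathbb R^n$

Let $X$ be a topological space. A measure $\mu$ on a σ-algebra $\mathcal S \supseteq \mathcal B(X)$ is called outer regular if
\[ \mu(E) = \inf\{\mu(U) : E \subseteq U,\ U \text{ open}\}, \quad E \in \mathcal S, \]
and inner regular if
\[ \mu(E) = \sup\{\mu(K) : K \subseteq E,\ K \text{ compact}\}, \quad E \in \mathcal S. \]
If $\mu$ is both outer and inner regular, it is called regular.

A Radon measure on $\mathbb R^n$ is a Borel measure that is finite on compact sets. More generally, a Radon measure on a locally compact Hausdorff space $X$ is a Borel measure that is finite on compact sets, outer regular on Borel sets, and inner regular on open sets. The next theorem shows that on $\mathbb R^n$ finiteness on compact sets implies regularity. By the Riesz representation theorem (e.g. [5]), the Radon measures on a locally compact Hausdorff space $X$ correspond to the positive linear functionals on the space $C_c(X)$ of continuous functions with compact support.

We denote by $B_r(x) := \{y \in \mathbb R^n : |x - y| < r\}$ the open ball centered at $x \in \mathbb R^n$ of radius $r$ with respect to the Euclidean norm $|x| := (x_1^2 + \cdots + x_n^2)^{1/2}$.

Theorem 2.7. Each Radon measure $\mu$ on $\mathbb R^n$ is σ-finite and regular. For each Borel set $A$ and each $\epsilon > 0$ there is an open set $U$ and a closed set $F$ so that
\[ F \subseteq A \subseteq U \quad \text{and} \quad \mu(U \setminus F) \le \epsilon. \tag{2.2} \]

Proof. Evidently, $\mu$ is σ-finite. Let us prove (2.2). First we assume that $\mu$ is finite. Let $\mathcal A$ be the set of all Borel sets $A$ that satisfy (2.2). We claim that $\mathcal A$ is a σ-algebra. If $A \in \mathcal A$, then for given $\epsilon > 0$ there are $U$ and $F$ satisfying (2.2), and thus $U^c \subseteq A^c \subseteq F^c$ and $\mu(F^c \setminus U^c) = \mu(U \setminus F) \le \epsilon$, i.e., $A^c \in \mathcal A$. Suppose that $A_i \in \mathcal A$, $i \ge 1$, and $\epsilon > 0$. So there are open $U_i$ and closed $F_i$ so that $F_i \subseteq A_i \subseteq U_i$ and $\mu(U_i \setminus F_i) \le \epsilon/2^{i+1}$. Then $U := \bigcup_{i=1}^\infty U_i$ is open and $F := \bigcup_{i=1}^m F_i$ is closed for finite $m$. Since $\mu$ is finite,
\[ \mu\Big(\bigcup_{i=m+1}^\infty F_i \setminus F\Big) \le \mu\Big(\bigcup_{i=1}^\infty F_i \setminus F\Big) = \mu\Big(\bigcup_{i=1}^\infty F_i\Big) - \mu\Big(\bigcup_{i=1}^m F_i\Big) \le \epsilon/2, \]
for sufficiently large $m$, by Lemma 1.1. Since $U \setminus F \subseteq \bigcup_{i=1}^\infty (U_i \setminus F_i) \cup \big(\bigcup_{i=m+1}^\infty F_i \setminus F\big)$,
\[ \mu(U \setminus F) \le \sum_{i=1}^\infty \mu(U_i \setminus F_i) + \mu\Big(\bigcup_{i=m+1}^\infty F_i \setminus F\Big) \le \epsilon. \]
Thus $\mathcal A$ is a σ-algebra. Every closed set $F \subseteq \mathbb R^n$ belongs to $\mathcal A$, since the sets $U_k := \{x : \operatorname{dist}(x, F) < 1/k\}$ are open and satisfy $\mu(U_k \setminus F) \to 0$ as $k \to \infty$, by Lemma 1.1. It follows that $\mathcal A = \mathcal B(\mathbb R^n)$ and hence (2.2).

Assume that $\mu$ is not finite. Let $A$ be a Borel set and let $\epsilon > 0$ be given. Since $\nu_i(E) := \mu(E \cap B_i(0))$ is a finite Radon measure on $\mathbb R^n$, by the above, there exists a closed set $C_i \subseteq (B_i(0) \setminus A)$ with
\[ \nu_i((B_i(0) \setminus A) \setminus C_i) = \mu((B_i(0) \setminus A) \setminus C_i) \le \epsilon/2^i. \]
Then $U := \bigcup_{i=1}^\infty (B_i(0) \setminus C_i)$ is open,
\[ A = \bigcup_{i=1}^\infty (B_i(0) \cap A) \subseteq \bigcup_{i=1}^\infty (B_i(0) \setminus C_i) = U \quad \text{and} \quad \mu(U \setminus A) \le \sum_{i=1}^\infty \mu((B_i(0) \setminus C_i) \setminus A) \le \epsilon. \]
Similarly, there exists a closed set $F_i \subseteq A_i := A \cap \{x : i \le |x| < i + 1\}$ with $\mu(A_i \setminus F_i) \le \epsilon/2^{i+1}$,
\[ F := \bigcup_{i=0}^\infty F_i \subseteq \bigcup_{i=0}^\infty A_i = A, \quad \text{and} \quad \mu(A \setminus F) \le \sum_{i=0}^\infty \mu(A_i \setminus F_i) \le \epsilon. \]
It remains to show that $F$ is closed. If $x \in \overline F$ and $F \ni x_k \to x$, then $|x_k| \to |x|$ and so $x_k \in F_j \cup F_{j+1}$ for some $j$ and for all sufficiently large $k$. Consequently, $x \in F_j \cup F_{j+1} \subseteq F$, since $F_j \cup F_{j+1}$ is closed. Thus (2.2) is proved.

Finally, we show that $\mu$ is regular. Let $A$ be a Borel set, and let $\epsilon > 0$. Outer regularity is clear if $\mu(A) = \infty$ and follows from (2.2) if $\mu(A) < \infty$: there exists an open set $U \supseteq A$ so that
\[ \mu(A) + \epsilon \ge \mu(A) + \mu(U \setminus A) = \mu(U). \]
Next we show
\[ \mu(A) = \sup\{\mu(F) : F \subseteq A,\ F \text{ closed}\}. \tag{2.3} \]
It follows from (2.2) if $\mu(A) < \infty$: there is a closed set $F \subseteq A$ so that
\[ \mu(A) - \epsilon \le \mu(A) - \mu(A \setminus F) = \mu(F). \]
If $\mu(A) = \infty$, write $A = \bigcup_{i=0}^\infty A_i$ where $A_i$ is as above. Since $\mu$ is finite on compact sets, $\mu(A_i) < \infty$, and, again by (2.2), there exist closed $F_i \subseteq A_i$ with $\mu(F_i) \ge \mu(A_i) - 1/2^{i+1}$. By Lemma 1.1,
\[ \lim_{k \to \infty} \mu\Big(\bigcup_{i=0}^k F_i\Big) = \mu\Big(\bigcup_{i=0}^\infty F_i\Big) = \sum_{i=0}^\infty \mu(F_i) \ge \mu(A) - 1 = \infty, \]
which shows (2.3), since $\bigcup_{i=0}^k F_i$ is closed. We finally have
\[ \sup\{\mu(K) : K \subseteq A,\ K \text{ compact}\} = \sup\{\mu(F) : F \subseteq A,\ F \text{ closed}\}, \]
since for any closed $F \subseteq \mathbb R^n$ the sets $K_k := F \cap \overline{B_k(0)}$ are compact and $\mu(F) = \lim_{k \to \infty} \mu(K_k)$.

2.3. Properties of the Lebesgue measure

Proposition 2.8. The Lebesgue outer measure is Borel regular, i.e., for each $E \subseteq \mathbb R^n$ there exists a Borel set $B \supseteq E$ such that $\lambda^*(E) = \lambda^*(B)$.

Proof. If $\lambda^*(E) = \infty$ take $B = \mathbb R^n$. Suppose that $\lambda^*(E) < \infty$. For each $k \ge 1$ choose a countable collection $\mathcal C_k$ of cubes so that
\[ E \subseteq \bigcup_{Q \in \mathcal C_k} Q =: B_k \quad \text{and} \quad \sum_{Q \in \mathcal C_k} |Q| \le \lambda^*(E) + 1/k. \]
Then $B := \bigcap_{k=1}^\infty B_k$ is a Borel set that contains $E$ and satisfies
\[ \lambda^*(B) \le \lambda^*(B_k) \le \sum_{Q \in \mathcal C_k} |Q| \le \lambda^*(E) + 1/k, \]
for all $k$, hence $\lambda^*(E) = \lambda^*(B)$.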

Theorem 2.9 (Regularity). The Lebesgue measure $\lambda$ on $\mathbb R^n$ is σ-finite and regular. Its restriction to $\mathcal B(\mathbb R^n)$ is a Radon measure.

Proof. Clearly, $\lambda$ is finite on compact sets and hence a Radon measure when restricted to $\mathcal B(\mathbb R^n)$. Thus $\lambda$ is σ-finite. By Theorem 2.7,
\[ \lambda(B) = \inf\{\lambda(U) : B \subseteq U,\ U \text{ open}\} = \sup\{\lambda(K) : K \subseteq B,\ K \text{ compact}\} \]
for each Borel set $B$. If $E \subseteq \mathbb R^n$ is arbitrary, then, by Proposition 2.8, there is a Borel set $B \supseteq E$ with $\lambda^*(E) = \lambda^*(B)$, and thus
\[ \lambda^*(E) = \lambda^*(B) = \inf\{\lambda(U) : B \subseteq U,\ U \text{ open}\} \ge \inf\{\lambda(U) : E \subseteq U,\ U \text{ open}\}, \]
which shows that $\lambda$ is outer regular (the converse inequality holds by monotonicity).

To see that $\lambda$ is inner regular let $E \subseteq \mathbb R^n$ be measurable, and suppose first that $E$ is contained in a cube $Q$. Let $\epsilon > 0$. Then $\lambda(Q \setminus E) < \infty$ and, as $\lambda$ is outer regular, there exists an open $U \supseteq Q \setminus E$ so that $\lambda(U) \le \lambda(Q \setminus E) + \epsilon$. The set $K := Q \setminus U \subseteq E$ is compact and satisfies
\[ \lambda(E) = \lambda(Q) - \lambda(Q \setminus E) \le \lambda(Q) - \lambda(U) + \epsilon \le \lambda(Q) - \lambda(Q \cap U) + \epsilon = \lambda(K) + \epsilon. \]
If $E$ is not contained in a cube, for each $k \ge 1$, there is a compact $K_k \subseteq E \cap [-k, k]^n$ so that $\lambda(K_k) \ge \lambda(E \cap [-k, k]^n) - 1/k$. Hence $\lambda(K_k) \to \lambda(E)$ as $k \to \infty$, and so $\lambda$ is inner regular.

Corollary 2.10 (Characterization of Lebesgue measurability). A set $E \subseteq \mathbb R^n$ is Lebesgue measurable if and only if there are an $F_\sigma$-set $A$ and a $G_\delta$-set $B$ satisfying $A \subseteq E \subseteq B$ and $\lambda(B \setminus A) = 0$.

An $F_\sigma$-set is a countable union of closed sets, and a $G_\delta$-set is a countable intersection of open sets. The corollary implies, in view of Theorem 1.7, that the Lebesgue σ-algebra $\mathcal L(\mathbb R^n)$ is the completion of the Borel σ-algebra $\mathcal B(\mathbb R^n)$.

Proof. Assume that $E$ is Lebesgue measurable. Theorem 2.9 implies that there exist open sets $G_i$ and closed sets $F_i$ satisfying $F_i \subseteq E \subseteq G_i$ and $\lambda(G_i \setminus F_i) \le 1/i$. The sets $F = \bigcup_i F_i$ and $G = \bigcap_i G_i$ are as required. Conversely, if there exist such $F$ and $G$, then for any $A \subseteq \mathbb R^n$, we have
\[ A \cap F \subseteq A \cap E \subseteq A \cap G, \qquad A \setminus G \subseteq A \setminus E \subseteq A \setminus F, \]
\[ \lambda^*((A \cap G) \setminus (A \cap F)) = \lambda^*(A \cap (G \setminus F)) \le \lambda^*(G \setminus F) = 0, \]
and similarly $\lambda^*((A \setminus F) \setminus (A \setminus G)) = 0$. This implies $\lambda^*(A \cap E) = \lambda^*(A \cap F)$ and $\lambda^*(A \setminus E) = \lambda^*(A \setminus F)$, and thus
\[ \lambda^*(A \cap E) + \lambda^*(A \setminus E) = \lambda^*(A \cap F) + \lambda^*(A \setminus F) = \lambda^*(A), \]
since $F$ is measurable.

Theorem 2.11 (Uniqueness of Lebesgue measure I). The Lebesgue measure $\lambda$ is the unique measure on the Borel σ-algebra $\mathcal B(\mathbb R^n)$ satisfying $\lambda([a, b]) = |[a, b]|$.

Proof. By Proposition 2.4, $\lambda([a, b]) = |[a, b]|$. Suppose there is a second measure $\mu$ on $\mathcal B(\mathbb R^n)$ with this property. We claim that $\lambda$ and $\mu$ coincide on the collection $\mathcal A$ of all finite disjoint unions of sets of the form $F \cap G$, where $F$ is closed and $G$ is open, and that $\mathcal A$ is an algebra. The statement of the theorem is then a consequence of Theorem 1.4, since the σ-algebra generated by $\mathcal A$ is the Borel σ-algebra.

That $\mathcal A$ is an algebra follows from Proposition 1.5, since the collection of sets of the form $F \cap G$, where $F$ is closed and $G$ is open, is an elementary family; in fact,
\[ (F_1 \cap G_1) \cap (F_2 \cap G_2) = (F_1 \cap F_2) \cap (G_1 \cap G_2), \]
\[ (F \cap G)^c = (F \cap G^c) \cup (F^c \cap G) \cup (F^c \cap G^c). \]
If $F$ is closed and $G$ is open, set $G_k := \{x \in \mathbb R^n : \operatorname{dist}(x, F) < 1/k\}$. Then $G_k$ is open, $G_k \supseteq G_{k+1}$, and $F = \bigcap_{k=1}^\infty G_k$. If $\mu(G) < \infty$ then, by Lemma 1.1,
\[ \mu\Big(\bigcap_{k=1}^\infty (G_k \cap G)\Big) = \lim_{j \to \infty} \mu(G_j \cap G) = \lim_{j \to \infty} \lambda(G_j \cap G) = \lambda\Big(\bigcap_{k=1}^\infty (G_k \cap G)\Big), \]
since $\lambda$ and $\mu$ coincide on open sets, by Lemmas 2.5 and 2.6. Thus $\mu(F \cap G) = \lambda(F \cap G)$ if $\mu(G) < \infty$. If $\mu(G) = \infty$, then $\mu(F \cap G \cap (-k, k)^n) = \lambda(F \cap G \cap (-k, k)^n)$, and letting $k \to \infty$ we find again $\mu(F \cap G) = \lambda(F \cap G)$. By additivity, $\mu$ and $\lambda$ coincide on $\mathcal A$.

Corollary 2.12. A Borel regular outer measure $\mu$ on $\mathbb R^n$ so that all Borel sets are $\mu$-measurable and so that $\mu([a, b]) = |[a, b]|$ coincides with the Lebesgue outer measure.

Proof. By Theorem 2.11, $\mu$ and $\lambda^*$ coincide on all Borel sets. Let $E \subseteq \mathbb R^n$ be arbitrary. As $\mu$ and $\lambda^*$ are Borel regular, there exist Borel sets $B_1, B_2 \supseteq E$ so that $\mu(B_1) = \mu(E)$ and $\lambda^*(B_2) = \lambda^*(E)$. Then, as $B_1 \cap B_2 \supseteq E$, we have $\mu(E) = \mu(B_1) \ge \mu(B_1 \cap B_2) \ge \mu(E)$, thus $\mu(E) = \mu(B_1 \cap B_2)$, and analogously $\lambda^*(E) = \lambda^*(B_1 \cap B_2)$. Therefore $\mu(E) = \lambda^*(E)$.

Proposition 2.13 (Translation invariance). The Lebesgue measure $\lambda$ on $\mathbb R^n$ is translation invariant, i.e., if $E$ is measurable and $y \in \mathbb R^n$, then the set $E + y := \{x + y : x \in E\}$ is measurable and $\lambda(E + y) = \lambda(E)$.

Proof. The assertion is clearly true in the case that $E$ is a cube. Consequently, for arbitrary $E \subseteq \mathbb R^n$ we have $\lambda^*(E + y) = \lambda^*(E)$. If $E$ is measurable and $A \subseteq \mathbb R^n$ is arbitrary, then
\begin{align*}
\lambda^*(A \cap (E + y)) + \lambda^*(A \setminus (E + y))
&= \lambda^*(((A - y) \cap E) + y) + \lambda^*(((A - y) \setminus E) + y) \\
&= \lambda^*((A - y) \cap E) + \lambda^*((A - y) \setminus E) \\
&= \lambda^*(A - y) = \lambda^*(A),
\end{align*}
and so $E + y$ is measurable.

For further invariance properties, see Lemma 3.32 and Theorem

Theorem 2.14 (Uniqueness of Lebesgue measure II). If $\mu$ is a translation invariant Radon measure on $\mathbb R^n$, then there is a constant $C \ge 0$ such that $\mu(E) = C\lambda(E)$ for all Borel sets $E$.

Proof. Set $\mu([0, 1)^n) =: C < \infty$. Consider the grid of dyadic cubes of the form $[a_1, b_1) \times \cdots \times [a_n, b_n)$ defined by the lattice $2^{-k} \mathbb Z^n$. Since these cubes are all translates of each other,
\[ 2^{kn} \mu(Q) = \mu([0, 1)^n) = C \lambda([0, 1)^n) = C\, 2^{kn} \lambda(Q), \]
for each such cube $Q$. We may infer that $\mu$ vanishes on degenerate boxes, and so $\mu(Q) = C\lambda(Q)$ for each closed dyadic cube $Q = [a_1, b_1] \times \cdots \times [a_n, b_n]$. Then $\mu(E) = C\lambda(E)$ for each open set $E$, by Lemmas 2.5 and 2.6, and thus for each Borel set $E$, by regularity of $\mu$ and $\lambda$, see Theorems 2.7 and 2.9.
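The step from dyadic cubes to open sets rests on the decomposition of Lemma 2.6 together with Lemma 2.5. As a rough illustration, the sketch below runs the subdivision scheme of Lemma 2.6 for the open disk of a given radius in $\mathbb R^2$, stopped after finitely many subdivisions, and sums the areas of the accepted squares. The function names `dyadic_squares_in_disk` and `covered_area` are hypothetical, and the choice of a disk centered at the origin is an assumption made so that the containment tests are elementary.

```python
import math


def dyadic_squares_in_disk(radius, depth):
    """Subdivision scheme of Lemma 2.6 for the open disk {|x| < radius} in R^2,
    stopped after `depth` subdivisions.

    Returns almost disjoint closed squares (x, y, side) contained in the disk.
    """
    r2 = radius * radius
    n = math.ceil(radius)
    # Unit squares of the lattice Z^2 meeting the bounding box of the disk.
    pending = [(float(i), float(j), 1.0)
               for i in range(-n, n) for j in range(-n, n)]
    accepted = []
    for _ in range(depth + 1):
        boundary = []
        for x, y, s in pending:
            # Farthest point of the square from the origin is a corner.
            far2 = max(abs(x), abs(x + s)) ** 2 + max(abs(y), abs(y + s)) ** 2
            # Nearest point: clamp the origin into the square.
            nx = min(max(0.0, x), x + s)
            ny = min(max(0.0, y), y + s)
            near2 = nx * nx + ny * ny
            if far2 < r2:
                accepted.append((x, y, s))      # square lies inside the disk
            elif near2 < r2:
                boundary.append((x, y, s))      # square meets disk and complement
        # Subdivide every boundary square into 2^2 squares of half the side.
        pending = [(x + dx * s / 2, y + dy * s / 2, s / 2)
                   for x, y, s in boundary for dx in (0, 1) for dy in (0, 1)]
    return accepted


def covered_area(radius, depth):
    """Total area of the accepted squares (a lower bound for the disk's area)."""
    return sum(s * s for _, _, s in dyadic_squares_in_disk(radius, depth))
```

By Lemma 2.5 the areas of the accepted squares add up to the measure of their union, so `covered_area(1.0, depth)` increases with `depth` and stays below $\pi$; the full countable decomposition would give exactly $\pi$.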

Proposition 2.15 (Approximation by cubes). Let $E \subseteq \mathbb R^n$ be measurable with $\lambda(E) < \infty$. For each $\epsilon > 0$ there exist cubes $Q_1, \ldots, Q_m$ such that $\lambda\big(E \,\triangle\, \bigcup_{i=1}^m Q_i\big) < \epsilon$, where $E \,\triangle\, F := (E \setminus F) \cup (F \setminus E) = (E \cup F) \setminus (E \cap F)$ is the symmetric difference.

Proof. Let $\epsilon > 0$ be fixed and let $Q_i$ be cubes such that $E \subseteq \bigcup_{i=1}^\infty Q_i$ and $\sum_{i=1}^\infty |Q_i| \le \lambda(E) + \epsilon/2$. Since $\lambda(E) < \infty$ the infinite sum converges and there exists $m$ such that $\sum_{i=m+1}^\infty |Q_i| < \epsilon/2$. Then,
\begin{align*}
\lambda\Big(E \,\triangle\, \bigcup_{i=1}^m Q_i\Big)
&= \lambda\Big(E \setminus \bigcup_{i=1}^m Q_i\Big) + \lambda\Big(\bigcup_{i=1}^m Q_i \setminus E\Big) \\
&\le \lambda\Big(\bigcup_{i=m+1}^\infty Q_i\Big) + \lambda\Big(\bigcup_{i=1}^\infty Q_i \setminus E\Big) \\
&\le \sum_{i=m+1}^\infty |Q_i| + \sum_{i=1}^\infty |Q_i| - \lambda(E) < \epsilon.
\end{align*}

2.4. Non-measurable sets

Every set of positive measure in $\mathbb R$ has non-measurable subsets.

Theorem 2.16 (Existence of non-measurable sets). Let $E \subseteq \mathbb R$. If every subset of $E$ is Lebesgue measurable, then $\lambda(E) = 0$.

Proof. On $\mathbb R$ consider the equivalence relation $x \sim y :\Leftrightarrow x - y \in \mathbb Q$. The axiom of choice allows us to choose exactly one element in each equivalence class and to gather these elements in one set $N$; such a set is called a Vitali set. For $q \in \mathbb Q$ consider the translates $N + q$, which are pairwise disjoint; otherwise we have $x + q_1 = y + q_2$ for some $x \ne y$ in $N$ and thus $x - y \in \mathbb Q$, but $x$ and $y$ belong to different equivalence classes, a contradiction.

Fix $p \in \mathbb Q$ and set $E_p := E \cap (N + p)$. By assumption, $E_p$ is measurable. Let $K \subseteq E_p$ be compact and set $L := \bigcup_{q \in \mathbb Q \cap [0,1]} (K + q)$. Then $\lambda(L) < \infty$, since $L$ is bounded, and, since the sets $K + q$ are disjoint, $\lambda(L) = \sum_{q \in \mathbb Q \cap [0,1]} \lambda(K)$. Thus $\lambda(K) = 0$. Since $K$ was arbitrary, we may conclude that $\lambda(E_p) = 0$, by regularity of $\lambda$. Consequently, $\lambda(E) = 0$, because $E = \bigcup_{p \in \mathbb Q} E_p$.

In the previous proof the axiom of choice plays an essential role. In fact, Solovay constructed a model in which all axioms of Zermelo–Fraenkel set theory, except the axiom of choice, hold and in which every subset of $\mathbb R$ is Lebesgue measurable.

There exists a finitely additive translation-invariant set function assigning boxes their volume that is defined on all subsets of $\mathbb R$, respectively $\mathbb R^2$, but not in higher dimensions. In fact, any ball in $\mathbb R^3$ can be decomposed into finitely many disjoint subsets, which can then be reassembled using only rotations and translations to form two copies of the original ball; this result is called the Banach–Tarski paradox.

Chapter 3. Integration

3.1. Measurable functions

A set $X$ equipped with a σ-algebra $\mathcal S \subseteq \mathcal P(X)$ is called a measurable space $(X, \mathcal S)$. A mapping $f : X \to Y$ between measurable spaces $(X, \mathcal S)$ and $(Y, \mathcal T)$ is called $(\mathcal S, \mathcal T)$-measurable if $f^{-1}(E) \in \mathcal S$ for every $E \in \mathcal T$. It is obvious by definition that the composition of measurable mappings is measurable; more precisely, if $f : X \to Y$ is $(\mathcal S, \mathcal T)$-measurable and $g : Y \to Z$ is $(\mathcal T, \mathcal U)$-measurable, then $g \circ f$ is $(\mathcal S, \mathcal U)$-measurable.

Lemma 3.1. If $\mathcal T$ is generated by $\mathcal A$, then a mapping $f : X \to Y$ is $(\mathcal S, \mathcal T)$-measurable if and only if $f^{-1}(E) \in \mathcal S$ for every $E \in \mathcal A$.

Proof. This follows from the fact that $\{E \subseteq Y : f^{-1}(E) \in \mathcal S\}$ is a σ-algebra on $Y$ containing $\mathcal A$, and hence containing $\mathcal T$.

It follows that any continuous mapping $f : X \to Y$ between topological spaces $X$ and $Y$ is $(\mathcal B(X), \mathcal B(Y))$-measurable.

If $f$ is a real or complex valued function on a measurable space $(X, \mathcal S)$, then we say that $f$ is $\mathcal S$-measurable if $f$ is $(\mathcal S, \mathcal B(\mathbb R))$- or $(\mathcal S, \mathcal B(\mathbb C))$-measurable. For instance, $f : \mathbb R^n \to \mathbb C$ is Lebesgue measurable if it is $(\mathcal L(\mathbb R^n), \mathcal B(\mathbb C))$-measurable, and it is Borel measurable or also a Borel function if it is $(\mathcal B(\mathbb R^n), \mathcal B(\mathbb C))$-measurable. Note that if $f, g : \mathbb R \to \mathbb R$ are Lebesgue measurable, then $g \circ f$ need not be Lebesgue measurable.

The characteristic function $\chi_A : X \to \mathbb R$ of a subset $A \subseteq X$,
\[ \chi_A(x) := \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A, \end{cases} \]
is $\mathcal S$-measurable if and only if $A$ is $\mathcal S$-measurable.

Proposition 3.2. Let $X$ be a measurable space.
(1) If $f_1, f_2 : X \to \mathbb R$ are measurable, then $f = (f_1, f_2) : X \to \mathbb R^2$ is measurable.
(2) A complex valued function $f : X \to \mathbb C$ is measurable if and only if $\operatorname{Re} f$ and $\operatorname{Im} f$ are measurable. In this case $|f|$ is measurable.
(3) If $f, g : X \to \mathbb C$ are measurable, then so are $f + g$ and $fg$.

Proof. (1) Every open subset $U \subseteq \mathbb R^2$ is a countable union of cubes $U = \bigcup_i Q_i$, by Lemma 2.6. Then $f^{-1}(U) = f^{-1}(\bigcup_i Q_i) = \bigcup_i f^{-1}(Q_i)$ is measurable, since each $f^{-1}(Q_i) = f_1^{-1}(I_{i,1}) \cap f_2^{-1}(I_{i,2})$ is measurable, where $Q_i = I_{i,1} \times I_{i,2}$ and $I_{i,1}, I_{i,2}$ are compact intervals.

(2) follows from (1) and the fact that if $f : X \to \mathbb C$ is measurable, then the composite $g \circ f$ is measurable for any continuous mapping $g$. This also implies (3).

20 16 3. INTEGRATION The extended real line is the set [, ] = R {± } with the topology generated by the open sets of R and all intervals [, a) and (a, ]. Then B([, ]) = {E [, ] : E R B(R)}. A function f : [, ] on a measurable space (, S) is said to be S-measurable if it is (S, B([, ]))- measurable. Proposition 3.3. Let (, S) be a measurable space. A function f : [, ] is S-measurable if and only if f 1 ((a, ]) S for all a R. Proof. By Lemma 3.1, it suffices to show that {(a, ] : a R} generates B([, ]). This follows from [, a) = [, a 1 i ] = (a 1 i, ]c and from (a, b) = [, b) (a, ]. It follows that every upper or lower semicontinuous function is Borel measurable. Recall that a function f : [, ] on a topological space is upper (or lower) semicontinuous if {x : f(x) < a} (or {x : f(x) > a}) is open for all a R. Theorem 3.4 (Pointwise limits of measurable functions). Let f n : [, ], n N, be a sequence of measurable functions on a measurable space (, S). Then inf f n, n N sup f n, n N lim inf n f n, lim sup f n n are measurable. Thus, the limit of any pointwise convergent sequence of complex valued measurable functions is measurable. Proof. Let g := sup n N f n. Then g 1 ((a, ]) = n N f n 1 ((a, ]) and thus g is measurable, by Proposition 3.3. The result for the infimum is analogous (note that inf n f n = sup n ( f n )). Since the result follows. lim sup f n = inf sup f m and lim inf f n = sup n n N m n n inf f m n N m n Thus, if f, g : [, ] are measurable, then so are the functions min{f, g} and max{f, g}. In particular, this is true for f + := max{f, } and f := min{f, }, the positive and negative part of f. Note that f = f + f and f = f + + f. For a complex valued function f : C we have its polar decomposition, { z/ z z f = f sgn f, where sgn z := z =. If f is measurable, then so is f and sgn f. Indeed, : C R is continuous, and the preimage sgn 1 (U) of an open set U C is either open or of the form V {}, where V is open, and hence sgn is Borel. 
Example 3.5 (The Cantor function). Consider $\mathbb{R}$ with the Lebesgue measure $\lambda$. Let $C$ be the Cantor set from Example 2.3. The Cantor set is a closed null set, in particular, $C$ is Borel. Let $f : C \to [0, 1]$ be the function defined in (2.1). It is easy to see that $x, y \in C$ with $x < y$ implies $f(x) < f(y)$ unless $x$ and $y$ are the endpoints of one of the intervals removed from $[0, 1]$ to obtain $C$. In the latter case $f(x) = k/2^l$ for some integers $k, l$, and $f(x)$ and $f(y)$ are the two expansions in base 2 of this number, so that $f(x) = f(y)$. Thus, we can extend $f$ to a function $f : [0, 1] \to [0, 1]$ by setting

$f|_{(a,b)} := f(a) = f(b)$ on each connected component $(a, b)$ of $[0, 1] \setminus C$. Then $f$ is still nondecreasing and it is continuous, since its range is all of $[0, 1]$. This is called the Cantor function.

Figure 1. The Cantor function. (Generated with Mathematica and based on the code provided in [15, p.173].)

As a by-product we obtain the existence of Lebesgue null sets which are not Borel as follows. The function $g(x) = x + f(x)$ is strictly increasing and continuous, thus a homeomorphism onto its image. The image $g(C)$ has positive measure and so, by Theorem 2.16, there is a non-measurable subset $F \subseteq g(C)$. If we set $E = g^{-1}(F)$, then $E \subseteq C$ and hence $E$ is a null set. But $E$ is not Borel. Indeed, if $E$ were Borel, then so would be $F$, since $g^{-1}$ is continuous.

3.2. Approximation by simple functions

Let $(X, \mathcal{S})$ be a measurable space. A simple function is a complex valued measurable function on $X$ with finite image. A simple function is representable in the form
\[ s = \sum_{i=1}^N a_i \chi_{E_i}, \]
where all $E_i \in \mathcal{S}$ and $a_i \in \mathbb{C}$. In fact, setting $E_i := \{x : s(x) = a_i\}$, where $s(X) = \{a_1, \ldots, a_N\}$, yields such a representation with the additional property that all $a_i$ are distinct and all $E_i$ are disjoint; we call this particular representation canonical. Simple functions will be for the Lebesgue integral what step functions (where $E_i$ are just boxes in $X = \mathbb{R}^n$) are for the Riemann integral.

Theorem 3.6 (Approximation by simple functions). Let $f : X \to [0, \infty]$ be measurable. There exist simple functions $s_i$ on $X$ such that
(1) $0 \le s_1 \le s_2 \le \cdots \le f$,
(2) $\lim_{i \to \infty} s_i(x) = f(x)$ for every $x \in X$.

Proof. To each integer $m \ge 1$ and each real $t \ge 0$ there corresponds a unique integer $k = k(m, t)$ that satisfies $k/2^m \le t < (k + 1)/2^m$. Define
\[ g_m(t) := \begin{cases} k(m, t)/2^m & \text{if } 0 \le t < m, \\ m & \text{if } m \le t \le \infty. \end{cases} \]
We have $t - 2^{-m} < g_m(t) \le t$ if $0 \le t < m$.

Thus $\lim_{m \to \infty} g_m(t) = t$ for every $t \in [0, \infty]$, and clearly $0 \le g_1 \le g_2 \le \cdots \le t$. Then $s_m := g_m \circ f$ are simple functions with the required properties.

Corollary 3.7. Let $f : X \to [-\infty, \infty]$ or $f : X \to \mathbb{C}$ be measurable. There exist simple functions $s_i$ on $X$ such that
(1) $0 \le |s_1| \le |s_2| \le \cdots \le |f|$,
(2) $\lim_{i \to \infty} s_i(x) = f(x)$ for every $x \in X$.

Proof. Consider first the case $f : X \to [-\infty, \infty]$. By Theorem 3.6 applied to $f^+$ and $f^-$, there are simple functions $0 \le s_1^+ \le s_2^+ \le \cdots \le f^+$ and $0 \le s_1^- \le s_2^- \le \cdots \le f^-$ so that $\lim_{i \to \infty} s_i^\pm(x) = f^\pm(x)$ for every $x \in X$. Then $s_i := s_i^+ - s_i^-$ is as required. The case $f : X \to \mathbb{C}$ is an easy consequence.

Given a measure $\mu$ on $(X, \mathcal{S})$, one often wants to ignore $\mu$-null sets. In this respect we have for complete measures:

Proposition 3.8. Assume that $\mu$ is complete, and that $f, g, f_i$ are functions with values in $[-\infty, \infty]$ or in $\mathbb{C}$.
(1) If $f$ is measurable and $f = g$ $\mu$-a.e., then $g$ is measurable.
(2) If $f_i$ are measurable and $f_i \to f$ $\mu$-a.e., then $f$ is measurable.

Proof. We may assume that all functions have values in the extended real line.
(1) Since $\mu$ is complete, the sets $E = \{x : f(x) \ne g(x)\}$ and $g^{-1}((a, \infty]) \cap E$ are measurable, and thus
\[ g^{-1}((a, \infty]) = (f^{-1}((a, \infty]) \cap E^c) \cup (g^{-1}((a, \infty]) \cap E) \]
is measurable.
(2) Let $E = \{x : f_i(x) \to f(x)\}$. Then $f_i \chi_E \to f \chi_E$ and $\mu(E^c) = 0$. By Theorem 3.4, $f \chi_E$ is measurable, and so
\[ f^{-1}((a, \infty]) = (f^{-1}((a, \infty]) \cap E^c) \cup ((f \chi_E)^{-1}((a, \infty]) \cap E) \]
is measurable.

If the measure is not complete we still have:

Proposition 3.9. Let $(X, \mathcal{S}, \mu)$ be a measure space and $(X, \overline{\mathcal{S}}, \overline{\mu})$ its completion. If $f$ is an $\overline{\mathcal{S}}$-measurable function on $X$, then there is an $\mathcal{S}$-measurable function $g$ such that $f = g$ $\overline{\mu}$-a.e.

Proof. This is immediate from the definition of the completion $\overline{\mu}$, if $f = \chi_E$ with $E \in \overline{\mathcal{S}}$ and hence if $f$ is an $\overline{\mathcal{S}}$-measurable simple function. By Corollary 3.7, there is a sequence of $\overline{\mathcal{S}}$-measurable simple functions $s_i$ converging pointwise to $f$. For each $i$, there is an $\mathcal{S}$-measurable function $g_i$ so that $s_i = g_i$ except on a set $E_i \in \overline{\mathcal{S}}$ with $\overline{\mu}(E_i) = 0$. Choose a set $F \in \mathcal{S}$ with $\mu(F) = 0$ and $F \supseteq \bigcup_{i=1}^\infty E_i$; it exists by the definition of $\overline{\mathcal{S}}$.
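Then $g = \lim_{i \to \infty} g_i \chi_{F^c}$ is as required, by Theorem 3.4.

3.3. Integration on a measure space

Let us fix the arithmetic in $[0, \infty]$. We define
\[ a + \infty = \infty + a = \infty \quad \text{if } a \in [0, \infty], \qquad a \cdot \infty = \infty \cdot a = \begin{cases} \infty & \text{if } a \in (0, \infty], \\ 0 & \text{if } a = 0. \end{cases} \]
Then addition and multiplication in $[0, \infty]$ are commutative, associative, and distributive. The cancellation laws have to be treated with some care; $a + c = b + c$ implies $a = b$ only if $c \in [0, \infty)$, and $ac = bc$ implies $a = b$ only if $c \in (0, \infty)$.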
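The convention $0 \cdot \infty = 0$ is not what IEEE floating point arithmetic does by default, so a numerical experiment with integrals in $[0, \infty]$ needs a small wrapper. The following is a minimal sketch in Python; the helper names `ext_add` and `ext_mul` are hypothetical and only illustrate the rules fixed above.

```python
import math

INF = math.inf


def ext_add(a: float, b: float) -> float:
    """Addition in [0, inf]; float addition already gives a + inf = inf."""
    if a < 0 or b < 0:
        raise ValueError("arguments must lie in [0, inf]")
    return a + b


def ext_mul(a: float, b: float) -> float:
    """Multiplication in [0, inf] with the convention 0 * inf = 0."""
    if a < 0 or b < 0:
        raise ValueError("arguments must lie in [0, inf]")
    if a == 0 or b == 0:
        return 0.0  # plain floats would return nan for 0 * inf
    return a * b


# The raw float product is undefined, the wrapped one follows the convention.
assert math.isnan(0.0 * INF)
assert ext_mul(0.0, INF) == 0.0

# Cancellation fails for c = inf: a + c = b + c and ac = bc although a != b.
assert ext_add(1.0, INF) == ext_add(2.0, INF)
assert ext_mul(1.0, INF) == ext_mul(2.0, INF)
```

The last two assertions illustrate why the cancellation laws are only stated for finite $c$.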

Lemma 3.10. If $f, g : X \to [0, \infty]$ are measurable, then so are $f + g$ and $fg$.

Proof. By Theorem 3.6 there exist simple functions $0 \le s_1 \le s_2 \le \cdots \le f$ and $0 \le t_1 \le t_2 \le \cdots \le g$ such that $s_i(x) \to f(x)$ and $t_i(x) \to g(x)$ for all $x \in X$. Then $s_i(x) + t_i(x) \to f(x) + g(x)$ and $s_i(x) t_i(x) \to f(x) g(x)$, and Theorem 3.4 implies the statement.

Throughout this section let $(X, \mathcal{S}, \mu)$ be a fixed measure space. We will define the integral in three steps: for positive simple functions, for positive functions, for complex valued functions.

Step 1. Integrating positive simple functions. The (Lebesgue) integral $\int_X s \, d\mu$ with respect to the measure $\mu$ of a simple function $s : X \to [0, \infty)$ with canonical representation $s = \sum_{i=1}^N a_i \chi_{E_i}$ is defined by
\[ \int_X s \, d\mu := \sum_{i=1}^N a_i \mu(E_i), \]
where we use the convention $0 \cdot \infty = 0$. If $E \in \mathcal{S}$, then $s \chi_E$ is a simple function, and we define
\[ \int_E s \, d\mu := \int_X s \chi_E \, d\mu = \sum_{i=1}^N a_i \mu(E_i \cap E). \]

Lemma 3.11. Let $s : X \to [0, \infty)$ be a simple function and let $s = \sum_{i=1}^N a_i \chi_{E_i}$ be any representation as a linear combination of characteristic functions. Then
\[ \int_X s \, d\mu = \sum_{i=1}^N a_i \mu(E_i). \]

Proof. There exists a refinement $\{F_1, \ldots, F_M\}$ of $\{E_i\}_{i=1}^N$ such that $F_j \in \mathcal{S}$ are disjoint,
\[ \bigcup_{i=1}^N E_i = \bigcup_{j=1}^M F_j, \quad \text{and} \quad E_i = \bigcup_{F_j \subseteq E_i} F_j. \]
It suffices to take
\[ \{F_1, \ldots, F_M\} = \{G_1 \cap \cdots \cap G_N : G_i \in \{E_i, (E_i)^c\}\} \setminus \{(E_1)^c \cap \cdots \cap (E_N)^c\}. \]
If we set $b_j := \sum_{F_j \subseteq E_i} a_i$ then $s = \sum_{j=1}^M b_j \chi_{F_j}$. The numbers $b_j$ may not be distinct and some may be zero. If $b \in \{b_j\}$ is non-zero, set $H_b := \bigcup_{b = b_j} F_j$. Clearly, the sets $H_b$ are pairwise disjoint and satisfy $\mu(H_b) = \sum_{b = b_j} \mu(F_j)$. We have $s = \sum_b b \chi_{H_b}$ where the sum is over the non-zero values in $\{b_j\}$, and then
\[ \int_X s \, d\mu = \sum_b b \mu(H_b) = \sum_{j=1}^M b_j \mu(F_j) = \sum_{j=1}^M \sum_{F_j \subseteq E_i} a_i \mu(F_j) = \sum_{i=1}^N a_i \mu(E_i). \]

Lemma 3.12. Let $s$ and $t$ be positive simple functions on $X$, and $E, F, E_i \in \mathcal{S}$.
(1) For $a \in [0, \infty)$ we have $\int_X as \, d\mu = a \int_X s \, d\mu$.
(2) $\int_X (s + t) \, d\mu = \int_X s \, d\mu + \int_X t \, d\mu$.
(3) If $s \le t$, then $\int_X s \, d\mu \le \int_X t \, d\mu$.
(4) If $E \subseteq F$, then $\int_E s \, d\mu \le \int_F s \, d\mu$.
(5) The mapping $E \mapsto \int_E s \, d\mu$ is a measure on $\mathcal{S}$.
(6) If $\mu(E) = 0$ then $\int_E s \, d\mu = 0$.

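Proof. (1) is obvious. Let $s = \sum_{i=1}^N a_i \chi_{E_i}$ and $t = \sum_{j=1}^M b_j \chi_{F_j}$ be canonical representations. Then $E_i = \bigcup_{j=1}^M (E_i \cap F_j)$ and $F_j = \bigcup_{i=1}^N (E_i \cap F_j)$ and these unions are disjoint. Thus, by finite additivity of $\mu$,
\[ \int_X s \, d\mu + \int_X t \, d\mu = \sum_{i=1}^N \sum_{j=1}^M (a_i + b_j) \mu(E_i \cap F_j) = \int_X (s + t) \, d\mu, \]
which shows (2). If $s \le t$, then $a_i \le b_j$ whenever $E_i \cap F_j \ne \emptyset$, and hence
\[ \int_X s \, d\mu = \sum_{i=1}^N \sum_{j=1}^M a_i \mu(E_i \cap F_j) \le \sum_{i=1}^N \sum_{j=1}^M b_j \mu(E_i \cap F_j) = \int_X t \, d\mu, \]
that is (3). (4) follows from (3), or from monotonicity of $\mu$. For (5), if $F_1, F_2, \ldots \in \mathcal{S}$ are disjoint, then
\[ \int_{\bigcup_{j=1}^\infty F_j} s \, d\mu = \sum_{i=1}^N a_i \mu\Big(E_i \cap \bigcup_{j=1}^\infty F_j\Big) = \sum_{j=1}^\infty \sum_{i=1}^N a_i \mu(E_i \cap F_j) = \sum_{j=1}^\infty \int_{F_j} s \, d\mu. \]
(6) follows from the definition.

Step 2. Integrating positive functions. The (Lebesgue) integral $\int_X f \, d\mu$ with respect to the measure $\mu$ of a positive measurable function $f : X \to [0, \infty]$ is defined by
\[ \int_X f \, d\mu := \sup\Big\{ \int_X s \, d\mu : s \text{ simple and } 0 \le s \le f \Big\} \in [0, \infty]. \]
If $E \in \mathcal{S}$, we define
\[ \int_E f \, d\mu := \int_X f \chi_E \, d\mu = \sup\Big\{ \int_E s \, d\mu : s \text{ simple and } 0 \le s \le f \Big\}. \]
For simple $f$ this definition coincides with the earlier one, by Lemma 3.12, (3).

Lemma 3.13. For measurable functions $f, g : X \to [0, \infty]$ we have
\[ \int_X af \, d\mu = a \int_X f \, d\mu \ \text{ for } a \in [0, \infty), \quad \text{and} \quad \int_X f \, d\mu \le \int_X g \, d\mu \ \text{ if } f \le g. \]

Proof. This is clear from the definition.

Note that this implies $\int_E f \, d\mu \le \int_F f \, d\mu$ if $E \subseteq F$.

Theorem 3.14 (Monotone convergence theorem or Beppo Levi's theorem). Let $f_i$ be measurable functions on $X$ satisfying
(1) $0 \le f_1 \le f_2 \le \cdots \le \infty$,
(2) $\lim_{i \to \infty} f_i(x) = f(x)$ for all $x \in X$.
Then $f$ is measurable, and
\[ \lim_{i \to \infty} \int_X f_i \, d\mu = \int_X f \, d\mu. \]

Proof. By Theorem 3.4, $f$ is measurable. Since $f_i \le f_{i+1} \le f$ for all $i$, we have $\int_X f_i \, d\mu \le \int_X f_{i+1} \, d\mu \le \int_X f \, d\mu$, by Lemma 3.13, and hence $\lim_{i \to \infty} \int_X f_i \, d\mu$ exists (possibly equal to $\infty$) and satisfies
\[ \lim_{i \to \infty} \int_X f_i \, d\mu \le \int_X f \, d\mu. \]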

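Let $s$ be a simple function satisfying $0 \le s \le f$, and let $a \in (0, 1)$. Set $E_i := \{x : f_i(x) \ge a s(x)\}$. Then $E_i \in \mathcal{S}$, $E_1 \subseteq E_2 \subseteq \cdots$, $X = \bigcup_{i=1}^\infty E_i$, and, by Lemma 3.13,
\[ \int_X f_i \, d\mu \ge \int_{E_i} f_i \, d\mu \ge a \int_{E_i} s \, d\mu. \]
Since $E \mapsto \int_E s \, d\mu$ is a measure, by Lemma 3.12, $\lim_{i \to \infty} \int_{E_i} s \, d\mu = \int_X s \, d\mu$, by Lemma 1.1, and so
\[ \lim_{i \to \infty} \int_X f_i \, d\mu \ge a \int_X s \, d\mu, \]
and, as this holds for every $a < 1$, it remains true for $a = 1$. Taking the supremum over all simple functions $s$ satisfying $0 \le s \le f$, we get
\[ \lim_{i \to \infty} \int_X f_i \, d\mu \ge \int_X f \, d\mu. \]
The proof is complete.

Corollary 3.15. Let $f_i : X \to [0, \infty]$ be measurable functions, and $f = \sum_{i=1}^\infty f_i$. Then
\[ \int_X f \, d\mu = \sum_{i=1}^\infty \int_X f_i \, d\mu. \]

Proof. First we prove the statement for the sum of two functions $f$ and $g$. By Theorem 3.6, there exist simple functions $0 \le s_1 \le s_2 \le \cdots \le f$ and $0 \le t_1 \le t_2 \le \cdots \le g$ with $s_i(x) \to f(x)$ and $t_i(x) \to g(x)$ for all $x \in X$. Then $s_i + t_i$ is an increasing sequence of simple functions that converges pointwise to $f + g$, and Theorem 3.14 together with Lemma 3.12 imply
\[ \int_X (f + g) \, d\mu = \lim_{i \to \infty} \int_X (s_i + t_i) \, d\mu = \lim_{i \to \infty} \int_X s_i \, d\mu + \lim_{i \to \infty} \int_X t_i \, d\mu = \int_X f \, d\mu + \int_X g \, d\mu. \]
By induction, we obtain $\int_X \sum_{i=1}^n f_i \, d\mu = \sum_{i=1}^n \int_X f_i \, d\mu$ for finite $n$, and applying Theorem 3.14 to $F_n := \sum_{i=1}^n f_i$ implies the result for infinite sums.

Corollary 3.16. Let $f : X \to [0, \infty]$ be measurable. Then $\nu(E) = \int_E f \, d\mu$ is a measure on $\mathcal{S}$. If $g : X \to [0, \infty]$ is measurable, then
\[ \int_X g \, d\nu = \int_X g f \, d\mu. \]

Proof. Let $E_i \in \mathcal{S}$ be pairwise disjoint. By Corollary 3.15,
\[ \nu\Big(\bigcup_{i=1}^\infty E_i\Big) = \int_X \sum_{i=1}^\infty \chi_{E_i} f \, d\mu = \sum_{i=1}^\infty \int_X \chi_{E_i} f \, d\mu = \sum_{i=1}^\infty \nu(E_i), \]
so $\nu$ is a measure on $\mathcal{S}$. By definition, $\int_X g \, d\nu = \int_X g f \, d\mu$ holds for $g = \chi_E$, $E \in \mathcal{S}$, and hence for each positive simple function,
\[ \int_X \sum_{i=1}^N a_i \chi_{E_i} \, d\nu = \sum_{i=1}^N a_i \int_X \chi_{E_i} \, d\nu = \sum_{i=1}^N a_i \int_X \chi_{E_i} f \, d\mu = \int_X \sum_{i=1}^N a_i \chi_{E_i} f \, d\mu. \]
The general case follows from Theorem 3.6 and the monotone convergence theorem 3.14.

Corollary 3.17 (Fatou's lemma). For measurable functions $f_i : X \to [0, \infty]$,
\[ \int_X \liminf_{i \to \infty} f_i \, d\mu \le \liminf_{i \to \infty} \int_X f_i \, d\mu. \]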

Proof. Set $g_j := \inf_{i \ge j} f_i$. Then $g_j \le g_{j+1}$ and $g_j \le f_i$ for all $i \ge j$. Thus, $\int_X g_j \, d\mu \le \inf_{i \ge j} \int_X f_i \, d\mu$. Since $\lim_{j \to \infty} g_j = \liminf_{i \to \infty} f_i$, the monotone convergence theorem 3.14 implies that
\[ \int_X \liminf_{i \to \infty} f_i \, d\mu = \lim_{j \to \infty} \int_X g_j \, d\mu \le \liminf_{i \to \infty} \int_X f_i \, d\mu. \]

Proposition 3.18. For a measurable function $f : X \to [0, \infty]$, $\int_X f \, d\mu = 0$ if and only if $f = 0$ $\mu$-a.e.

Proof. This is clearly true if $f$ is a simple function; if $f = \sum_{i=1}^N a_i \chi_{E_i}$ is the canonical representation then $a_i \ge 0$, and $\int_X f \, d\mu = 0$ if and only if for each $i$ either $a_i = 0$ or $\mu(E_i) = 0$. In general, if $f = 0$ $\mu$-a.e. and $s$ is a simple function with $0 \le s \le f$, then $s = 0$ $\mu$-a.e. and thus $\int_X f \, d\mu = \sup_{0 \le s \le f} \int_X s \, d\mu = 0$. Conversely, if $f$ does not vanish $\mu$-a.e., then there is an integer $k \ge 1$ so that $\mu(\{x : f(x) > 1/k\}) > 0$, since $\{x : f(x) > 0\} = \bigcup_{k=1}^\infty \{x : f(x) > 1/k\}$. But then $f \ge k^{-1} \chi_{\{x : f(x) > 1/k\}}$ and therefore $\int_X f \, d\mu \ge k^{-1} \mu(\{x : f(x) > 1/k\}) > 0$.

Corollary 3.19. Let $f_i, f : X \to [0, \infty]$ be measurable functions such that $f_i(x) \nearrow f(x)$ for $\mu$-a.e. $x$. Then $\lim_{i \to \infty} \int_X f_i \, d\mu = \int_X f \, d\mu$.

Proof. There is a measurable set $E$ with $\mu(E^c) = 0$ and such that $f_i(x) \nearrow f(x)$ for each $x \in E$. Then $f - f \chi_E = 0$ a.e. and $f_i - f_i \chi_E = 0$ a.e. and by the monotone convergence theorem 3.14 and Proposition 3.18,
\[ \lim_{i \to \infty} \int_X f_i \, d\mu = \lim_{i \to \infty} \int_X f_i \chi_E \, d\mu = \int_X f \chi_E \, d\mu = \int_X f \, d\mu. \]

Step 3. Integrating complex valued functions. We define
\[ L^1(\mu) := \Big\{ f : X \to \mathbb{C} \text{ measurable} : \int_X |f| \, d\mu < \infty \Big\}. \]
If $f$ is measurable, then so is $|f|$, by Proposition 3.2, and hence the integral is defined. The members of $L^1(\mu)$ are called (Lebesgue) integrable functions with respect to the measure $\mu$. For $f \in L^1(\mu)$, $f = u + iv$, and $E \in \mathcal{S}$, we define the (Lebesgue) integral over $E$ with respect to the measure $\mu$ by
\[ \int_E f \, d\mu := \Big( \int_E u^+ \, d\mu - \int_E u^- \, d\mu \Big) + i \Big( \int_E v^+ \, d\mu - \int_E v^- \, d\mu \Big). \]
The measurability of $f$ guarantees the measurability of $u^\pm, v^\pm$, which are all positive functions. So all integrals on the right-hand side exist. As $u^\pm \le |u| \le |f|$ and $v^\pm \le |v| \le |f|$ all four integrals are finite, and thus $\int_E f \, d\mu \in \mathbb{C}$.
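If $f : X \to [-\infty, \infty]$ is measurable, we define
\[ \int_E f \, d\mu := \int_E f^+ \, d\mu - \int_E f^- \, d\mu \]
provided that at least one integral on the right-hand side is finite; then $\int_E f \, d\mu \in [-\infty, \infty]$.

Proposition 3.20. Let $f, g \in L^1(\mu)$. Then
(1) Linearity. If $a, b \in \mathbb{C}$, then $af + bg \in L^1(\mu)$ and
\[ \int_X (af + bg) \, d\mu = a \int_X f \, d\mu + b \int_X g \, d\mu. \]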

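(2) Monotony. If $f \le g$, then $\int_X f \, d\mu \le \int_X g \, d\mu$.
(3) Triangle inequality. $\big| \int_X f \, d\mu \big| \le \int_X |f| \, d\mu$.
(4) $\sigma$-additivity. If $E_i \in \mathcal{S}$ are disjoint, then
\[ \int_{\bigcup_i E_i} f \, d\mu = \sum_i \int_{E_i} f \, d\mu. \]

Proof. (1) By Proposition 3.2, $af + bg$ is measurable, and, by the properties of the integral for positive functions,
\[ \int_X |af + bg| \, d\mu \le \int_X (|a||f| + |b||g|) \, d\mu = |a| \int_X |f| \, d\mu + |b| \int_X |g| \, d\mu < \infty. \]
Hence $af + bg \in L^1(\mu)$. Next we show
\[ \int_X (f + g) \, d\mu = \int_X f \, d\mu + \int_X g \, d\mu. \tag{3.1} \]
To this end we may assume without loss of generality that $f$ and $g$ are real valued. Setting $h = f + g$ we have
\[ h^+ - h^- = f^+ - f^- + g^+ - g^- \]
or equivalently
\[ h^+ + f^- + g^- = f^+ + g^+ + h^- \]
and thus
\[ \int_X h^+ \, d\mu + \int_X f^- \, d\mu + \int_X g^- \, d\mu = \int_X f^+ \, d\mu + \int_X g^+ \, d\mu + \int_X h^- \, d\mu. \]
Each of these integrals is finite, so (3.1) follows. Let us show
\[ \int_X af \, d\mu = a \int_X f \, d\mu. \tag{3.2} \]
If $a \ge 0$ this follows easily from Lemma 3.13. For $a = -1$ we have, writing $f = u + iv$,
\[ \int_X (-f) \, d\mu = \Big( \int_X (-u)^+ \, d\mu - \int_X (-u)^- \, d\mu \Big) + i \Big( \int_X (-v)^+ \, d\mu - \int_X (-v)^- \, d\mu \Big) = \Big( \int_X u^- \, d\mu - \int_X u^+ \, d\mu \Big) + i \Big( \int_X v^- \, d\mu - \int_X v^+ \, d\mu \Big) = -\int_X f \, d\mu, \]
for $a = i$,
\[ \int_X if \, d\mu = \int_X (iu - v) \, d\mu = i \int_X u \, d\mu - \int_X v \, d\mu = i \Big( \int_X u \, d\mu + i \int_X v \, d\mu \Big) = i \int_X f \, d\mu. \]
Combining these cases with (3.1) implies (3.2), and (1) follows.
(2) By assumption $f^+ - f^- \le g^+ - g^-$, or equivalently $f^+ + g^- \le g^+ + f^-$, thus
\[ \int_X (f^+ + g^-) \, d\mu \le \int_X (g^+ + f^-) \, d\mu, \]
and (1) implies the assertion.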

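(3) Since $\int_X f \, d\mu \in \mathbb{C}$, there exists $a \in \mathbb{C}$, $|a| = 1$, so that $a \int_X f \, d\mu = \big| \int_X f \, d\mu \big|$. Then
\[ \Big| \int_X f \, d\mu \Big| = a \int_X f \, d\mu = \int_X af \, d\mu = \int_X \operatorname{Re}(af) \, d\mu \le \int_X |af| \, d\mu = \int_X |f| \, d\mu. \]
(4) follows from the definition and from Corollary 3.16.

Proposition 3.21. Let $f, g \in L^1(\mu)$. Then $\int_E f \, d\mu = \int_E g \, d\mu$ for all $E \in \mathcal{S}$ if and only if $f = g$ $\mu$-a.e.

Proof. By Proposition 3.18, $f = g$ $\mu$-a.e. if and only if $\int_X |f - g| \, d\mu = 0$. If $\int_X |f - g| \, d\mu = 0$, then for any $E \in \mathcal{S}$,
\[ \Big| \int_E f \, d\mu - \int_E g \, d\mu \Big| \le \int_E |f - g| \, d\mu \le \int_X |f - g| \, d\mu = 0, \]
whence $\int_E f \, d\mu = \int_E g \, d\mu$. Conversely, if $u = \operatorname{Re}(f - g)$ and $v = \operatorname{Im}(f - g)$ and $f = g$ $\mu$-a.e. fails, then at least one of $u^+, u^-, v^+, v^-$ must be nonzero on a set of positive measure. If $E = \{x : u^+(x) > 0\}$ has positive measure, then $\operatorname{Re}\big( \int_E f \, d\mu - \int_E g \, d\mu \big) = \int_E u^+ \, d\mu > 0$, since $u^- = 0$ on $E$. The other cases work analogously.

This proposition implies that regarding integration it makes no difference if we modify functions on null sets.

Theorem 3.22 (Dominated convergence theorem). Let $f_i : X \to \mathbb{C}$ be measurable functions such that $f_i \to f$ $\mu$-a.e. If there is a function $g \in L^1(\mu)$ such that $|f_i| \le g$ $\mu$-a.e. for all $i$, then $f \in L^1(\mu)$ and
\[ \lim_{i \to \infty} \int_X |f_i - f| \, d\mu = 0 \quad \text{and} \quad \int_X f \, d\mu = \lim_{i \to \infty} \int_X f_i \, d\mu. \]

Proof. The function $f$ is measurable (maybe after redefinition on a null set), by Theorem 3.4. Since $|f| \le g$ $\mu$-a.e., $f \in L^1(\mu)$. Since $|f_i - f| \le 2g$ $\mu$-a.e., hence $2g - |f_i - f| \ge 0$ $\mu$-a.e., Fatou's lemma 3.17 implies
\[ \int_X 2g \, d\mu \le \liminf_{i \to \infty} \int_X (2g - |f_i - f|) \, d\mu = \int_X 2g \, d\mu + \liminf_{i \to \infty} \Big( -\int_X |f_i - f| \, d\mu \Big) = \int_X 2g \, d\mu - \limsup_{i \to \infty} \int_X |f_i - f| \, d\mu. \]
As $\int_X 2g \, d\mu$ is finite, we may conclude $\limsup_{i \to \infty} \int_X |f_i - f| \, d\mu \le 0$ and thus $\lim_{i \to \infty} \int_X |f_i - f| \, d\mu = 0$. Finally,
\[ \Big| \int_X f \, d\mu - \lim_{i \to \infty} \int_X f_i \, d\mu \Big| = \lim_{i \to \infty} \Big| \int_X (f - f_i) \, d\mu \Big| \le \lim_{i \to \infty} \int_X |f - f_i| \, d\mu = 0 \]
shows that $\int_X f \, d\mu = \lim_{i \to \infty} \int_X f_i \, d\mu$.

Corollary 3.23. If $f_i$ is a sequence in $L^1(\mu)$ such that $\sum_{i=1}^\infty \int_X |f_i| \, d\mu < \infty$, then $\sum_{i=1}^\infty f_i$ converges $\mu$-a.e. to a function in $L^1(\mu)$, and
\[ \int_X \sum_{i=1}^\infty f_i \, d\mu = \sum_{i=1}^\infty \int_X f_i \, d\mu. \]

Proof. Corollary 3.15 implies $\int_X \sum_{i=1}^\infty |f_i| \, d\mu = \sum_{i=1}^\infty \int_X |f_i| \, d\mu < \infty$, and so $g := \sum_{i=1}^\infty |f_i| \in L^1(\mu)$. Then $\sum_{i=1}^\infty |f_i(x)|$ is finite for $\mu$-a.e. $x$, and for these $x$ the series $\sum_{i=1}^\infty f_i(x)$ converges. The dominated convergence theorem 3.22 applied to the partial sums gives $\int_X \sum_{i=1}^\infty f_i \, d\mu = \sum_{i=1}^\infty \int_X f_i \, d\mu$.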
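The domination hypothesis in the dominated convergence theorem 3.22 cannot be dropped. A standard illustration, sketched here in Python with exact rational arithmetic, uses $f_i = i \chi_{(0, 1/i)}$ on $[0, 1]$ with the Lebesgue measure: $f_i \to 0$ pointwise, while every integral equals 1. The function names are hypothetical, and the integrals are evaluated by the formula for simple functions rather than by numerical quadrature.

```python
from fractions import Fraction


def f(i: int, x: Fraction) -> int:
    """Value of f_i = i * chi_(0, 1/i) at a point x of [0, 1]."""
    return i if 0 < x < Fraction(1, i) else 0


def integral_f(i: int) -> Fraction:
    """Integral of the simple function f_i: value times length of (0, 1/i)."""
    return i * Fraction(1, i)


def envelope_integral(n: int) -> Fraction:
    """Integral over [1/(n+1), 1) of g = sup_i f_i.

    On [1/(k+1), 1/k) the supremum equals k, so the integral is the
    partial sum 1/2 + 1/3 + ... + 1/(n+1) of the harmonic series.
    """
    return sum(k * (Fraction(1, k) - Fraction(1, k + 1)) for k in range(1, n + 1))


x = Fraction(1, 7)
# Pointwise limit 0: f_i(x) vanishes as soon as 1/i <= x.
assert all(f(i, x) == 0 for i in range(7, 200))
# The integrals do not converge to the integral of the limit.
assert all(integral_f(i) == 1 for i in range(1, 200))
```

Any dominating function $g$ must satisfy $g \ge \sup_i f_i$, and `envelope_integral(n)` grows without bound as $n \to \infty$, so no such $g$ lies in $L^1$. Fatou's lemma 3.17 still holds, with strict inequality $0 < 1$.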

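3.4. Fubini's theorem

Let $(X, \mathcal{S})$ and $(Y, \mathcal{T})$ be two measurable spaces. On the cartesian product $X \times Y$ we consider the $\sigma$-algebra $\mathcal{S} \otimes \mathcal{T}$ generated by all measurable rectangles, that is by the set $\mathcal{E} := \{E \times F : E \in \mathcal{S}, F \in \mathcal{T}\}$. Since $(A \times B) \cap (E \times F) = (A \cap E) \times (B \cap F)$ and $(A \times B)^c = (X \times B^c) \cup (A^c \times B)$, $\mathcal{E}$ is an elementary family. For a set $E \subseteq X \times Y$ we denote by $E_x = \{y \in Y : (x, y) \in E\}$ and $E^y = \{x \in X : (x, y) \in E\}$ its respective sections.

Lemma 3.24. If $E \in \mathcal{S} \otimes \mathcal{T}$ then $E_x \in \mathcal{T}$ and $E^y \in \mathcal{S}$ for each $x \in X$ and $y \in Y$. We say that every set in $\mathcal{S} \otimes \mathcal{T}$ has the section property.

Proof. We set $\mathcal{R} := \{E \in \mathcal{S} \otimes \mathcal{T} : E_x \in \mathcal{T} \text{ for all } x \in X\}$ and show that $\mathcal{R}$ is a $\sigma$-algebra containing all measurable rectangles. This implies the statement; the proof for $E^y$ is analogous. If $E = A \times B$ is a measurable rectangle, then $E_x = B$ if $x \in A$ and $E_x = \emptyset$ if $x \in A^c$, so $E \in \mathcal{R}$. That $\mathcal{R}$ is a $\sigma$-algebra follows from the identities $(E^c)_x = (E_x)^c$ and $(\bigcup_i E_i)_x = \bigcup_i (E_i)_x$.

With a function $f$ on $X \times Y$ we associate functions $f_x$ on $Y$ given by $f_x(y) := f(x, y)$ and functions $f^y$ on $X$ given by $f^y(x) := f(x, y)$.

Lemma 3.25. Let $f$ be an $\mathcal{S} \otimes \mathcal{T}$-measurable function on $X \times Y$. Then $f_x$ is $\mathcal{T}$-measurable for all $x \in X$, and $f^y$ is $\mathcal{S}$-measurable for all $y \in Y$.

Proof. This follows from Lemma 3.24, since $(f_x)^{-1}(E) = (f^{-1}(E))_x$ and $(f^y)^{-1}(E) = (f^{-1}(E))^y$.

Theorem 3.26 (Product measure). Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ be $\sigma$-finite measure spaces. If $E \in \mathcal{S} \otimes \mathcal{T}$, then the functions $x \mapsto \nu(E_x)$ and $y \mapsto \mu(E^y)$ are measurable on $X$ and $Y$, respectively, and
\[ (\mu \otimes \nu)(E) := \int_X \nu(E_x) \, d\mu(x) = \int_Y \mu(E^y) \, d\nu(y) \tag{3.3} \]
is a $\sigma$-finite measure on $\mathcal{S} \otimes \mathcal{T}$. It is called the product of the measures $\mu$ and $\nu$.

Proof. First assume that $\mu$ and $\nu$ are finite. Let $\mathcal{R}$ be the collection of all $E \in \mathcal{S} \otimes \mathcal{T}$ for which $x \mapsto \nu(E_x)$ and $y \mapsto \mu(E^y)$ are measurable and (3.3) holds. If $E = A \times B$ is a measurable rectangle, then $\nu(E_x) = \nu(B) \chi_A(x)$ and $\mu(E^y) = \mu(A) \chi_B(y)$ are obviously measurable, and
\[ \int_X \nu(E_x) \, d\mu(x) = \mu(A) \nu(B) = \int_Y \mu(E^y) \, d\nu(y), \]
hence $E \in \mathcal{R}$.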
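Since the measurable rectangles form an elementary family, the collection of finite disjoint unions of measurable rectangles forms an algebra, by Proposition 1.5. By the monotone class theorem 1.3, we may conclude $\mathcal{R} = \mathcal{S} \otimes \mathcal{T}$ if we show that $\mathcal{R}$ is a monotone class. Let $E_1 \subseteq E_2 \subseteq \cdots$, $E_i \in \mathcal{R}$, and set $E = \bigcup_{i=1}^\infty E_i$. Then $f_i(x) := \nu((E_i)_x)$ and $g_i(y) := \mu((E_i)^y)$ are measurable functions satisfying $0 \le f_i \le f_{i+1}$, $0 \le g_i \le g_{i+1}$, $f_i(x) \to \nu(E_x)$, and $g_i(y) \to \mu(E^y)$ for all $x$ and $y$, by Lemma 1.1. By the monotone convergence theorem 3.14,
\[ \int_X \nu(E_x) \, d\mu(x) = \lim_{i \to \infty} \int_X \nu((E_i)_x) \, d\mu(x) = \lim_{i \to \infty} \int_Y \mu((E_i)^y) \, d\nu(y) = \int_Y \mu(E^y) \, d\nu(y), \]
thus $E \in \mathcal{R}$. If $E_1 \supseteq E_2 \supseteq \cdots$, $E_i \in \mathcal{R}$, then we may conclude in a similar way that $\bigcap_{i=1}^\infty E_i \in \mathcal{R}$, using the dominated convergence theorem 3.22. So $\mathcal{R}$ is a monotone class.

If $\mu$ and $\nu$ are $\sigma$-finite, we can write $X \times Y$ as an increasing union of measurable rectangles $X_i \times Y_i$ with $\mu(X_i) < \infty$ and $\nu(Y_i) < \infty$. For $E \in \mathcal{S} \otimes \mathcal{T}$, we may apply the preceding argument to each $E \cap (X_i \times Y_i)$,
\[ \int_X \chi_{X_i}(x) \nu(E_x \cap Y_i) \, d\mu(x) = \int_{X_i} \nu(E_x \cap Y_i) \, d\mu(x) = \int_{Y_i} \mu(E^y \cap X_i) \, d\nu(y) = \int_Y \chi_{Y_i}(y) \mu(E^y \cap X_i) \, d\nu(y) \]
and conclude (3.3) from the monotone convergence theorem 3.14.

Let us prove that $(\mu \otimes \nu)(E) := \int_X \nu(E_x) \, d\mu(x)$ is a $\sigma$-finite measure on $\mathcal{S} \otimes \mathcal{T}$. $\sigma$-additivity follows from Corollary 3.15: If $E_i \in \mathcal{S} \otimes \mathcal{T}$ are disjoint, then $(E_i)_x \in \mathcal{T}$ are disjoint, so, for $E = \bigcup_{i=1}^\infty E_i$,
\[ \nu(E_x) = \nu\Big(\Big(\bigcup_{i=1}^\infty E_i\Big)_x\Big) = \nu\Big(\bigcup_{i=1}^\infty (E_i)_x\Big) = \sum_{i=1}^\infty \nu((E_i)_x) \]
and thus
\[ (\mu \otimes \nu)(E) = \int_X \nu(E_x) \, d\mu(x) = \sum_{i=1}^\infty \int_X \nu((E_i)_x) \, d\mu(x) = \sum_{i=1}^\infty (\mu \otimes \nu)(E_i). \]
Clearly, the measure $\mu \otimes \nu$ is $\sigma$-finite; indeed $(\mu \otimes \nu)(X_i \times Y_i) = \mu(X_i) \nu(Y_i) < \infty$.

Theorem 3.27 (Fubini's theorem). Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ be $\sigma$-finite measure spaces, and let $f$ be an $(\mathcal{S} \otimes \mathcal{T})$-measurable function on $X \times Y$.
(1) If $0 \le f \le \infty$, then the functions
\[ \varphi : X \to [0, \infty], \ \varphi(x) := \int_Y f_x \, d\nu, \qquad \psi : Y \to [0, \infty], \ \psi(y) := \int_X f^y \, d\mu \]
are measurable, and
\[ \int_{X \times Y} f \, d(\mu \otimes \nu) = \int_X \varphi \, d\mu = \int_Y \psi \, d\nu. \tag{3.4} \]
(2) If $f$ is complex valued and $\int_X \varphi^* \, d\mu < \infty$, where $\varphi^*(x) := \int_Y |f|_x \, d\nu$, then $f \in L^1(\mu \otimes \nu)$.
(3) If $f \in L^1(\mu \otimes \nu)$, then $f_x \in L^1(\nu)$ for $\mu$-a.e. $x \in X$, $f^y \in L^1(\mu)$ for $\nu$-a.e. $y \in Y$, the a.e. defined functions $\varphi$ and $\psi$ are in $L^1(\mu)$ and $L^1(\nu)$, respectively, and (3.4) holds.

The identity (3.4) may be written in the form
\[ \int_{X \times Y} f \, d(\mu \otimes \nu) = \int_X \Big( \int_Y f(x, y) \, d\nu(y) \Big) d\mu(x) = \int_Y \Big( \int_X f(x, y) \, d\mu(x) \Big) d\nu(y). \]
The leftmost integral is called a double integral, the other two are called iterated integrals. The assertion in (1) is often referred to as Tonelli's theorem.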
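The integrability assumption in part (3) cannot be dropped either, even on $\sigma$-finite spaces. A classical illustration takes $X = Y = \mathbb{N}$ with the counting measure, so that integrals are sums, and the array $a_{ij}$ with $1$ on the diagonal and $-1$ directly right of it. The following Python sketch (all names are illustrative) evaluates both iterated sums exactly; each inner sum has at most two non-zero terms, so finite ranges suffice for the inner sums.

```python
def a(i: int, j: int) -> int:
    """Array entries: 1 on the diagonal, -1 on the superdiagonal, else 0."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0


def row_sum(i: int) -> int:
    """Inner sum over j for fixed i; only j = i and j = i + 1 contribute."""
    return sum(a(i, j) for j in range(i + 2))


def col_sum(j: int) -> int:
    """Inner sum over i for fixed j; only i = j - 1 and i = j contribute."""
    return sum(a(i, j) for i in range(j + 1))


def iterated_sums(n: int) -> tuple[int, int]:
    """Outer partial sums of the two iterated sums over the first n indices."""
    return (sum(row_sum(i) for i in range(n)),
            sum(col_sum(j) for j in range(n)))


def abs_block_sum(n: int) -> int:
    """Sum of |a(i, j)| over the n x n block; unbounded as n grows."""
    return sum(abs(a(i, j)) for i in range(n) for j in range(n))


print(iterated_sums(50))   # (0, 1)
print(abs_block_sum(50))   # 99
```

Every row sums to $0$, while the first column sums to $1$ and all other columns to $0$, so the two iterated sums are $0$ and $1$. This does not contradict Theorem 3.27, since $\sum_{i,j} |a_{ij}| = \infty$.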

Proof. (1) The definitions of $\varphi$ and $\psi$ are meaningful by Lemma 3.25. Theorem 3.26 implies (1) in the case that $f = \chi_E$ for $E \in \mathcal{S} \otimes \mathcal{T}$, and thus (1) holds for all positive simple functions $s$. In the general case, there exists a sequence of simple functions $0 \le s_1 \le s_2 \le \cdots$ such that $s_i(x, y) \to f(x, y)$ for all $(x, y) \in X \times Y$, by Theorem 3.6. Then, if
\[ \varphi_i(x) := \int_Y (s_i)_x \, d\nu, \tag{3.5} \]
we have
\[ \int_X \varphi_i \, d\mu = \int_{X \times Y} s_i \, d(\mu \otimes \nu). \tag{3.6} \]
The monotone convergence theorem 3.14, applied to (3.5), implies that $\varphi_i(x) \to \varphi(x)$ for all $x \in X$. Clearly, $\varphi_i \le \varphi_{i+1}$. Thus we may again apply the monotone convergence theorem to both sides of (3.6), and we obtain the first equality in (3.4). The other half of (3.4) follows similarly.
(2) follows by applying (1) to $|f|$.
(3) It is no restriction to assume that $f \in L^1(\mu \otimes \nu)$ is real valued. Then (1) applies to $f^+$ and $f^-$; set $\varphi^\pm(x) := \int_Y (f^\pm)_x \, d\nu$. As $f^\pm \le |f|$ we may conclude that $\varphi^\pm \in L^1(\mu)$. Thanks to $f_x = (f^+)_x - (f^-)_x$ we have $f_x \in L^1(\nu)$ for every $x$ satisfying $\varphi^\pm(x) < \infty$. Since $\varphi^\pm \in L^1(\mu)$, this happens for $\mu$-a.e. $x$; at any such $x$ we have $\varphi(x) = \varphi^+(x) - \varphi^-(x)$. Thus $\varphi \in L^1(\mu)$. Now (3.4) holds for $f^\pm$ and $\varphi^\pm$ in place of $f$ and $\varphi$. Subtracting the respective equalities yields the first equality of (3.4). The other half follows analogously.

The following example shows that the theorem is not true if one of the measure spaces is not $\sigma$-finite.

Example 3.28. If $X = Y = [0, 1]$, $\mu$ the Lebesgue measure, $\nu$ the counting measure, and $f(x, y) = 1$ for $x = y$ and $f(x, y) = 0$ otherwise, then $\int_X f(x, y) \, d\mu(x) = 0$ and $\int_Y f(x, y) \, d\nu(y) = 1$ for all $x, y \in [0, 1]$ so that
\[ \int_X \Big( \int_Y f(x, y) \, d\nu(y) \Big) d\mu(x) = 1 \ne 0 = \int_Y \Big( \int_X f(x, y) \, d\mu(x) \Big) d\nu(y). \]
The function $f = \chi_{\{x = y\}}$ is $(\mathcal{L}([0, 1]) \otimes \mathcal{P}([0, 1]))$-measurable, since $\{x = y\} = \bigcap_{n=1}^\infty Q_n$ where
\[ Q_n = \big( [\tfrac{0}{n}, \tfrac{1}{n}] \times [\tfrac{0}{n}, \tfrac{1}{n}] \big) \cup \cdots \cup \big( [\tfrac{n-1}{n}, \tfrac{n}{n}] \times [\tfrac{n-1}{n}, \tfrac{n}{n}] \big). \]

The product measure $\mu \otimes \nu$ rarely is complete, even if $\mu$ and $\nu$ are complete.
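If $A \in \mathcal{S}$ is non-empty with $\mu(A) = 0$ and $B \subseteq Y$ so that $B \notin \mathcal{T}$, then $A \times B \subseteq A \times Y$ and $(\mu \otimes \nu)(A \times Y) = 0$, but $A \times B \notin \mathcal{S} \otimes \mathcal{T}$, by Lemma 3.24. This applies in particular to the Lebesgue measure: $\lambda^1 \otimes \lambda^1 \ne \lambda^2$. However the following is true.

Theorem 3.29. $\lambda^{m+n}$ is the completion of $\lambda^m \otimes \lambda^n$, for $m, n \ge 1$.

Proof. First we show that
\[ \mathcal{B}(\mathbb{R}^{m+n}) \subseteq \mathcal{L}(\mathbb{R}^m) \otimes \mathcal{L}(\mathbb{R}^n) \subseteq \mathcal{L}(\mathbb{R}^{m+n}). \]
The first inclusion follows from the fact that each cube in $\mathbb{R}^{m+n}$ belongs to $\mathcal{L}(\mathbb{R}^m) \otimes \mathcal{L}(\mathbb{R}^n)$ and $\mathcal{B}(\mathbb{R}^{m+n})$ is the $\sigma$-algebra generated by the cubes in $\mathbb{R}^{m+n}$; see Lemma 2.6. Suppose that $E \in \mathcal{L}(\mathbb{R}^m)$ and $F \in \mathcal{L}(\mathbb{R}^n)$. Then $E \times \mathbb{R}^n$ and $\mathbb{R}^m \times F$ belong to $\mathcal{L}(\mathbb{R}^{m+n})$, by Corollary 2.10, and thus $E \times F = (E \times \mathbb{R}^n) \cap (\mathbb{R}^m \times F)$ belongs to $\mathcal{L}(\mathbb{R}^{m+n})$, which implies the second inclusion. Both $\lambda^{m+n}$ and $\lambda^m \otimes \lambda^n$ coincide on boxes and hence on $\mathcal{B}(\mathbb{R}^{m+n})$, by Theorem 1.4. If $A \in \mathcal{L}(\mathbb{R}^m) \otimes \mathcal{L}(\mathbb{R}^n)$, then $A \in \mathcal{L}(\mathbb{R}^{m+n})$ and so there exist $B_1, B_2$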

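in $\mathcal{B}(\mathbb{R}^{m+n})$ such that $B_1 \supseteq A \supseteq B_2$ and $\lambda^{m+n}(B_1 \setminus B_2) = 0$, by Corollary 2.10. Consequently,
\[ (\lambda^m \otimes \lambda^n)(A \setminus B_2) \le (\lambda^m \otimes \lambda^n)(B_1 \setminus B_2) = \lambda^{m+n}(B_1 \setminus B_2) = 0, \]
and thus
\[ (\lambda^m \otimes \lambda^n)(A) = (\lambda^m \otimes \lambda^n)(B_2) = \lambda^{m+n}(B_2) = \lambda^{m+n}(A). \]
So $\lambda^{m+n}$ and $\lambda^m \otimes \lambda^n$ coincide on $\mathcal{L}(\mathbb{R}^m) \otimes \mathcal{L}(\mathbb{R}^n)$ which implies the statement.

Theorem 3.30 (Fubini's theorem for complete measures). Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{T}, \nu)$ be complete $\sigma$-finite measure spaces, and let $\overline{\mathcal{S} \otimes \mathcal{T}}$ be the completion of $\mathcal{S} \otimes \mathcal{T}$ with respect to $\mu \otimes \nu$. Let $f$ be an $\overline{\mathcal{S} \otimes \mathcal{T}}$-measurable function on $X \times Y$. Then all conclusions of Theorem 3.27 hold, except that the $\mathcal{T}$-measurability of $f_x$ can be asserted only for $\mu$-a.e. $x$ so that $\varphi(x)$ is only defined $\mu$-a.e., and similarly for $f^y$ and $\psi$.

Proof. By Proposition 3.9, $f = g + h$, where $h = 0$ $\overline{\mu \otimes \nu}$-a.e. and $g$ is $(\mathcal{S} \otimes \mathcal{T})$-measurable. We claim that for $\mu$-a.e. $x \in X$ we have $h(x, y) = 0$ for $\nu$-a.e. $y \in Y$ and $h_x$ is $\mathcal{T}$-measurable for $\mu$-a.e. $x \in X$. Similarly, for $h^y$. Indeed, $A := \{(x, y) \in X \times Y : h(x, y) \ne 0\}$ is a $\overline{\mu \otimes \nu}$-null set. So there exists $B \in \mathcal{S} \otimes \mathcal{T}$ such that $A \subseteq B$ and $(\mu \otimes \nu)(B) = 0$. By Theorem 3.26,
\[ \int_X \nu(B_x) \, d\mu(x) = (\mu \otimes \nu)(B) = 0. \]
By Proposition 3.21, $\mu(E) = 0$, where $E := \{x \in X : \nu(B_x) > 0\}$. If $x \notin E$, then $\nu(B_x) = 0$ and, as $(Y, \mathcal{T}, \nu)$ is complete, each subset of $A_x$ $(\subseteq B_x)$ belongs to $\mathcal{T}$. If $y \notin A_x$, then $h_x(y) = 0$. It follows that, for every $x \notin E$, $h_x$ is $\mathcal{T}$-measurable and $h_x(y) = 0$ $\nu$-a.e. The claim is proved. Apply Theorem 3.27 to $g$. By the claim, $f_x = g_x$ $\nu$-a.e. for $\mu$-a.e. $x$ and $f^y = g^y$ $\mu$-a.e. for $\nu$-a.e. $y$. Thus the two iterated integrals and the double integral of $f$ are the same as those of $g$.

3.5. Transformation of measures and integrals

Let $(X, \mathcal{S})$ and $(Y, \mathcal{T})$ be measurable spaces and let $f : X \to Y$ be $(\mathcal{S}, \mathcal{T})$-measurable. Given a measure $\mu$ on $(X, \mathcal{S})$ we may define the push-forward $f_*\mu$ on $(Y, \mathcal{T})$ by
\[ f_*\mu(E) := \mu(f^{-1}(E)), \quad E \in \mathcal{T}. \]
It is easy to check that $f_*\mu$ is a measure.

Proposition 3.31. Let $g : Y \to \mathbb{C}$ be $\mathcal{T}$-measurable. Then $g \circ f \in L^1(\mu)$ if and only if $g \in L^1(f_*\mu)$, and
\[ \int_Y g \, d(f_*\mu) = \int_X g \circ f \, d\mu. \]

Proof. For $E \in \mathcal{T}$ and $g = \chi_E$ the formula follows from $\chi_E \circ f = \chi_{f^{-1}(E)}$.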
So it holds for simple functions and hence for positive functions, by Theorem 3.6 and the monotone convergence theorem 3.14. In particular, the equality holds for $|g|$ instead of $g$, and so $g \circ f \in L^1(\mu)$ if and only if $g \in L^1(f_*\mu)$. That it is also valid for complex valued $g$ follows immediately.

In the following we focus on the Lebesgue measure $\lambda$.

Lemma 3.32. Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be linear invertible, and let $E$ be measurable. Then $A(E)$ is measurable and
\[ \lambda(A(E)) = |\det A| \, \lambda(E). \]
In particular, $\lambda$ is invariant under orthogonal transformations.

Proof. It suffices to prove the statement for Borel sets $E$. Then null sets are invariant under $A$ and $A^{-1}$, and hence so are Lebesgue measurable sets.

If $E$ is a Borel set then so is $A(E)$, since $\chi_{A(E)} = \chi_E \circ A^{-1}$ and since $\chi_E$ and $A^{-1}$ and hence $\chi_E \circ A^{-1}$ are Borel mappings. We shall use translation invariance, see Proposition 2.13, and dilation invariance of $\lambda^1$ on Borel sets, i.e., if $a \in \mathbb{R} \setminus \{0\}$ and $E \in \mathcal{B}(\mathbb{R})$ then $aE = \{ax : x \in E\} \in \mathcal{B}(\mathbb{R})$ and $\lambda^1(aE) = |a| \lambda^1(E)$. The collection of intervals in $\mathbb{R}$ is invariant under dilations, and hence so is $\mathcal{B}(\mathbb{R})$. Then $\mu_a(E) := \lambda^1(aE)/|a|$ defines a Borel measure that coincides with $\lambda^1$ on boxes, and thus on all Borel sets, by Theorem 1.4.

Suppose that $A$, and thus also $A^{-1}$, is upper triangular with all diagonal entries equal to 1. Writing $x_{\ge k} := (x_k, \ldots, x_n)$, the components of $A^{-1}(x)$ have the form $x_k + f_k(x_{\ge k+1})$ with linear functions $f_k$. Then,
\[ \lambda(A(E)) = \int_{\mathbb{R}^n} \chi_{A(E)}(x) \, dx = \int_{\mathbb{R}^n} \chi_E(A^{-1}(x)) \, dx = \int_{\mathbb{R}^{n-1}} \int_{\mathbb{R}} \chi_E(x_1 + f_1(x_{\ge 2}), x_2 + f_2(x_{\ge 3}), \ldots, x_n) \, dx_1 \, dx_{\ge 2} = \int_{\mathbb{R}^{n-1}} \int_{\mathbb{R}} \chi_E(x_1, x_2 + f_2(x_{\ge 3}), \ldots, x_n) \, dx_1 \, dx_{\ge 2}, \]
using Fubini's theorem 3.27 and translation invariance of $\lambda^1$. Repeating this procedure for the other variables, we find
\[ \lambda(A(E)) = \int_{\mathbb{R}^n} \chi_E(x) \, dx = \lambda(E). \]
Similarly, the assertion holds for lower triangular matrices with all diagonal entries equal to 1. If $A = \operatorname{diag}(a_1, \ldots, a_n)$ is diagonal, then Fubini's theorem 3.27 and dilation invariance of $\lambda^1$ analogously imply $\lambda(A(E)) = |a_1 \cdots a_n| \lambda(E)$. An arbitrary invertible matrix $A$ admits a decomposition $A = PLDU$, where $P$ is a permutation matrix, $L$ ($U$) is a lower (upper) triangular matrix with all diagonal entries equal to 1, and $D$ is diagonal. Permutation matrices map boxes to boxes of the same volume and hence preserve $\lambda$ on Borel sets, by Theorem 1.4. Thus the result follows, since $|\det A| = |\det D|$.

Theorem 3.33 (Transformation formula). Let $U, V \subseteq \mathbb{R}^n$ be open and let $f \in C^1(U, V)$ be bijective. If $g$ is a measurable function on $V$, then $g \circ f$ is measurable on $U$. If $g \ge 0$ or $g \in L^1(V)$, then
\[ \int_U g(f(x)) |J_f(x)| \, dx = \int_V g(y) \, dy, \]
where $J_f = \det(\partial f/\partial x)$ is the Jacobi determinant of $f$. In particular, for measurable $E \subseteq U$, $f(E)$ is measurable, and
\[ \lambda(f(E)) = \int_E |J_f(x)| \, dx. \]

Proof. It is sufficient to consider Borel measurable functions and sets. Since $f$ and $f^{-1}$ are continuous, there are no measurability problems in this case.
If $g$ is Lebesgue measurable and $B$ is a Borel set in $\mathbb{C}$, then $g^{-1}(B) = E \cup N$, where $E$ is Borel and $N$ is a null set. Moreover, $f^{-1}(E)$ is Borel and $f^{-1}(N)$ is a null set (by the result for Borel sets), and thus $(g \circ f)^{-1}(B)$ is Lebesgue measurable, i.e., $g \circ f$ is Lebesgue measurable.

We use the norm $\|x\| = \max_{1 \le i \le n} |x_i|$ for $x \in \mathbb{R}^n$ and the matrix norm $\|A\| = \max_{1 \le i \le n} \sum_{j=1}^n |A_{ij}|$; then $\|Ax\| \le \|A\| \|x\|$. Let $Q = \{x : \|x - a\| \le h\}$ be a cube contained in $U$. By the mean value theorem applied to each component $f_i$ of $f$, we have $f_i(x) - f_i(a) = f_i'(z_i)(x - a)$ for some $z_i$ on the segment between $x$ and $a$, and hence, for $x \in Q$,
\[ \|f(x) - f(a)\| \le \sup_{z \in Q} \|f'(z)\| \, \|x - a\| \le \sup_{z \in Q} \|f'(z)\| \, h. \]

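So $f(Q)$ is contained in a cube of side length $\sup_{z \in Q} \|f'(z)\|$ times the side length of $Q$, thus
\[ \lambda(f(Q)) \le \Big( \sup_{z \in Q} \|f'(z)\| \Big)^n \lambda(Q). \]
If $A : \mathbb{R}^n \to \mathbb{R}^n$ is linear invertible, we find, by Lemma 3.32,
\[ \lambda(f(Q)) = |\det A| \, \lambda(A^{-1} f(Q)) \le |\det A| \Big( \sup_{z \in Q} \|A^{-1} f'(z)\| \Big)^n \lambda(Q). \]
Since $f'$ is uniformly continuous on $Q$, for each $\epsilon > 0$ there exists $\delta > 0$ so that for $x, y \in Q$ with $\|x - y\| \le \delta$,
\[ \|f'(x)^{-1} f'(y)\| = \|f'(x)^{-1} f'(y) - f'(x)^{-1} f'(x) + \mathrm{Id}\| \le 1 + \epsilon. \]
By decomposing $Q$ into subcubes $Q_1, \ldots, Q_N$ with side length $\delta$ and centers $x_1, \ldots, x_N$, we may conclude
\[ \lambda(f(Q)) \le \sum_{i=1}^N \lambda(f(Q_i)) \le \sum_{i=1}^N |J_f(x_i)| \Big( \sup_{z \in Q_i} \|f'(x_i)^{-1} f'(z)\| \Big)^n \lambda(Q_i) \le (1 + \epsilon)^n \sum_{i=1}^N |J_f(x_i)| \lambda(Q_i). \]
Note that $\sum_{i=1}^N |J_f(x_i)| \chi_{Q_i}$ is a simple function which tends uniformly on $Q$ to $x \mapsto |J_f(x)|$ as $\delta \to 0$, by continuity of $x \mapsto J_f(x)$. Letting $\delta$ and $\epsilon$ approach 0 implies
\[ \lambda(f(Q)) \le \int_Q |J_f(x)| \, dx. \]
We shall show that this estimate holds with $Q$ replaced by any Borel set in $U$. If $\Omega \subseteq U$ is open, then $\Omega = \bigcup_{i=1}^\infty Q_i$ is an almost disjoint union of cubes $Q_i$, by Lemma 2.6, and thus
\[ \lambda(f(\Omega)) \le \sum_{i=1}^\infty \lambda(f(Q_i)) \le \sum_{i=1}^\infty \int_{Q_i} |J_f(x)| \, dx = \int_\Omega |J_f(x)| \, dx. \]
If $E \subseteq U$ is a Borel set of finite measure, then by outer regularity, Theorem 2.9, there exists a sequence $U \supseteq \Omega_i \supseteq \Omega_{i+1} \supseteq E$ of open sets $\Omega_i$ of finite measure so that $\lambda(\bigcap_{i=1}^\infty \Omega_i \setminus E) = 0$. By Lemma 1.1 and the dominated convergence theorem 3.22,
\[ \lambda(f(E)) \le \lambda\Big( f\Big( \bigcap_{i=1}^\infty \Omega_i \Big) \Big) \le \lim_{i \to \infty} \lambda(f(\Omega_i)) \le \lim_{i \to \infty} \int_{\Omega_i} |J_f(x)| \, dx = \int_E |J_f(x)| \, dx. \]
Since $\lambda$ is $\sigma$-finite, the estimate holds for all Borel sets $E$. We may infer that
\[ \int_{f(U)} g(y) \, dy \le \int_U g(f(x)) |J_f(x)| \, dx, \]
first for positive simple $g$ and, by Theorem 3.6 and the monotone convergence theorem 3.14, for positive measurable $g$. Applying this to $f^{-1}$ and $(g \circ f)|J_f|$ instead of $f$ and $g$, we get
\[ \int_U (g \circ f)(x) |J_f(x)| \, dx \le \int_{f(U)} g(x) |J_f(f^{-1}(x))| |J_{f^{-1}}(x)| \, dx = \int_{f(U)} g(y) \, dy. \]
So the assertion is shown for $g \ge 0$, and the case $g \in L^1(V)$ follows easily. The second statement is the special case, where $g = \chi_{f(E)}$.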
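In the plane, the linear case $\lambda(A(E)) = |\det A| \lambda(E)$ of Lemma 3.32 can be checked directly when $E$ is the unit square, since its image is a parallelogram whose area is given by the shoelace formula. The following is a minimal Python sketch of this check; the function names and the test matrices are arbitrary illustrative choices.

```python
Matrix = list[list[int]]
Point = tuple[int, int]

UNIT_SQUARE: list[Point] = [(0, 0), (1, 0), (1, 1), (0, 1)]


def det2(a: Matrix) -> int:
    """Determinant of a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def apply(a: Matrix, p: Point) -> Point:
    """Image of the point p under the linear map with matrix a."""
    x, y = p
    return (a[0][0] * x + a[0][1] * y, a[1][0] * x + a[1][1] * y)


def shoelace_area(vertices: list[Point]) -> float:
    """Area of a simple polygon from its vertices listed in cyclic order."""
    n = len(vertices)
    twice_signed_area = 0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_signed_area += x0 * y1 - x1 * y0
    return abs(twice_signed_area) / 2


def image_area(a: Matrix) -> float:
    """Area of the image of the unit square under a."""
    return shoelace_area([apply(a, p) for p in UNIT_SQUARE])


# Triangular, shear, and orientation-reversing examples.
for a in ([[2, 1], [0, 3]], [[1, 5], [0, 1]], [[0, 1], [1, 0]]):
    assert image_area(a) == abs(det2(a))
```

The shear $\begin{pmatrix} 1 & 5 \\ 0 & 1 \end{pmatrix}$ is the unipotent upper triangular case from the proof: it distorts the square but keeps its area.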

Let $S^{n-1} = \{x \in \mathbb{R}^n : |x| = 1\}$ denote the unit sphere in $\mathbb{R}^n$. The mapping
\[ \phi : \mathbb{R}^n \setminus \{0\} \to (0, \infty) \times S^{n-1}, \quad x \mapsto (|x|, x/|x|), \]
defines a diffeomorphism with inverse $(r, y) \mapsto ry$; we call $(r, y) = \phi(x)$ the polar coordinates of $x$. Let $\rho$ be the measure on $(0, \infty)$ defined by $\rho(E) = \int_E r^{n-1} \, dr$.

Theorem 3.34 (Polar coordinates). There is a unique Borel measure $\sigma$ on $S^{n-1}$ such that $\phi_*\lambda = \rho \otimes \sigma$. If $f$ is Borel measurable on $\mathbb{R}^n$ and $f \ge 0$ or $f \in L^1(\lambda)$, then
\[ \int_{\mathbb{R}^n} f(x) \, dx = \int_{(0, \infty)} \int_{S^{n-1}} f(ry) r^{n-1} \, d\sigma(y) \, dr. \]

Proof. By Proposition 3.31 and Fubini's theorem 3.27, it suffices to show that there is a unique Borel measure $\sigma$ on $S^{n-1}$ such that $\phi_*\lambda = \rho \otimes \sigma$. For Borel sets $E$ in $S^{n-1}$ we define
\[ \sigma(E) := n \lambda(\phi^{-1}((0, 1] \times E)), \]
which is a Borel measure on $S^{n-1}$, since the mapping $E \mapsto \phi^{-1}((0, 1] \times E)$ maps Borel sets to Borel sets and commutes with unions, intersections, and complements. For $a > 0$, we have by Lemma 3.32,
\[ \phi_*\lambda((0, a] \times E) = \lambda(\phi^{-1}((0, a] \times E)) = a^n \lambda(\phi^{-1}((0, 1] \times E)) = \frac{a^n}{n} \sigma(E) = \rho((0, a]) \sigma(E) = (\rho \otimes \sigma)((0, a] \times E). \]
As an immediate consequence, $\phi_*\lambda = \rho \otimes \sigma$ holds on sets of the form $(a, b] \times E$. For $N \in \mathbb{N}$ and a fixed Borel set $E \subseteq S^{n-1}$, the collection $\mathcal{A}_{N,E}$ of finite disjoint unions of sets of the form $(a, b] \times E$, where $b \le N$, forms an algebra on $(0, N] \times E$, by Proposition 1.5, that generates the $\sigma$-algebra $\mathcal{S}_{N,E} = \{A \times E : A \in \mathcal{B}((0, N])\}$. By Theorem 1.4, $\phi_*\lambda = \rho \otimes \sigma$ holds on $\mathcal{S}_{N,E}$, and since all Borel rectangles in $(0, \infty) \times S^{n-1}$ are disjoint countable unions of sets in $\bigcup_{N \in \mathbb{N}, E \in \mathcal{B}(S^{n-1})} \mathcal{S}_{N,E}$, we have $\phi_*\lambda = \rho \otimes \sigma$ on all Borel sets, again by Theorem 1.4.

The formula of the previous theorem can be extended to Lebesgue measurable functions by considering the completion of $\sigma$. If $f(x) = g(|x|)$ it gives
\[ \int_{\mathbb{R}^n} f(x) \, dx = \sigma(S^{n-1}) \int_{(0, \infty)} g(r) r^{n-1} \, dr. \tag{3.7} \]

Example 3.35 (Integral of a Gaussian function). We have
\[ \int_{\mathbb{R}^n} e^{-a|x|^2} \, dx = \Big( \frac{\pi}{a} \Big)^{n/2}, \quad a > 0. \]
If we denote the integral on the left by $I_n$, then $I_n = (I_1)^n$ by Fubini's theorem 3.27. By (3.7),
\[ I_2 = 2\pi \int_{(0, \infty)} r e^{-ar^2} \, dr = -\frac{\pi}{a} e^{-ar^2} \Big|_0^\infty = \frac{\pi}{a}. \]
Thus $I_1 = (\pi/a)^{1/2}$ and $I_n = (\pi/a)^{n/2}$.

Example 3.36 (Volume and surface area of the unit ball). If $B^n := \{x \in \mathbb{R}^n : |x| \le 1\}$ denotes the closed unit ball in $\mathbb{R}^n$, then
\[ \sigma(S^{n-1}) = \frac{2\pi^{n/2}}{\Gamma(n/2)} \quad \text{and} \quad \lambda(B^n) = \frac{\pi^{n/2}}{\Gamma(n/2 + 1)}. \]
By Example 3.35, (3.7), and Theorem 3.33,
\[ \pi^{n/2} = \int_{\mathbb{R}^n} e^{-|x|^2} \, dx = \sigma(S^{n-1}) \int_{(0, \infty)} r^{n-1} e^{-r^2} \, dr = \frac{\sigma(S^{n-1})}{2} \int_{(0, \infty)} t^{n/2 - 1} e^{-t} \, dt = \frac{\sigma(S^{n-1})}{2} \Gamma(n/2), \]
and by the definition of $\sigma$,
\[ \lambda(B^n) = \frac{\sigma(S^{n-1})}{n} = \frac{\pi^{n/2}}{(n/2) \Gamma(n/2)} = \frac{\pi^{n/2}}{\Gamma(n/2 + 1)}. \]

3.6. Integrals depending on parameters

We study continuity and differentiability of functions of the form
\[ F(y) = \int_X f(x, y) \, d\mu(x), \quad y \in Y. \]

Theorem 3.37 (Continuity of integrals depending on parameters). Let $(X, \mathcal{S}, \mu)$ be a measure space, let $Y$ be a metric space, and let $f : X \times Y \to \mathbb{C}$ be a function. Assume that:
(1) For each fixed $y \in Y$ the function $X \ni x \mapsto f(x, y)$ is measurable.
(2) For each fixed $x \in X$ the function $Y \ni y \mapsto f(x, y)$ is continuous at $y_0$.
(3) There is a positive function $g \in L^1(\mu)$ so that $|f(x, y)| \le g(x)$ for all $(x, y) \in X \times Y$.
Then the function $F : Y \to \mathbb{C}$ given by
\[ F(y) = \int_X f(x, y) \, d\mu(x), \quad y \in Y, \]
is well-defined and continuous at $y_0$.

Proof. The function $F$ is well-defined by (1) and (3). Let $y_k \in Y$ be a sequence converging to $y_0$, and consider the sequence of functions $f_k : X \to \mathbb{C}$ given by $f_k(x) := f(x, y_k)$. By (2), $f_k(x) \to f(x, y_0)$ for every $x \in X$, and, by (3), $|f_k| \le g$ for all $k$. The dominated convergence theorem 3.22 implies
\[ \lim_{k \to \infty} F(y_k) = \lim_{k \to \infty} \int_X f_k \, d\mu = \int_X f(x, y_0) \, d\mu(x) = F(y_0). \]

Theorem 3.38 (Differentiability of integrals depending on parameters). Let $(X, \mathcal{S}, \mu)$ be a measure space, let $Y$ be open in $\mathbb{R}^n$, and let $f : X \times Y \to \mathbb{C}$ be a function. Assume that:
(1) For each fixed $x \in X$ the function $Y \ni y \mapsto f(x, y)$ is $C^1$.
(2) For each fixed $y \in Y$ the function $X \ni x \mapsto f(x, y)$ is in $L^1(\mu)$, and $X \ni x \mapsto \partial_{y_i} f(x, y)$, $i = 1, \ldots, n$, is measurable.
(3) There is a positive function $g \in L^1(\mu)$ so that $|\partial_{y_i} f(x, y)| \le g(x)$ for all $(x, y) \in X \times Y$.
Then the function $F : Y \to \mathbb{C}$ given by
\[ F(y) = \int_X f(x, y) \, d\mu(x), \quad y \in Y, \]
is well-defined and $C^1$ with
\[ \partial_{y_i} F(y) = \int_X \partial_{y_i} f(x, y) \, d\mu(x). \]

Proof. The function $F$ is well-defined by (2). Let $y_0 \in Y$ and let the open ball $B_r(y_0)$ be contained in $Y$. Let $h_k \in \mathbb{R} \setminus \{0\}$ with $h_k \to 0$ and such that $y_k := y_0 + h_k e_i \in B_r(y_0)$, where $e_i$ is the $i$th standard unit vector in $\mathbb{R}^n$. Set
\[ \varphi_k(x) := \frac{f(x,y_k) - f(x,y_0)}{h_k}. \]
Then each $\varphi_k$ is in $L^1(\mu)$, and, for all $x \in X$,
\[ \lim_{k\to\infty} \varphi_k(x) = \partial_{y_i} f(x,y_0). \]
By (3) and the mean value theorem, $|\varphi_k| \le g$. The dominated convergence theorem 3.22 implies that $x \mapsto \partial_{y_i} f(x,y_0)$ is in $L^1(\mu)$ and we have
\[ \lim_{k\to\infty} \int_X \varphi_k\,d\mu = \int_X \partial_{y_i} f(x,y_0)\,d\mu(x). \]
Since
\[ \int_X \varphi_k\,d\mu = \frac{1}{h_k}\Big( \int_X f(x,y_k)\,d\mu(x) - \int_X f(x,y_0)\,d\mu(x) \Big) = \frac{F(y_k) - F(y_0)}{h_k}, \]
we see that $\partial_{y_i} F(y_0)$ exists and equals $\int_X \partial_{y_i} f(x,y_0)\,d\mu(x)$. The continuity of $\partial_{y_i} F$ follows from Theorem 3.37.

Theorem 3.39 (Holomorphy of integrals depending on parameters). Let $(X, \mathcal{S}, \mu)$ be a measure space, let $Y$ be open in $\mathbb{C}$, and let $f : X \times Y \to \mathbb{C}$ be a function. Assume that:
(1) For each fixed $x \in X$ the function $Y \ni y \mapsto f(x,y)$ is holomorphic.
(2) For each fixed $y \in Y$ the function $x \mapsto f(x,y)$ is measurable.
(3) There is a positive function $g \in L^1(\mu)$ so that $|f(x,y)| \le g(x)$ for all $(x,y) \in X \times Y$.
Then the function $F : Y \to \mathbb{C}$ given by $F(y) = \int_X f(x,y)\,d\mu(x)$, $y \in Y$, is well-defined and holomorphic with
\[ F'(y) = \int_X \partial_y f(x,y)\,d\mu(x). \]

Proof. The function $F$ is well-defined by (2) and (3). Let $y_0 \in Y$ and let the closed ball $\overline{B_r(y_0)}$ be contained in $Y$. For all $y \in B_r(y_0)$ and all $x \in X$, we have
\[ \partial_y f(x,y) = \frac{1}{2\pi i} \int_{\partial B_r(y_0)} \frac{f(x,z)}{(z-y)^2}\,dz \]
and thus, if we write $y = y_1 + i y_2$ and use (3), for all $y \in B_{r/2}(y_0)$ and all $x \in X$,
\[ |\partial_{y_i} f(x,y)| \le r \max_{z \in \partial B_r(y_0)} \frac{|f(x,z)|}{|z-y|^2} \le \frac{4 g(x)}{r}. \]
By Theorem 3.38, $F$ is $C^1$ in $B_{r/2}(y_0)$ and satisfies the Cauchy–Riemann equations
\[ \partial_{y_1} F(y) + i\,\partial_{y_2} F(y) = \int_X \big( \partial_{y_1} f(x,y) + i\,\partial_{y_2} f(x,y) \big)\,d\mu(x) = 0. \]
The proof is complete.
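As a numerical sanity check of Theorem 3.38 (a sketch that is not part of the original notes), take $X = (0,\infty)$ with the Lebesgue measure and $f(x,y) = e^{-x}\cos(xy)$. Here $|\partial_y f(x,y)| \le x e^{-x}$, which is integrable, and the closed forms are $F(y) = 1/(1+y^2)$ and $F'(y) = -2y/(1+y^2)^2$. The helper `integrate`, the midpoint rule, and the truncation of the domain at $x = 40$ are assumptions made only for this illustration.

```python
import math

def integrate(h, a, b, n=20_000):
    """Composite midpoint rule approximating the integral of h over [a, b]."""
    step = (b - a) / n
    return step * sum(h(a + (k + 0.5) * step) for k in range(n))

def F(y, cutoff=40.0):
    """F(y) = integral of exp(-x) cos(xy) over (0, cutoff); the tail is negligible."""
    return integrate(lambda x: math.exp(-x) * math.cos(x * y), 0.0, cutoff)

def dF_under_integral(y, cutoff=40.0):
    """Integral of the y-derivative of the integrand, as Theorem 3.38 permits."""
    return integrate(lambda x: -x * math.exp(-x) * math.sin(x * y), 0.0, cutoff)

def F_exact(y):
    return 1.0 / (1.0 + y * y)

def dF_exact(y):
    return -2.0 * y / (1.0 + y * y) ** 2
```

Comparing `dF_under_integral(y)` with `dF_exact(y)` for a few values of `y` shows the two agree up to quadrature error, which is what differentiating under the integral sign predicts.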

3.7. Relation to the Riemann integral

Let $[a,b]$ be a compact interval and let $f : [a,b] \to \mathbb{R}$ be bounded. For each partition $P$ of $[a,b]$, i.e., a finite sequence $P = (t_i)_{i=0}^n$ with $a = t_0 < t_1 < \dots < t_n = b$, define
\[ U_P f := \sum_{i=1}^n \sup_{t_{i-1} \le t \le t_i} f(t)\,(t_i - t_{i-1}), \qquad L_P f := \sum_{i=1}^n \inf_{t_{i-1} \le t \le t_i} f(t)\,(t_i - t_{i-1}), \]
and set
\[ \overline{I}_a^b(f) := \inf_P U_P f, \qquad \underline{I}_a^b(f) := \sup_P L_P f, \]
where $P$ varies over all partitions of $[a,b]$. If $\overline{I}_a^b(f) = \underline{I}_a^b(f)$ then their common value is the Riemann integral $\int_a^b f(x)\,dx$, and $f$ is called Riemann integrable.

Theorem 3.40. Let $f : [a,b] \to \mathbb{R}$ be bounded. Then:
(1) If $f$ is Riemann integrable, then $f$ is Lebesgue measurable and thus integrable (since bounded), and
\[ \int_a^b f(x)\,dx = \int_{[a,b]} f\,d\lambda. \]
(2) $f$ is Riemann integrable if and only if $\lambda(\{t \in [a,b] : f \text{ is discontinuous at } t\}) = 0$.

The second result is Lebesgue's criterion for Riemann integrability.

Proof. (1) Without loss of generality assume that $f \ge 0$. For each partition $P$ of $[a,b]$ set
\[ G_P := \sum_{i=1}^n \sup_{t_{i-1} \le t \le t_i} f(t)\, \chi_{(t_{i-1},t_i]}, \qquad g_P := \sum_{i=1}^n \inf_{t_{i-1} \le t \le t_i} f(t)\, \chi_{(t_{i-1},t_i]}, \]
such that $U_P f = \int G_P\,d\lambda$ and $L_P f = \int g_P\,d\lambda$. If $f$ is Riemann integrable, there exists a sequence of partitions $P_k$ whose mesh size (that is $\max_i (t_i - t_{i-1})$) tends to $0$, such that $P_k \subseteq P_{k+1}$, and so that $U_{P_k} f$ and $L_{P_k} f$ converge to $\int_a^b f(x)\,dx$. Then
\[ G_{P_k} \ge G_{P_{k+1}} \ge f \ge g_{P_{k+1}} \ge g_{P_k}, \]
and $G := \lim_{k\to\infty} G_{P_k}$, $g := \lim_{k\to\infty} g_{P_k}$ satisfy $g \le f \le G$. By the dominated convergence theorem 3.22,
\[ \int g\,d\lambda = \int_a^b f(x)\,dx = \int G\,d\lambda, \]
and thus $\int (G - g)\,d\lambda = 0$. By Proposition 3.21, $G = g = f$ a.e. Since $G$ is measurable, by Proposition 3.4, so is $f$, by Proposition 3.8 (as $\lambda$ is complete), and we have
\[ \int_a^b f(x)\,dx = \int G\,d\lambda = \int_{[a,b]} f\,d\lambda. \]
(2) Assume that $f$ is Riemann integrable. By the first part of the proof, the set
\[ E := \{t \in [a,b] : g(t) \ne G(t)\} \cup \bigcup_{k=1}^\infty P_k \]
has measure zero. We will show that the set of discontinuities of $f$ lies in $E$. Fix $t_0 \in [a,b] \setminus E$ and $\epsilon > 0$. Then $g(t_0) = G(t_0)$ and hence $G_{P_k}(t_0) - g_{P_k}(t_0) \le \epsilon$ for

$k$ sufficiently large. Since $t_0 \notin P_k$, $G_{P_k}$ and $g_{P_k}$ are constant near $t_0$. Thus there is $\delta > 0$ so that for $|t - t_0| \le \delta$,
\[ f(t) - f(t_0) \le G_{P_k}(t) - g_{P_k}(t_0) = G_{P_k}(t_0) - g_{P_k}(t_0) \le \epsilon, \]
and similarly $f(t) - f(t_0) \ge -\epsilon$. This implies that $f$ is continuous at $t_0$.

Conversely, let $f$ be continuous except on a set $E$ of measure zero. By Theorem 2.9, given $\epsilon > 0$ we may find open intervals $I_i$ so that $E \subseteq \bigcup_i I_i$ and $\sum_i |I_i| \le \epsilon/(4M)$, where $M = \sup_{t \in [a,b]} |f(t)|$. If $f$ is continuous at $t$, then there is an open interval $J_t \ni t$ such that $|f(s) - f(r)| \le \epsilon/(2(b-a))$ for $s, r \in J_t \cap [a,b]$. The open cover $\{I_i\} \cup \{J_t : t \in [a,b] \setminus E\}$ of $[a,b]$ has a finite subcover; let $P = (t_i)_{i=0}^n$ be the partition of $[a,b]$ given by the endpoints (inside $[a,b]$) of the intervals in this subcover. Let $L = \{l : (t_{l-1}, t_l) \subseteq I_i \text{ for some } i\}$. Then
\[ U_P f - L_P f = \sum_{i=1}^n \sup_{t_{i-1} \le s,t \le t_i} (f(t) - f(s))\,(t_i - t_{i-1}) \le 2M \sum_{i \in L} (t_i - t_{i-1}) + \frac{\epsilon}{2(b-a)} \sum_{i \notin L} (t_i - t_{i-1}) \le 2M\,\frac{\epsilon}{4M} + \frac{\epsilon}{2(b-a)}\,(b-a) = \epsilon. \]
This implies that $f$ is Riemann integrable.

The proper Riemann integral is thus subsumed in the Lebesgue integral. The latter allows for integration of a wider class of functions. For instance, $\chi_{\mathbb{Q} \cap [0,1]}$ is discontinuous everywhere and hence not Riemann integrable. It is however Lebesgue integrable with $\int \chi_{\mathbb{Q} \cap [0,1]}\,d\lambda = 0$.

For improper Riemann integrals the situation is different. The functions
\[ f = \sum_{k=1}^\infty \frac{(-1)^k}{k}\, \chi_{(k,k+1]} \quad\text{or}\quad g(x) = \frac{\sin(x)}{x} \]
have improper Riemann integrals over $[1,\infty)$ (to see this for $g$ use partial integration and the majorant criterion), but they are not Lebesgue integrable.

A Lebesgue integrable function on $[a,\infty)$ that is Riemann integrable on $[a,b]$, for each $b > a$, has absolutely convergent improper Riemann integral and
\[ \int_{[a,\infty)} f\,d\lambda = \lim_{b\to\infty} \int_a^b f(x)\,dx. \tag{3.8} \]
Indeed, for each $b > a$, $\int_a^b |f(x)|\,dx = \int_{[a,b]} |f|\,d\lambda \le \int_{[a,\infty)} |f|\,d\lambda$ and hence $\lim_{b\to\infty} \int_a^b |f(x)|\,dx$ exists. Moreover, choose a sequence $b_k \to \infty$ and set $f_k := f \chi_{[a,b_k]}$.
Then the dominated convergence theorem 3.22 implies (3.8).

3.8. Hausdorff measure

In this section we consider the $d$-dimensional Hausdorff measure in $\mathbb{R}^n$. It allows for a definition of $d$-dimensional area in an intrinsic way, i.e., without reference to parameterizations. Moreover, it makes sense in any metric space and even for non-integer $d$.

For $d \ge 0$ let us set
\[ \omega_d := \frac{\pi^{d/2}}{\Gamma(d/2+1)}, \]
where $\Gamma(t) := \int_0^\infty s^{t-1} e^{-s}\,ds$ is the Gamma function. If $d \ge 1$ is an integer, then $\omega_d$ is the $d$-dimensional Lebesgue measure of the unit ball in $\mathbb{R}^d$; see Example 3.36.
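The constant $\omega_d$ is easy to evaluate numerically. The following sketch (the function names `omega` and `sphere_area` are chosen here for illustration and do not appear in the notes) computes $\omega_d$ from the formula above and the surface measure $\sigma(S^{n-1}) = 2\pi^{n/2}/\Gamma(n/2)$ from Example 3.36, using only the standard library.

```python
import math

def omega(d):
    """omega_d = pi^(d/2) / Gamma(d/2 + 1), for any real d >= 0."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def sphere_area(n):
    """sigma(S^(n-1)) = 2 pi^(n/2) / Gamma(n/2), for an integer n >= 1."""
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2)
```

For integer dimensions this recovers the familiar values: length $2$ of $[-1,1]$, area $\pi$ of the unit disc, volume $4\pi/3$ of the unit ball in $\mathbb{R}^3$, and the relation $\sigma(S^{n-1}) = n\,\omega_n$.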

Let $E \subseteq \mathbb{R}^n$ be any subset. The $d$-dimensional Hausdorff measure of $E$ is given by
\[ H^d(E) := \lim_{\epsilon \to 0+} H^d_\epsilon(E), \tag{3.9} \]
where for $0 < \epsilon \le \infty$,
\[ H^d_\epsilon(E) := \frac{\omega_d}{2^d} \inf\Big\{ \sum_i (\operatorname{diam}(E_i))^d : \operatorname{diam}(E_i) < \epsilon,\ E \subseteq \bigcup_i E_i \Big\} \]
for countable covers $\{E_i\}_i$ of $E$ and with the convention $\operatorname{diam}(\emptyset) = 0$. Note that the limit in (3.9) exists (finite or infinite), since $\epsilon \mapsto H^d_\epsilon(E)$ is decreasing, and that $H^0$ is the counting measure. It is possible to restrict the $E_i$ in the definition to closed (or open) and convex sets such that $E_i \cap E \ne \emptyset$, but further restrictions produce other outer measures, e.g., using only balls yields the so-called spherical Hausdorff measure.

The definition of Hausdorff measure extends to any metric space. It depends on the metric but not on the ambient space, i.e., $H^d_X(E) = H^d_Y(E)$ whenever $E \subseteq X$ and the metric space $X$ is isometrically embedded in the metric space $Y$.

Proposition 3.41. Let $d \ge 0$ and $n \in \mathbb{N}$.
(1) $H^d$ is an outer measure on $\mathbb{R}^n$ and a measure on $\mathcal{B}(\mathbb{R}^n)$.
(2) For each $E \subseteq \mathbb{R}^n$, $z \in \mathbb{R}^n$, and $a > 0$, $H^d(E + z) = H^d(E)$ and $H^d(aE) = a^d H^d(E)$.
(3) $H^d = 0$ if $d > n$.
(4) If $d' > d$, then $H^{d'}(E) > 0$ implies $H^d(E) = \infty$.
(5) If $f : \mathbb{R}^n \to \mathbb{R}^m$ is a Lipschitz function with Lipschitz constant $\operatorname{Lip}(f)$, then $H^d(f(E)) \le \operatorname{Lip}(f)^d H^d(E)$.

Proof. (1) Let us show that $H^d$ is σ-subadditive; monotonicity is obvious. It is easy to see that each $H^d_\epsilon$ is σ-subadditive. Thus $H^d$ is σ-subadditive, since the supremum of σ-subadditive set functions is σ-subadditive. So $H^d$ is an outer measure on $\mathbb{R}^n$. Suppose that $\delta = \operatorname{dist}(E_1, E_2) > 0$ and $\epsilon \le \delta$. Then any set of diameter $< \epsilon$ intersecting $E_1 \cup E_2$ intersects only one of the sets $E_1$, $E_2$. Hence, $H^d_\epsilon(E_1 \cup E_2) \ge H^d_\epsilon(E_1) + H^d_\epsilon(E_2)$. Since $H^d_\epsilon$ is σ-subadditive, we obtain $H^d_\epsilon(E_1 \cup E_2) = H^d_\epsilon(E_1) + H^d_\epsilon(E_2)$, and letting $\epsilon \to 0$, $H^d(E_1 \cup E_2) = H^d(E_1) + H^d(E_2)$. The proof of Theorem 2.1(3) shows that all closed sets, and hence all Borel sets, are $H^d$-measurable.
(2) This follows from $\operatorname{diam}(E + z)^d = \operatorname{diam}(E)^d$ and $\operatorname{diam}(aE)^d = a^d \operatorname{diam}(E)^d$.
(3) Let $d > n$.
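Any cube $Q$ of side length $1$ can be covered by $k^n$ closed cubes of side length $1/k$. Thus, $H^d_\epsilon(Q) \le \omega_d (\sqrt{n}/2)^d k^{n-d}$ for $\epsilon > \sqrt{n}/k$. Letting $k \to \infty$ implies $H^d(Q) = 0$. The assertion now follows from translation invariance and σ-subadditivity.
(4) We have $(\operatorname{diam}(E_i)/\epsilon)^{d'} \le (\operatorname{diam}(E_i)/\epsilon)^d$ if $\operatorname{diam}(E_i) < \epsilon$. Thus for $0 < \epsilon < \infty$,
\[ \frac{2^{d'}}{\omega_{d'}}\, H^{d'}_\epsilon(E) \le \epsilon^{d'-d}\, \frac{2^d}{\omega_d}\, H^d_\epsilon(E), \]
which implies the statement.
(5) follows from $\operatorname{diam}(f(E)) \le \operatorname{Lip}(f) \operatorname{diam}(E)$.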

Note that $H^d$ is not σ-finite if $d < n$.

The Hausdorff dimension of a subset $E \subseteq \mathbb{R}^n$ is defined by
\[ \dim_H E := \inf\{d \ge 0 : H^d(E) = 0\}. \]
Then, by Proposition 3.41,
\[ H^d(E) = \begin{cases} \infty & \text{if } d < \dim_H E, \\ 0 & \text{if } d > \dim_H E. \end{cases} \]
Finite sets have Hausdorff dimension $0$. But there also exist compact uncountable sets with Hausdorff dimension $0$.

Example 3.42 (Hausdorff dimension of the Cantor set). Let $C = \bigcap_{k=0}^\infty C_k$ be the Cantor set; see Example 2.3. Recall that $C_k$ is a disjoint union of $2^k$ closed intervals with length $3^{-k}$. Thus
\[ H^d_{3^{-k}}(C) \le \omega_d\, 2^k\, 2^{-d}\, 3^{-kd}. \]
This bound remains bounded as $k \to \infty$ provided that $2/3^d \le 1$. So for the choice
\[ d = \frac{\log 2}{\log 3} \tag{3.10} \]
we have $H^d(C) = \lim_{k\to\infty} H^d_{3^{-k}}(C) < \infty$ and hence $\dim_H C \le d$.

To conclude that the Hausdorff dimension of the Cantor set $C$ is $d = \log 2/\log 3$, we need to show that $H^d(C) > 0$. To this end we prove that $\sum_j \operatorname{diam}(I_j)^d \ge 1/4$ whenever $\{I_j\}$ is a cover of $C$ by open intervals. Since $C$ is compact, we may assume that $I_1, \dots, I_n$ cover $C$. As the interior of $C$ is empty, we may also assume that the endpoints of each $I_j$ lie outside of $C$ (making the $I_j$ slightly larger if necessary). Let $\delta > 0$ be the distance between $C$ and the set of all endpoints of the intervals $I_j$, and choose a positive integer $k$ such that $3^{-k} < \delta$. Then each connected component $C_{k,i}$ of $C_k$ is contained in some $I_j$. We assert that, for each open interval $I$ and each fixed $l$,
\[ \sum_{C_{l,i} \subseteq I} \operatorname{diam}(C_{l,i})^d \le 4 \operatorname{diam}(I)^d. \tag{3.11} \]
This will imply the desired inequality,
\[ 4 \sum_j \operatorname{diam}(I_j)^d \ge \sum_j \sum_{C_{k,i} \subseteq I_j} \operatorname{diam}(C_{k,i})^d \ge \sum_{i=1}^{2^k} \operatorname{diam}(C_{k,i})^d = 1, \]
since $\operatorname{diam}(C_{k,i})^d = 3^{-kd} = 2^{-k}$ by (3.10).

Let us show (3.11). If $m$ denotes the least integer for which $I$ contains some $C_{m,i}$, then $m \le l$. There are at most $4$ connected components $C_{m,i_1}, \dots, C_{m,i_p}$ of $C_m$ which intersect $I$; otherwise $m$ would not be minimal. Thus,
\[ \sum_{C_{l,i} \subseteq I} \operatorname{diam}(C_{l,i})^d \le \sum_{q=1}^p \sum_{C_{l,i} \subseteq C_{m,i_q}} \operatorname{diam}(C_{l,i})^d = \sum_{q=1}^p \operatorname{diam}(C_{m,i_q})^d \le 4 \operatorname{diam}(I)^d, \]
because
\[ \sum_{C_{l,i} \subseteq C_{m,i_q}} \operatorname{diam}(C_{l,i})^d = 2^{l-m}\, 3^{-ld} = 2^{l-m}\, 2^{-l} = \operatorname{diam}(C_{m,i_q})^d. \]
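The role of the exponent $d = \log 2/\log 3$ in Example 3.42 can be made concrete with a few lines of arithmetic. The sketch below (the names `D` and `covering_sum` are introduced here for illustration only) evaluates $\sum_i \operatorname{diam}(C_{k,i})^s = 2^k\, 3^{-ks}$ for the natural cover of $C$ by the $2^k$ intervals of $C_k$.

```python
import math

# Critical exponent (3.10) for the Cantor set.
D = math.log(2) / math.log(3)

def covering_sum(k, s):
    """Sum of diam^s over the 2^k intervals of length 3^-k that make up C_k."""
    return 2 ** k * (3.0 ** -k) ** s
```

At $s = D$ the sum equals $1$ for every $k$; for $s > D$ it tends to $0$ and for $s < D$ it tends to $\infty$ as $k$ grows, which matches the dichotomy for $H^s(C)$ stated after the definition of the Hausdorff dimension. This only illustrates the upper bound; the lower bound $H^d(C) > 0$ still needs the covering argument given in the example.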
Theorem 3.43 (Isodiametric inequality). For every Lebesgue measurable set $E \subseteq \mathbb{R}^n$,
\[ \lambda^n(E) \le \omega_n \Big( \frac{\operatorname{diam}(E)}{2} \Big)^n. \tag{3.12} \]

42 38 3. INTEGRATION Proof. For v S n 1 let π v be the hyperplane perpendicular to v, and for w π v set E v,w := {t R : w + tv E}. Consider the symmetriced set S v (E) := {w + tv : w π v, 2 t λ 1 (E v,w )}. By Fubini s theorem 3.27, we may conclude that the mapping π v w λ 1 (E v,w ) is L(R n 1 )-measurable where π v = R n 1, and hence S v (E) is Lebesgue measurable and λ n (S v (E)) = λ n (E). We have diam(s v (E)) diam(e) thanks to the easy inequality λ 1 (I) + λ 1 (J) 2 sup{ t s : t I, s J} for I, J B(R). If E is symmetric with respect to a direction orthogonal to v, then so is S v (E). Define iteratively E := E and E i := S ei (E i 1 ), where e 1,..., e n denote the standard unit vectors in R n. Then E n is Lebesgue measurable, satisfies λ n (E n ) = λ n (E ), diam(e n ) diam(e), and is invariant under the mapping x x. Hence E n is contained in the closed ball with radius diam(e)/2, and λ n (E) λ n (E ) = λ n (E n ) ω n ( diam(e) 2 This argument is called Steiner symmetrization. Theorem For every Borel set E R n and every ɛ (, ], λ n (E) = H n ɛ (E) = H n (E). ) n. Proof. Let us prove λ n (E) Hɛ n (E). Let (E i ) i be a cover of E by closed sets with diam(e i ) < ɛ. Then, by the isodiametric inequality (3.12), λ n (E) λ n (E i ) ω n 2 n (diam(e i )) n. i i We may conclude λ n (E) Hɛ n (E), since the cover (E i ) i was arbitrary. Note that H n is finite on bounded sets; use the argument in the proof of Proposition 3.41(3). Hence H n is a translation invariant Radon measure on R n. By Theorem 2.14, there is a constant C > such that λ n (E) = C H n (E) for all Borel sets E R n. It remains to show that C = 1. If B is the unit ball in R n, then λ n (B) H n ɛ (B) H n (B) = C 1 λ n (B), whence C 1. On the other hand, for all ɛ, H n ɛ (B) λ n (B) = C H n (B) and thus C 1. In order to see the inequality Hɛ n (B) λ n (B) note that it is possible to find a collection of disjoint closed balls B 1, B 2,... 
with diam(b i ) < ɛ such that B i B and λ n (B \ B i) = ; this is a consequence of the Besicovitch Vitali covering theorem, cf. [3]. Then H n ɛ ( ) B i ω n 2 n (diam(b i )) n = λ n (B i ) = λ n( ) B i = λ n (B). We may conclude that Hɛ n (B) λ n (B), since a λ n -null set is also a Hɛ n -null set. In fact, for every cube Q R n, we have ω n (diam(q)/2) n = ω n ( n/2) n λ n (Q) and thus Hɛ n (E) ω { n 2 n inf } Q i i (diam(q i )) n : Q i cubes, diam(q i ) < ɛ, E i ( n ) n = ω n λ n (E). 2

CHAPTER 4

$L^p$-spaces

4.1. Definition of $L^p$-spaces

Let $(X, \mathcal{S}, \mu)$ be a measure space. For $1 \le p < \infty$, we set
\[ L^p(\mu) := \{f : X \to \mathbb{C} : f \text{ is measurable and } |f|^p \in L^1(\mu)\}. \]
We shall also use the notation $L^p(X)$ if there is no ambiguity. Note that
\[ |f + g|^p \le (2 \max(|f|, |g|))^p = 2^p \max(|f|^p, |g|^p) \le 2^p (|f|^p + |g|^p), \]
which implies that $L^p(\mu)$ is a vector space. For $f \in L^p(X)$ we define
\[ \|f\|_p := \Big( \int_X |f|^p\,d\mu \Big)^{1/p}. \]
For $p = \infty$, we set
\[ L^\infty(\mu) := \{f : X \to \mathbb{C} : f \text{ is measurable and } \exists M \in \mathbb{R} : |f(x)| \le M \text{ for } \mu\text{-a.e. } x \in X\}. \]
For $f \in L^\infty(X)$ we define the essential supremum
\[ \|f\|_\infty := \inf\{M : |f(x)| \le M \text{ for } \mu\text{-a.e. } x \in X\}. \]
We shall see below that $\|f\|_p$, $1 \le p \le \infty$, defines a norm on (equivalence classes of functions in) $L^p(\mu)$; it is called the $L^p$-norm; we will also use $\|\cdot\|_{L^p(\mu)}$ or $\|\cdot\|_{L^p(X)}$.

If $A$ is a nonempty set, we denote by $\ell^p(A)$ the space $L^p(\mu)$, where $\mu$ is the counting measure on $(A, \mathcal{P}(A))$.

By Proposition 3.21, for a measurable function $f$, $\|f\|_p = 0$ if and only if $f = 0$ $\mu$-a.e. So $\|\cdot\|_p$ is not a norm on $L^p(\mu)$ as defined above. For this reason we redefine $L^p(\mu)$: The equivalence relation
\[ f \sim g \ :\Leftrightarrow\ f = g \ \mu\text{-a.e.} \]
partitions $L^p(\mu)$ into equivalence classes. The $L^p$-norm is constant on every equivalence class. Henceforth we use the symbol $L^p(\mu)$ for the vector space of equivalence classes of measurable functions whose $L^p$-norm is finite. For the sake of simplicity, we will nevertheless speak of $L^p$-functions. However, one should keep in mind that it makes no sense to ask for the value of an $L^p$-function at some particular point.

4.2. Inequalities

Recall that a real valued function $\varphi$ defined on an open interval $(a,b)$ is called convex if, for $x, y \in (a,b)$,
\[ \varphi((1-t)x + ty) \le (1-t)\varphi(x) + t\varphi(y), \quad 0 < t < 1, \]
and strictly convex if the inequality is strict. Setting $z = (1-t)x + ty$ we obtain
\[ \frac{\varphi(z) - \varphi(x)}{z - x} \le \frac{\varphi(y) - \varphi(x)}{y - x} \le \frac{\varphi(y) - \varphi(z)}{y - z}, \quad x < z < y, \tag{4.1} \]

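with strict inequalities if $\varphi$ is strictly convex. The inequalities in (4.1) imply that the one-sided derivatives $\varphi'_\pm(x)$ of $\varphi$ exist in $\mathbb{R}$ at every $x \in (a,b)$; indeed, the difference quotients
\[ \delta(x,y) := \frac{\varphi(y) - \varphi(x)}{y - x} \]
satisfy $\delta(x,y) \ge \delta(x,z)$ for $x < z < y$ and are bounded from below by $\delta(w,x)$ for some $w < x$, thus $\varphi'_+(x) = \lim_{y \to x+} \delta(x,y)$. As a consequence $\varphi$ is continuous.

Theorem 4.1 (Jensen's inequality). Let $(X, \mathcal{S}, \mu)$ be a measure space with $\mu(X) = 1$. If $f \in L^1(\mu)$ is real valued and $f(X) \subseteq (a,b)$ ($a = -\infty$ and $b = \infty$ are allowed), and if $\varphi : (a,b) \to \mathbb{R}$ is convex, then
\[ \varphi\Big( \int_X f\,d\mu \Big) \le \int_X \varphi \circ f\,d\mu. \]

Proof. Since $\mu(X) = 1$, we have $a < z := \int_X f\,d\mu < b$. By (4.1),
\[ \alpha := \sup_{x < z} \frac{\varphi(z) - \varphi(x)}{z - x} \le \frac{\varphi(y) - \varphi(z)}{y - z} \quad \text{for all } y \in (z,b), \]
and therefore
\[ \varphi(w) \ge \varphi(z) + \alpha(w - z) \quad \text{for all } w \in (a,b). \]
In particular, $\varphi(f(x)) \ge \varphi(z) + \alpha(f(x) - z)$ for all $x \in X$. Since $\varphi$ is continuous, $\varphi \circ f$ is measurable, and integrating the last inequality yields
\[ \int_X \varphi \circ f\,d\mu \ge \varphi(z) + \alpha\Big( \int_X f\,d\mu - z \Big) = \varphi\Big( \int_X f\,d\mu \Big). \]

A pair of positive real numbers $p$ and $q$ are called conjugate exponents if
\[ \frac{1}{p} + \frac{1}{q} = 1; \]
we regard also $1$ and $\infty$ to be conjugate.

Theorem 4.2 (Hölder's inequality). Let $p$ and $q$ be conjugate exponents, $1 \le p \le \infty$. Let $f \in L^p(\mu)$ and $g \in L^q(\mu)$. Then $fg \in L^1(\mu)$, and
\[ \|fg\|_1 \le \|f\|_p \|g\|_q. \]
If $p = q = 2$ this is also called Schwarz inequality.

Proof. For $p = 1$ this follows easily from the definition of the integral. Let us assume that $1 < p < \infty$. Set $A := \{x \in X : |g(x)| > 0\}$ and $\nu(E) := \int_E |g|^q\,d\mu$, for $E \in \mathcal{S}$. Since $g \in L^q(\mu)$, we have $\nu(A) = \nu(X) = \|g\|_q^q < \infty$. By Corollary 3.16, $\gamma := \nu/\nu(A)$ is a probability measure on $A$. By Jensen's inequality 4.1 and since $(1-q)p = -q$,
\[ \frac{1}{\nu(A)^p}\, \|fg\|_1^p = \Big( \int_A |f|\,|g|^{1-q}\, \frac{|g|^q}{\nu(A)}\,d\mu \Big)^p = \Big( \int_A |f|\,|g|^{1-q}\,d\gamma \Big)^p \le \int_A \big( |f|\,|g|^{1-q} \big)^p\,d\gamma \]
\[ = \int_A |f|^p\, |g|^{-q}\, \frac{|g|^q}{\nu(A)}\,d\mu = \frac{1}{\nu(A)} \int_A |f|^p\,d\mu \le \frac{1}{\nu(A)}\, \|f\|_p^p \]
and hence $\|fg\|_1 \le \|f\|_p\, \nu(A)^{1 - 1/p} = \|f\|_p \|g\|_q$.

Corollary 4.3. If $f_i \in L^{p_i}(\mu)$ and $\sum_{i=1}^n 1/p_i = 1/p$ for $p, p_i \in [1,\infty]$, then
\[ \Big\| \prod_{i=1}^n f_i \Big\|_p \le \prod_{i=1}^n \|f_i\|_{p_i}. \]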

45 4.2. INEQUALITIES 41 Proof. If p = then p i = for all i and the inequality is obvious. So assume that p <. If p i = for some i, the result can be reduced to that case that all p i <. So let us make this assumption. If n = 2, we have, as 1 = 1/(p 1 /p) + 1/(p 2 /p), ( ) f 1 f 2 p p/p1 ( p/p2 dµ f 1 p1 dµ f 2 dµ) p2 = f1 p p f 1 2 p p. 2 In the general case, define q by 1/q = n i=2 1/p i, and use induction: n p n q n f i f 1 p1 f i f i pi. i=2 Proposition 4.4. Let p and q be conjugate exponents, 1 p. If p = we assume that µ has the finite subset property. Then for every f L p (µ), f p = fg dµ = sup fg dµ. sup g L q (µ) g q 1 g L q (µ) g q 1 Proof. The identities are clear if f =. So let us assume that f p >. By Hölder s inequality 4.2, for each g L q (µ) with g q 1, fg dµ fg dµ f p, hence sup fg dµ sup fg dµ f p. It remains to prove that f p sup fg dµ. Consider first the case that p [1, ). Set h(x) := f(x) p 2 f(x) if f(x) and h(x) := if f(x) =, then fh = f p. If p > 1, then h q = f p and hence g := f p/q p h satisfies g q = 1 and fg dµ = f p. If p = 1, then h = 1 and fh dµ = f 1. If p =, choose < m < f and set A m := {x : f(x) m}. Then µ(a m ) >. Since µ has the finite subset property, there exists B m A m with < µ(b m ) <. The function ϕ := χ { f(x) =} + χ { f(x) >} f f 1 is measurable and satisfies ϕ = 1 and f = ϕ f. Thus g m := χ Bm /(ϕµ(b m )) satisfies g m 1 = 1 and fg m dµ = 1 µ(b m) B m f dµ m, and thus { } sup fg dµ : g L 1 (µ), g 1 1 m. Letting m f finishes the proof. Theorem 4.5 (Minkowski s integral inequality). Let (, S, µ) and (Y, T, ν) be σ- finite measure spaces, let f : Y [, ] be (S T)-measurable, and let 1 p. Then f(, y) dν(y) f(, y) Lp (µ) dν(y). Lp (µ) Y Y Proof. It follows from Fubini s theorem 3.27 that the function h(x) := f(x, y) dν(y), x, is measurable. Furthermore, by Proposition 4.4, f(, y) dν(y) = h L p (µ) Y Lp (µ) { } = sup hg dµ : g L q (µ), g L q (µ) 1 { } = sup f(x, y) g(x) dν(y) dµ(x) : g L q (µ), g Lq (µ) 1 Y Y

46 42 4. L P -SPACES { = sup Y { sup Y = Y } f(x, y) g(x) dµ(x) dν(y) : g L q (µ), g L q (µ) 1 } f(x, y) g(x) dµ(x) : g L q (µ), g Lq (µ) 1 dν(y) f(, y) Lp (µ) dν(y). Corollary 4.6 (Minkowski s inequality). Let 1 p. For f 1, f 2 L p (µ), f 1 + f 2 p f 1 p + f 2 p. It follows that p is a norm on L p (µ). Proof. As f 1 + f 2 p dµ f 1 + f 2 p dµ, we may assume without loss of generality that f 1, f 2 are nonnegative. Then Minkowski s inequality follows from Minkowski s integral inequality 4.5 if we let Y be the two point set {1, 2} with the counting measure. Note that in this case the use of Fubini s theorem in the proof of Theorem 4.5 reduces to linearity of the integral, and hence it is not necessary to assume σ-finiteness: if f(x, 1) = f 1 (x) and f(x, 2) = f 2 (x), then f(x, y) g(x) dν(y) dµ(x) Y = f 1 (x) g(x) + f 2 (x) g(x) dµ(x) = f 1 (x) g(x) dµ(x) + f 2 (x) g(x) dµ(x) = f(x, y) g(x) dµ(x) dν(y). Y In general L p (µ) L q (µ) for all p q; consider x a, a >, on (, ) with the Lebesgue measure. However, we have the following results; see also Section 7.3 on interpolation of L p -spaces. Proposition 4.7 (Inclusion relations). If 1 p < q < r, then and L p (µ) L r (µ) L q (µ) L p (µ) + L r (µ), f q f t p f 1 t r, where 1 q = t p + 1 t. r Proof. Let us first prove L q (µ) L p (µ) + L r (µ). For f L q (µ) set E := {x : f(x) > 1} and decompose f = fχ E + fχ E c. This shows the asserted inclusion, since fχ E p = f p χ E f q χ E, thus fχ E L p (µ), and fχ E c r = f r χ E c f q χ E c, thus fχ E c L r (µ); for r = we clearly have fχ E c 1. Now we turn to the other inclusion. Consider first the case r <. By assumption, p/(tq) and r/((1 t)q) are conjugate, and so by Hölder s inequality 4.2, f q dµ = f tq f (1 t)q dµ f tq p/(tq) f (1 t)q r/((1 t)q) ( = ) tq/p ( f p dµ ) (1 t)q/r f r dµ = f tq p f r (1 t)q. If r =, then t = p/q and f q dµ = f q p f p dµ f q p f p p

47 4.3. COMPLETENESS 43 which implies the assertion. Corollary 4.8. If A is any set and 1 p < q, then l p (A) l q (A) and f q f p. Proof. Obviously, f f p. For q <, setting r = and t = p/q in Proposition 4.7 implies f q f p/q p f 1 p/q f p/q p f 1 p/q p = f p. Proposition 4.9. If µ() < and 1 p < q, then L q (µ) L p (µ) and f p µ() 1/p 1/q f q. Proof. By Hölders inequality 4.2, for 1/r + 1/r = 1, f p 1 1 r f p r = µ() 1/r ( f pr dµ and thus f p µ() 1/pr f pr. Setting r = q/p gives the assertion. ) 1/r 4.3. Completeness Let 1 p. The normed space (L p (µ), p ) comes with a natural notion of convergence. A sequence (f i ) in L p (µ) is called (strongly) convergent if there exists an element f L p (µ) such that f i f p as i. A sequence (f i ) in L p (µ) is a Cauchy sequence if for all ɛ > there is k N so that f i f j p < ɛ if i, j k. Recall that a normed space is complete if each Cauchy sequence is convergent. Theorem 4.1 (Riesz Fischer). Let 1 p. The space L p (µ) is complete and hence a Banach space. Proof. Let 1 p <. Let (f i ) be a Cauchy sequence in L p (µ). Choose i 1 such that f i1 f j p < 1/2 for j i 1, choose i 2 > i 1 such that f i2 f j p < 1/2 2 for j i 2, etc. In this way we obtain a subsequence (f ik ) such that f ik f ik+1 p < 1/2 k for all k 1. Let us define F := f i1 + f ik+1 f ik. k=1 Then F is an element of L p (µ), by the monotone convergence theorem 3.14, since, for all m 1, m f i1 + f ik+1 f ik f i1 p + 1. p k=1 In particular, F (x) is finite for µ-a.e. x, and for such x the series f i1 (x) + k=1 f i k+1 (x) f i k (x) is absolutely convergent, and thus the sequence of partial sums f i1 (x) + m f ik+1 (x) f ik (x) = f im+1 (x) k=1 converges to some number f(x). Since f ik (x) F (x) and F L p (µ), the dominated convergence theorem 3.22 implies that f L p (µ), and in turn that

48 44 4. L P -SPACES f ik f p as k, since f ik f p and f ik f p (2F ) p µ-a.e. That f i f p as i follows from f i f p f i f ik p + f ik f p. Let (f i ) be a Cauchy sequence in L (µ). The sets E i = {x : f i (x) > f i } and E jk = {x : f j (x) f k (x) > f j f k } and thus also their union E for all i, j, k N have measure zero. On E c the sequence f i converges uniformly to a bounded function f. Extending f by on E we obtain a measurable bounded function satisfying f i f. (In more details: clearly, f i converges pointwise to a function f on E c. To see uniform convergence, let, for given ɛ >, k be such that sup x E c f i (x) f j (x) < ɛ/2 for i, j k, and for x E c choose i x k such that f(x) f ix (x) < ɛ/2. Then f(x) f j (x) f(x) f ix (x) + f ix (x) f j (x) < ɛ for j k, independently of x. In particular, f(x) f(x) f k (x) + f k (x) ɛ + sup x E c f k (x) for all x E c, i.e., f is bounded.) Corollary Let 1 p. Any Cauchy sequence in L p (µ) has a subsequence that converges pointwise µ-a.e. Proof. This was shown in the proof of Theorem 4.1; see also Proposition 4.24 and Theorem Corollary L 2 (µ) is a Hilbert space with inner product f, g = fg dµ. Proof. f, g is well-defined by Hölder s inequality 4.2 and it is easy to see that it defines an inner product on L 2 (µ). Since f 2 = f, f 1/2, the completeness follows from Theorem Convolution and approximation by smooth functions We will see in this section that L p -functions on open subsets of R n can be approximated by nicer functions if 1 p <. We start with the following proposition. Proposition Let S denote the class of all simple functions s on satisfying µ({x : s(x) }) <. If 1 p <, then S is dense in L p (µ). Proof. Clearly, S L p (µ). Let f L p (µ), f. By Theorem 3.6, there exist simple functions s 1 s 2 f so that s i (x) f(x) for µ-a.e. x. Thanks to s i f we have µ({x : s i (x) }) <, i.e., s i S. Since f s i p f p, the dominated convergence theorem 3.22 implies that f s i p. 
The general complex case follows immediately.

For the rest of the section let $X$ be an open subset of $\mathbb{R}^n$ equipped with the Lebesgue measure $\lambda$; we shall write $L^p(X)$ instead of $L^p(\lambda)$ and $\int f\,dx$ instead of $\int f\,d\lambda$.

Theorem 4.14 (Approximation by continuous functions). For $1 \le p < \infty$, the class $C_c(X)$ of continuous functions with compact support in $X$ is dense in $L^p(X)$.

Proof. By Proposition 4.13, it suffices to show that, for each measurable $E \subseteq X$ with $\lambda(E) < \infty$, $\chi_E$ is the $L^p$-limit of a sequence of functions in $C_c(X)$. Since $\lambda$ is regular, see Theorem 2.9, for given $\epsilon > 0$ there exist an open set $U$ and a compact set $K$ such that $K \subseteq E \subseteq U$, $\lambda(E) < \lambda(K) + \epsilon$, and $\lambda(U) < \lambda(E) + \epsilon$.

49 4.4. CONVOLUTION AND APPROIMATION BY SMOOTH FUNCTIONS 45 Let L be a compact neighborhood of K contained in U. If f is a continuous function on R so that f 1 and f {t } 1 and f {t>1/2}, then ( g(x) := f 1 dist(x, Lc ) ) dist(k, L c ) is a continuous function with support in L and 1 on K. So χ K g χ U and hence χ K χ E g χ E χ U χ E which implies (g χ E ) + χ U χ E and (g χ E ) χ E χ K. Therefore, using (a + b) p (2 max(a, b)) p 2 p (a p + b p ) for a, b, g χ E p dx = ((g χ E ) + + (g χ E ) ) p dx 2 p ((g χ E ) + ) p + ((g χ E ) ) p dx 2 p+1 ɛ. This finishes the proof, since ɛ was arbitrary. Note that C c () is not dense in L (). If f is a bounded and continuous function on then f = sup f(x). (4.2) x Clearly, f sup x f(x). Conversely, for any ɛ > there exists a nonempty open subset U such that f(y) sup x f(x) ɛ for all y U. So the supremum of f(x) on the complement of any null set is sup x f(x) ɛ, since this complement has nonempty intersection with U. As ɛ > was arbitrary we obtain (4.2). Consequently, any limit of functions in C c () with respect to must be continuous, but there are elements in L () with no continuous representative. Let f and g be complex valued functions on R n. We formally define their convolution f g by (f g)(x) := f(x y)g(y) dy. R n One has to be careful to make sure that the definition makes sense. The integral is well-defined for all x R n, if we require that f L p (R n ) and g L q (R n ) for p, q conjugate exponents, by Hölder s inequality 4.2. But actually more is true: Theorem 4.15 (Young s inequality). Let 1 p, q, r be such that 1/p + 1/q = 1/r + 1. If f L p (R n ) and g L q (R n ), then f g L r (R n ) and f g r f p g q. (4.3) Proof. We may assume without loss of generality that f and g are Borel functions, since there exist Borel functions which coincide with f and g a.e., by Proposition 3.9. Then the mapping (x, y) f(x y)g(y) is also a Borel function, since (x, y) x y and (x, y) y are Borel. 
The case $r = \infty$ follows easily from Hölder's inequality 4.2:
\[ |(f * g)(x)| \le \int_{\mathbb{R}^n} |f(x-y)\, g(y)|\,dy \le \|f\|_p \|g\|_q, \]
where we used translation invariance of the integral.

50 46 4. L P -SPACES So assume r <. Set h(x) = (f g)(x) = R f(x y)g(y) dy; we shall see in n the course of the proof that h(x) is defined and finite for a.e. x. Set s = p(1 1/q) and let q be the conjugate exponent of q. By Hölder s inequality 4.2, h(x) f(x y)g(y) dy = f(x y) 1 s f(x y) s g(y) dy R n R ( n ) 1/q ( ) f(x y) (1 s)q g(y) q dy f(y) sq 1/q dy, R n R n where we used translation invariance of the integral. Since sq = p, we have ( ) h(x) q f(x y) (1 s)q g(y) q dy f sq p. R n Note that 1/p + 1/q = 1/r + 1 implies that r q; in fact, r = pq/(p + q pq) and p p+q pq. So t := r/q 1 and we can apply Minkowski s integral inequality 4.5: and hence h q t g q 1 f (1 s)q t f sq p which is (4.3), since (1 s)r = p. h r g q f 1 s r(1 s) f s p = g q q f (1 s)q t(1 s)q f sq p In particular, the convolution of f, g L 1 (R n ) is a function f g L 1 (R n ) satisfying f g 1 f 1 g 1, and, for f L 1 (R n ) and g L p (R n ), f g L p (R n ) with f g p f 1 g p. (4.4) Assuming that all integrals in question exist, the convolution is commutative, f g = g f, by Theorem 3.33, associative, (f g) h = f (g h), by Fubini s theorem 3.27, and satisfies supp(f g) supp f + supp g; (4.5) indeed, if x supp f + supp g then for all y supp g we have x y supp f, and hence f(x y)g(y) = for all y. We denote by L 1 loc (Rn ) the set of locally integrable functions, i.e., measurable functions f : R n C such that f(x) dx < for all bounded measurable K subsets K R n, and Cc k (R n ) is the class of k times continuously differentiable functions on R n with compact support. Lemma If ϕ C k c (R n ) and f L 1 loc (Rn ), then ϕ f C k (R n ), and α (ϕ f) = ( α ϕ) f. The lemma then follows from Theo- Proof. Clearly, ϕ f is well-defined. rem For a function f on R n and y R n we consider the translation Note that T y f p = f p, for 1 p. T y f(x) := f(x y), x R n. (4.6) Lemma For 1 p <, translation is continuous in the L p -norm, i.e., if f L p (R n ) and z R n, then lim y T y+z f T z f p =.

51 4.4. CONVOLUTION AND APPROIMATION BY SMOOTH FUNCTIONS 47 Proof. It suffices to assume that z =, since T y+z = T y T z. If g C c (R n ), then the support of T y g is contained in a fixed compact set K for all y 1, and thus R n T y g(x) g(x) p dx T y g g p λ(k), as y, since g is uniformly continuous. If f L p (R n ) and ɛ >, then there exists g C c (R n ) with g f p ɛ/3, by Theorem 4.14, and so for y sufficiently small. T y f f p T y f T y g p + T y g g p + g f p ɛ, For any function ϕ on R n and ɛ > we set ϕ ɛ (x) = ɛ n ϕ(x/ɛ), x R n. (4.7) If ϕ L 1 (R n ), then R ϕ n ɛ (x) dx is independent of ɛ, by Theorem 3.33, and, for every r > we have lim ɛ x r ϕ ɛ(x) dx =, indeed ϕ ɛ (x) dx = ɛ n ϕ(x/ɛ) dx = ϕ(x) dx. x r x r x r/ɛ Proposition Let ϕ L 1 (R n ) with R n ϕ(x) dx = a, and let 1 p <. If f L p, then f ϕ ɛ af p as ɛ. Proof. By Theorem 3.33, f ϕ ɛ (x) af(x) = (f(x y) f(x))ϕ ɛ (y) dy R n = (f(x ɛz) f(x))ϕ(z) dz R n = (T ɛz f(x) f(x))ϕ(z) dz, R n and by Minkowski s integral inequality 4.5, f ϕ ɛ af p = T ɛz f f p ϕ(z) dz. R n Now T ɛz f f p as ɛ, by Lemma 4.17, and as T ɛz f f p 2 f p, the assertion follows from the dominated convergence theorem If R n ϕ dx = 1 we say that the family {ϕ ɛ } <ɛ 1 is an approximate identity. A mollifier is a nonnegative function ϕ C c (R n ) satisfying ϕ 1 = 1. Example Consider the function { exp 1 x ψ(x) := 2 1 x < 1 x 1. Then ϕ = ( ψ dx) 1 ψ is a mollifier. Theorem 4.2 (Approximation by smooth functions). For 1 p <, Cc is dense in L p (). () Proof. Let f L p () and let δ >. We may assume that f L p (R n ) by setting f on c. By Theorem 4.14, there exists g C c (R n ) so that f g p δ/2.

52 48 4. L P -SPACES Let ϕ be a mollifier and let ϕ ɛ be defined by (4.7). By Lemma 4.16 and (4.5), g ɛ := ϕ ɛ g Cc (R n ). By Proposition 4.18, g ɛ g p δ/2 for sufficiently small ɛ. Thus, g ɛ f p g ɛ g p + g f p δ which implies the assertion. Lemma 4.21 (Smooth Urysohn lemma). If K R n is compact and U is an open set containing K, then there exists f Cc (R n ) such that f 1, f K 1, and supp f U. Proof. Let δ := dist(k, U c ), V := {x : dist(x, K) < δ/3}, and let ϕ be a mollifier with supp ϕ B δ/3 (). Then f := χ V ϕ is as required. Finally, we will show that, for 1 p <, L p () is separable, i.e., it contains a countable dense subset. Lemma If 1 p <, then the set of step functions is dense in L p (R n ). Proof. By Proposition 4.13, simple functions s so that λ({x : s(x) }) < are dense in L p (R n ). Such s are finite linear combinations of characteristic functions of sets E with λ(e) <. So it suffices to show that for given ɛ > there exists a step function f so that χ E f p ɛ. By Proposition 2.15, there exist almost disjoint cubes Q 1,..., Q m such that λ ( E m Q m i) < ɛ, and thus f = χ Q i satisfies ( χ E f p dλ λ E m ) Q i < ɛ. Theorem 4.23 (Separability). For 1 p <, L p (R n ) is separable. Proof. Let f L p (R n ) and let ɛ >. By Lemma 4.22, there is a step function s satisfying f s p ɛ/2. We may conclude that there is a step function t satisfying f t p ɛ and such that the real and imaginary parts of the coefficients and the coordinates of the boxes appearing in the canonical form of t are all rational numbers. So the set of step functions with rational real and imaginary parts of the coefficients and rational coordinates of the boxes appearing in its canonical form is dense in L p (R n ) Modes of convergence Let (, S, µ) be a measure space. A sequence f i of measurable complex valued functions on is said to be Cauchy in measure if ɛ > µ({x : f i (x) f j (x) ɛ}) as i, j, and we say that f i converges in measure to f if ɛ > µ({x : f i (x) f(x) ɛ}) as i. 
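Proposition 4.24. If $f_i \to f$ in $L^1(\mu)$, then $f_i \to f$ in measure. The converse is not true.

Proof. If $E_{i,\epsilon} := \{x : |f_i(x) - f(x)| \ge \epsilon\}$, then
$$\int |f_i - f| \, d\mu \ge \int_{E_{i,\epsilon}} |f_i - f| \, d\mu \ge \epsilon \, \mu(E_{i,\epsilon}),$$
and the left-hand side goes to $0$ as $i \to \infty$.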
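To illustrate why the converse fails, here is a minimal sketch in Python. The example sequence $f_i = i \, \chi_{(0, 1/i)}$ on $[0, 1]$ with Lebesgue measure is an assumption chosen for illustration and is not taken from the notes; the function names are hypothetical. The sequence converges to $0$ in measure, while its $L^1$ norm stays equal to $1$.

```python
from fractions import Fraction

def measure_of_large_set(i: int, eps: Fraction) -> Fraction:
    """Lebesgue measure of {x in [0,1] : |f_i(x)| >= eps} for f_i = i * 1_(0, 1/i).

    For 0 < eps <= i the set is the interval (0, 1/i); for eps > i it is empty.
    """
    return Fraction(1, i) if eps <= i else Fraction(0)

def l1_norm(i: int) -> Fraction:
    """Integral of |f_i| over [0,1]: height i times width 1/i."""
    return i * Fraction(1, i)

eps = Fraction(1, 10)
for i in (1, 10, 100, 1000):
    # The measure tends to 0 (convergence in measure), the L^1 norm does not.
    print(i, measure_of_large_set(i, eps), l1_norm(i))
```

The exact rational arithmetic avoids any rounding questions: the measure column is $1/i$ and the norm column is constantly $1$.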

Theorem 4.25. If $f_i$ is Cauchy in measure, then there is a measurable function $f$ such that $f_i \to f$ in measure, and there is a subsequence of $f_i$ that converges to $f$ $\mu$-a.e. If also $f_i \to g$ in measure, then $f = g$ $\mu$-a.e.

Proof. The sequence $f_i$ has a subsequence $h_j$ satisfying
$$\mu(\{x : |h_j(x) - h_{j+1}(x)| \ge 1/2^j\}) \le 1/2^j.$$
Set $E_j := \{x : |h_j(x) - h_{j+1}(x)| \ge 1/2^j\}$ and $F_k := \bigcup_{j=k}^\infty E_j$. Then $\mu(F_k) \le 2^{1-k}$. If $x \notin F_k$, then for all $i \ge j \ge k$,
$$|h_i(x) - h_j(x)| \le \sum_{l=j}^{i-1} |h_{l+1}(x) - h_l(x)| \le \sum_{l=j}^{i-1} 2^{-l} \le 2^{1-j}. \tag{4.8}$$
It follows that $h_j$ is pointwise Cauchy on $(F_k)^c$. For $F = \bigcap_{k=1}^\infty F_k$, we have $\mu(F) = 0$, and we define $f(x) := \lim_{j \to \infty} h_j(x)$ for $x \notin F$ and $f(x) := 0$ for $x \in F$. Then $f$ is measurable and $h_j \to f$ $\mu$-a.e. For $x \notin F_k$ and $j \ge k$, we have $|h_j(x) - f(x)| \le 2^{1-j}$, by (4.8), and hence $h_j \to f$ in measure, since $\mu(F_k) \to 0$ as $k \to \infty$. It follows that $f_i \to f$ in measure, since
$$\{x : |f_i(x) - f(x)| \ge \epsilon\} \subseteq \{x : |f_i(x) - h_j(x)| \ge \epsilon/2\} \cup \{x : |h_j(x) - f(x)| \ge \epsilon/2\}.$$
If $f_i \to g$ in measure, then
$$\{x : |f(x) - g(x)| \ge \epsilon\} \subseteq \{x : |f(x) - f_i(x)| \ge \epsilon/2\} \cup \{x : |f_i(x) - g(x)| \ge \epsilon/2\}$$
implies $f = g$ $\mu$-a.e.

Convergence a.e. does not imply convergence in measure. However, this implication holds on a finite measure space; actually more is true:

Theorem 4.26 (Egorov's theorem). Let $\mu(X) < \infty$ and let $f_1, f_2, \ldots$ and $f$ be measurable complex valued functions on $X$ such that $f_i \to f$ $\mu$-a.e. Then for every $\epsilon > 0$ there is a set $E \subseteq X$ such that $\mu(E) < \epsilon$ and $f_i \to f$ uniformly on $E^c$.

Proof. Without loss of generality assume that $f_i(x) \to f(x)$ for every $x \in X$. For $k, l \in \mathbb{N}$ define
$$E_{k,l} := \bigcup_{i \ge k} \{x : |f_i(x) - f(x)| \ge 1/l\}.$$
Clearly, $E_{k,l} \supseteq E_{k+1,l}$ and $\bigcap_{k=1}^\infty E_{k,l} = \emptyset$, thus $\lim_{k \to \infty} \mu(E_{k,l}) = 0$, by Lemma 1.1. So, given $\epsilon > 0$, we find a subsequence $k_l$ such that $\mu(E_{k_l,l}) < \epsilon/2^l$. For $E = \bigcup_{l=1}^\infty E_{k_l,l}$, we have $\mu(E) < \epsilon$, and $|f_i(x) - f(x)| < 1/l$ if $i > k_l$ and $x \notin E$. It follows that $f_i \to f$ uniformly on $E^c$.

Let us call the type of convergence in the conclusion of Egorov's theorem almost uniform convergence.
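As a small illustration of almost uniform convergence, consider $f_i(x) = x^i$ on $[0, 1]$ with Lebesgue measure. This example and the function names below are assumptions made for illustration, not part of the notes. The sequence converges to $0$ at every $x < 1$, but not uniformly on $[0, 1)$; after removing the exceptional set $E = (1 - \eta, 1]$ of measure $\eta$, the convergence becomes uniform.

```python
def sup_off_exceptional_set(i: int, eta: float) -> float:
    """Supremum of |x**i - 0| over [0, 1 - eta].

    x**i is increasing in x, so the supremum is attained at x = 1 - eta.
    """
    return (1.0 - eta) ** i

def sup_on_half_open_interval(i: int) -> float:
    """Supremum of x**i over [0, 1): equals 1 for every i, so no uniform convergence."""
    return 1.0

eta = 0.01
for i in (10, 100, 1000, 2000):
    # First column shrinks to 0, second column stays at 1.
    print(i, sup_off_exceptional_set(i, eta), sup_on_half_open_interval(i))
```

Any smaller $\eta > 0$ works as well; only the speed of uniform convergence on $[0, 1 - \eta]$ changes.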
The following diagram summarizes different modes of convergence $f_i \to f$ of a sequence of measurable complex valued functions on a measure space $(X, S, \mu)$.

54 5 4. L P -SPACES uniform convergence pointwise convergence f i f µ-a.e., f i g L 1 (µ) µ-a.e. convergence L 1 convergence convergence in measure almost uniform convergence µ-a.e. convergence for a subsequence µ() <, µ-a.e. convergence Theorem 4.27 (Lusin s theorem). Let f be a Lebesgue measurable complex valued function defined on a Lebesgue measurable set E R n with λ(e) <. Then for every ɛ > there exists a compact set K E such that λ(e \ K) ɛ and such that f K is continuous. Proof. Assume without loss of generality that f is real valued and defined on R n by setting f in E c. For each positive integer i, let {B ij } j=1 be a collection of disjoint Borel sets so that R = j=1 B ij and diam B ij < 1/i. Set E ij := E f 1 (B ij ). By regularity of λ, Theorem 2.9, there are compact sets K ij E ij satisfying λ(e ij \ K ij ) < ɛ/2 i+j. Since E = ( λ E \ ) ( K ij λ j=1 j=1 j=1 E ij, ) (E ij \ K ij ) < ɛ/2 i. By Lemma 1.1, lim k λ(e\ k j=1 K ij) = λ(e\ j=1 K ij), and so there are integers k i such that λ(e \ k i j=1 K ij) < ɛ/2 i. The sets L i := k i j=1 K ij are compact. Choose b ij B ij and define g i : L i R be setting g i Kij = b ij ; the sets K i,1,..., K i,ki are compact and disjoint, so their mutual distance is positive, and g i is continuous. As diam B ij < 1/i, we have f(x) g i (x) < 1/i for all x L i. Then the set K := L i is compact, we have λ(e \ K) λ(e \ L i ) < ɛ, and g i f uniformly on K. It follows that f K is continuous. This does not mean that f is continuous at every x K; consider e.g. χ Q [,1] The distribution function Let f : C be a measurable function on a measure space (, S, µ). The distribution function d f of f is defined by d f (α) := µ({x : f(x) > α}), α.

It follows from the definition that $d_f$ is decreasing. Let us set $E_{f,\alpha} := \{x \in X : |f(x)| > \alpha\}$.

Lemma 4.28. Let $(X, S, \mu)$ be a measure space and let $f, g : X \to \mathbb{C}$ be measurable functions. Then for all $\alpha, \beta > 0$:
(1) If $|f| \le |g|$ $\mu$-a.e., then $d_f \le d_g$.
(2) $d_{cf}(\alpha) = d_f(\alpha/|c|)$ for every $c \in \mathbb{C} \setminus \{0\}$.
(3) $d_{f+g}(\alpha + \beta) \le d_f(\alpha) + d_g(\beta)$.
(4) $d_{fg}(\alpha\beta) \le d_f(\alpha) + d_g(\beta)$.

Proof. (1) If $|f| \le |g|$ $\mu$-a.e., then $d_f(\alpha) = \mu(E_{f,\alpha}) \le \mu(E_{g,\alpha}) = d_g(\alpha)$. (2) We have $E_{cf,\alpha} = E_{f,\alpha/|c|}$. (3) & (4) If $|f(x) + g(x)| > \alpha + \beta$ then $|f(x)| > \alpha$ or $|g(x)| > \beta$. Similarly if $|f(x)g(x)| > \alpha\beta$.

The distribution function $d_f$ does not provide information about the behavior of $f$ near any given point. However, the $L^p$-norm ($p < \infty$) of $f$ can be computed if we only know $d_f$.

Proposition 4.29. Let $(X, S, \mu)$ be a $\sigma$-finite measure space. If $f$ is a measurable function on $X$ and $0 < p < \infty$, then
$$\int_X |f|^p \, d\mu = p \int_0^\infty \alpha^{p-1} d_f(\alpha) \, d\alpha. \tag{4.9}$$

Proof. By Fubini's theorem 3.27,
$$p \int_0^\infty \alpha^{p-1} d_f(\alpha) \, d\alpha = p \int_0^\infty \alpha^{p-1} \int_X \chi_{E_{f,\alpha}} \, d\mu \, d\alpha = \int_X \int_0^{|f(x)|} p \alpha^{p-1} \, d\alpha \, d\mu = \int_X |f(x)|^p \, d\mu.$$

Remark 4.30. This result holds without the assumption of $\sigma$-finiteness; cf. [5, 6.24].

Let $(X, S, \mu)$ be a measure space, and let $1 \le p < \infty$. The weak Lebesgue space $L^{p,\infty}(\mu)$ is defined as the set of all measurable functions $f$ such that
$$\|f\|_{p,\infty} := \inf\{C > 0 : d_f(\alpha) \le (C/\alpha)^p \text{ for all } \alpha > 0\} = \sup_{\alpha > 0} \alpha \, d_f(\alpha)^{1/p} < \infty. \tag{4.10}$$
By definition $L^{\infty,\infty}(\mu) := L^\infty(\mu)$. As usual two functions in $L^{p,\infty}(\mu)$ are considered equal if they are equal $\mu$-a.e. By Lemma 4.28, we obtain that
$$\|cf\|_{p,\infty} = |c| \, \|f\|_{p,\infty} \quad \text{for each } c \in \mathbb{C} \setminus \{0\},$$
and
$$\|f + g\|_{p,\infty} \le 2(\|f\|_{p,\infty} + \|g\|_{p,\infty}).$$
Moreover, $\|f\|_{p,\infty} = 0$ implies that $f = 0$ $\mu$-a.e. That means that $L^{p,\infty}(\mu)$ is a quasinormed space. One can show that it is complete.
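Formula (4.9) can be checked by hand on a finite weighted set, where $d_f$ is a step function and the integral on the right-hand side can be evaluated exactly. The following is a minimal sketch; the setup (a measure given by finitely many point masses) and the names `lp_norm_p` and `layer_cake` are assumptions made for illustration.

```python
def lp_norm_p(values, weights, p):
    """Left side of (4.9): integral of |f|^p for a measure given by point masses."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights))

def layer_cake(values, weights, p):
    """Right side of (4.9): p * integral of alpha^(p-1) * d_f(alpha) over (0, infinity).

    d_f is constant on each interval [a, b) between consecutive values of |f|,
    and the integral of p * alpha^(p-1) over [a, b) equals b^p - a^p.
    """
    levels = sorted({abs(v) for v in values} | {0.0})
    total = 0.0
    for a, b in zip(levels, levels[1:]):
        # Distribution function on [a, b): measure of the set where |f| > a.
        d = sum(w for v, w in zip(values, weights) if abs(v) > a)
        total += d * (b ** p - a ** p)
    return total

values = [1, -2, 3 + 4j, 0]      # values of f at four points
weights = [0.5, 1.0, 2.0, 3.0]   # masses of the four points
print(lp_norm_p(values, weights, 1.5), layer_cake(values, weights, 1.5))
```

Both numbers agree up to floating point rounding, for any exponent $p > 0$.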

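Proposition 4.31 (Chebyshev's inequality). Let $1 \le p < \infty$. If $f \in L^p(\mu)$ then $f \in L^{p,\infty}(\mu)$ and
$$\|f\|_{p,\infty} \le \|f\|_p. \tag{4.11}$$

Proof. We have for all $\alpha > 0$,
$$\|f\|_p^p = \int_X |f|^p \, d\mu \ge \int_{E_{f,\alpha}} |f|^p \, d\mu \ge \alpha^p \mu(E_{f,\alpha}) = \alpha^p d_f(\alpha).$$

The inclusion $L^p(\mu) \subseteq L^{p,\infty}(\mu)$ is strict. For example, the function $f(x) = |x|^{-1/p}$ is in $L^{p,\infty}(\mathbb{R})$ but not in $L^p(\mathbb{R})$ (with the Lebesgue measure).

Proposition 4.32. Let $(X, S, \mu)$ be a finite measure space. If $1 \le q < p < \infty$ then $L^{p,\infty}(\mu) \subseteq L^q(\mu)$ and
$$\|f\|_q \le \Big(\frac{p}{p-q}\Big)^{1/q} \mu(X)^{1/q - 1/p} \|f\|_{p,\infty}, \quad f \in L^{p,\infty}(\mu). \tag{4.12}$$

Proof. Let $f \in L^{p,\infty}(\mu)$. Then $d_f(\alpha) \le \min\{\mu(X), \alpha^{-p} \|f\|_{p,\infty}^p\}$, by (4.10). Thus, for $A := \mu(X)^{-1/p} \|f\|_{p,\infty}$, using Proposition 4.29,
$$\begin{aligned}
\|f\|_q^q &= q \int_0^\infty \alpha^{q-1} d_f(\alpha) \, d\alpha \\
&\le q \int_0^A \alpha^{q-1} \mu(X) \, d\alpha + q \int_A^\infty \alpha^{q-p-1} \|f\|_{p,\infty}^p \, d\alpha \\
&= A^q \mu(X) + \frac{q}{p-q} A^{q-p} \|f\|_{p,\infty}^p \\
&= \mu(X)^{1-q/p} \|f\|_{p,\infty}^q + \frac{q}{p-q} \mu(X)^{1-q/p} \|f\|_{p,\infty}^q \\
&= \frac{p}{p-q} \mu(X)^{1-q/p} \|f\|_{p,\infty}^q.
\end{aligned}$$

Proposition 4.33. If $1 \le p < q < r < \infty$, then $L^{p,\infty}(\mu) \cap L^{r,\infty}(\mu) \subseteq L^{q,\infty}(\mu)$ and
$$\|f\|_{q,\infty} \le \|f\|_{p,\infty}^t \|f\|_{r,\infty}^{1-t}, \quad \text{where } \frac{1}{q} = \frac{t}{p} + \frac{1-t}{r}.$$

Proof. Since $tq/p + (1-t)q/r = 1$, for all $\alpha > 0$,
$$\alpha^q d_f(\alpha) = (\alpha \, d_f(\alpha)^{1/p})^{tq} (\alpha \, d_f(\alpha)^{1/r})^{(1-t)q} \le \|f\|_{p,\infty}^{tq} \|f\|_{r,\infty}^{(1-t)q}.$$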
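The example $f(x) = |x|^{-1/p}$ mentioned after Chebyshev's inequality can be made explicit. Since $\{x : |x|^{-1/p} > \alpha\} = \{x : 0 < |x| < \alpha^{-p}\}$, we get $d_f(\alpha) = 2\alpha^{-p}$, so $\alpha \, d_f(\alpha)^{1/p} = 2^{1/p}$ for every $\alpha > 0$, while $\int_{\delta \le |x| \le R} |f|^p \, dx = 2 \log(R/\delta)$ is unbounded. The sketch below only evaluates these closed forms; the function names are hypothetical.

```python
import math

def distribution(alpha: float, p: float) -> float:
    """d_f(alpha) for f(x) = |x|**(-1/p) on the real line: length of (-alpha**-p, alpha**-p)."""
    return 2.0 * alpha ** (-p)

def weak_quantity(alpha: float, p: float) -> float:
    """alpha * d_f(alpha)**(1/p), whose supremum over alpha is the weak L^p quasinorm."""
    return alpha * distribution(alpha, p) ** (1.0 / p)

def truncated_lp_integral(delta: float, R: float) -> float:
    """Integral of |f|**p = 1/|x| over delta <= |x| <= R, computed in closed form."""
    return 2.0 * math.log(R / delta)

p = 3.0
for alpha in (0.01, 1.0, 100.0):
    print(alpha, weak_quantity(alpha, p))           # constant, equal to 2**(1/p)
for delta, R in ((1e-3, 1e3), (1e-6, 1e6)):
    print(delta, R, truncated_lp_integral(delta, R))  # grows without bound
```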

57 CHAPTER 5 Absolute continuity of measures 5.1. Complex measures Let (, S) be a measurable space. A complex measure is a mapping ν : S C satisfying ( ) ν E i = ν(e i ), if E i S are pairwise disjoint. Note that setting E i = for all i yields ν( ) =. A positive measure is a complex measure only if it is finite. The above series is independent of the order of its terms, i.e., it converges unconditionally and hence absolutely. Complex measures arise naturally. For instance, let µ be a positive measure on and let f L 1 (µ). Then ν(e) = f dµ is a complex measure; cf. the proof of E Corollary 3.16 and use the dominated convergence theorem For a complex measure ν one defines its total variation by { } ν (E) := sup ν(e i ) : E = E i, E i S disjoint. By definition we have ν(e) ν (E) and, if ν is a positive measure, then ν (E) = ν(e). Theorem 5.1. The total variation ν of a complex measure ν is a finite positive measure. The total variation ν is the smallest positive measure that dominates ν, i.e., if µ is a positive measure such that ν(e) µ(e) for all E S, then ν (E) µ(e) for all E S. The fact that ν is finite implies that every complex measure is bounded: ν(e) ν (E) ν (). Proof. Let E i S be disjoint and E = E i. In order to see that ν is a positive measure we need to show ν (E) = ν (E i ). (5.1) If ν (E i ) = for some i, then clearly ν (E) = ; so let us assume that ν (E i ) < for all i. Let ɛ >. For each i, there are disjoint E ij S so that E i = j=1 E ij and ν (E i ) j=1 ν(e ij) + ɛ/2 i. Then ν (E i ) ν(e ij ) + ɛ ν (E) + ɛ, i,j=1 since E = i,j=1 E ij is a disjoint union. This implies ν (E i) ν (E). 53

58 54 5. ABSOLUTE CONTINUITY OF MEASURES Conversely, if F j S are disjoint and E = j=1 F j, then ν(f j ) = ν(f j E i ) j=1 = j=1 j=1 j=1 ν(f j E i ) ν(f j E i ) ν (E i ), and taking the supremum over all such partitions {F j } we may conclude that ν (E) ν (E i). Thus, we proved (5.1) and ν is a positive measure. It remains to show that ν () <. Since ν (E) Re ν (E) + Im ν (E), we may assume that ν is real valued. That ν () < will follow from the claim that, if E S and ν (E) =, then E = A B with disjoint A, B S and ν(a) 1 and ν (B) =. Indeed, this assertion can be applied recursively (starting with E = ) to obtain disjoint sets A 1, A 2,... S with ν(a i ) 1 for all i. This leads to a contradiction, since ν( A i) = ν(a i), but this series cannot converge. Let us prove the claim. Suppose that ν (E) =. Then there exist disjoint sets E i S with E = E i so that ν(e i ) 2 + ν(e). Set E + := ν(e E i) i and E := ν(e E i)< i. Then the previous inequality becomes ν(e + ) + ν(e ) 2 + ν(e + ) ν(e ) and thus ν(e ± ) 1. Since E = E + E and so ν (E) = ν (E + ) + ν (E ), ν (E + ) = or ν (E ) = (or both). A real measure ν : S R (often called a signed measure) can be decomposed into positive and negative variations, ν = ν + ν where ν ± := ν ± ν. 2 By Theorem 5.1, ν ± are finite positive measures. This is known as the Jordan decomposition. If ν = ν 1 ν 2 is any other decomposition into positive measures, then ν 1 ν + and ν 2 ν ; see the remarks after Theorem 5.7. If ν is a real measure and f is ν -integrable, then the integral of f with respect to ν is defined by f dν := f dν + f dν. This definition can evidently be extended to any complex measure ν by applying it to the real and imaginary part of ν. One can show that the set of all complex measures on a measurable space equipped with the norm ν = ν () forms a Banach space.

59 5.2. ABSOLUTE CONTINUITY AND DECOMPOSITION OF MEASURES Absolute continuity and decomposition of measures Let (, S) be a measurable space, and let µ be a positive measure on S. In the following we assume that ν, ν 1, ν 2, etc., are further either positive or complex measures on S. We say that ν is absolutely continuous with respect to µ, and write ν µ, if, for each E S, µ(e) = implies ν(e) =. For instance, the measure ν(e) = E f dµ, where f L1 (µ), satisfies ν µ; we shall see below that every measure absolutely continuous with respect to µ is of this form. Two measures ν 1 and ν 2 on S are called mutually singular, and we write ν 1 ν 2, if they are supported on disjoint sets, i.e., there exist disjoint E 1, E 2 S such that ν i (E) = if E E i =, i = 1, 2. For instance, the Lebesgue measure and the Dirac measure on R n are mutually singular. Lemma 5.2. (1) If ν i µ, i = 1, 2, then ν 1 + ν 2 µ. (2) If ν i µ, i = 1, 2, then ν 1 + ν 2 µ. (3) If ν 1 µ and ν 2 µ, then ν 1 ν 2. (4) If ν µ and ν µ, then ν =. (5) If ν µ, then ν µ. Proof. (1) is obvious. (2) There exist E 1, E 2, E S such that E i E = and ν i is supported on E i, i = 1, 2, and µ is supported on E. Then ν 1 + ν 2 is supported on E 1 E 2 and (E 1 E 2 ) E =. (3) There exists E 2 S so that ν 2 is supported on E 2 and µ(e 2 ) =. Since ν 1 µ, ν 1 (E 2 ) = and hence ν 1 has support in E2. c (4) By (3), ν ν and hence ν =. (5) Suppose that µ(e) = and let E = E i for disjoint E i S. Then µ(e i ) = for all i. Since ν µ we have ν(e i ) = for all i, and thus i ν(e i) =. This implies ν (E) =. Theorem 5.3 (Lebesgue Radon Nikodym theorem). Let µ and ν be positive finite measures on a measurable space (, S). Then we have (1) There is a unique pair of positive measures ν a and ν s on S such that ν = ν a + ν s, ν a µ, ν s µ, ν a ν s. (2) There is a unique f L 1 (µ) such that ν a (E) = f dµ, E S. E The decomposition ν = ν a + ν s is called the Lebesgue decomposition of ν with respect to µ. 
Part (2) is known as the Radon-Nikodym theorem. The function $f$ in (2) is called the Radon-Nikodym derivative of $\nu_a$ with respect to $\mu$; one writes $d\nu_a = f \, d\mu$ or $f = d\nu_a / d\mu$.

Proof. To see uniqueness in (1) let $\tilde\nu_a$ and $\tilde\nu_s$ be another pair satisfying (1). Then $\tilde\nu_a - \nu_a = \nu_s - \tilde\nu_s$, $\tilde\nu_a - \nu_a \ll \mu$, and $\nu_s - \tilde\nu_s \perp \mu$, and thus $\tilde\nu_a - \nu_a = \nu_s - \tilde\nu_s = 0$, by Lemma 5.2. Uniqueness in (2) follows from Proposition .

Set $\varphi = \nu + \mu$. Then $\varphi$ is a positive finite measure on $S$, and we have
$$\int_X f \, d\varphi = \int_X f \, d\nu + \int_X f \, d\mu$$

60 56 5. ABSOLUTE CONTINUITY OF MEASURES which is obvious for characteristic functions of sets in S, hence for simple functions, and thus also for arbitrary measurable functions. If f L 2 (ϕ), then f dν f dν f dϕ ϕ() 1/2 f L 2 (ϕ), by Hölder s inequality 4.2. We may infer that f f dν is a bounded linear functional on L 2 (ϕ). By Corollary 4.12 and Theorem A.8, there exists g L 2 (ϕ) such that, for all f L 2 (ϕ), f dν = fg dϕ. In particular, for all E S, ν(e) = χ E g dϕ = It follows that g(x) for ϕ-a.e. x, and since µ(e) = ϕ(e) ν(e) = E E g dϕ. (1 g) dϕ, we also have g(x) 1 for ϕ-a.e. x. Without loss of generality we may assume that g(x) 1 for all x. We obtain, for f L 2 (ϕ), (1 g)f dν = f dν fg dν = fg dϕ fg dν = fg dµ. (5.2) Set A := {x : g(x) < 1} and B := {x : g(x) = 1}, and define ν a (E) := ν(a E), ν s (E) := ν(b E), E S. Taking f = χ B in (5.2) we find = (1 g) dν = g dµ = µ(b), and hence B B ν s µ. Since g is bounded and ϕ is finite, f = (1 + g + g g k )χ E L 2 (ϕ), for E S, and inserting f in (5.2) gives (1 g k+1 ) dν = g(1 + g + g g k ) dµ. E E For x B, 1 g k+1 (x) =, and for x A, g k+1 (x) as k, and therefore the left side converges to ν a (E), by the monotone convergence theorem The integrand of the right side converges monotonically to a positive measurable function h, and, by the monotone convergence theorem 3.14, we find that, for E S, ν a (E) = h dµ. For E = we see that h L 1 (µ), since ν a () <. So we have proved (2). In particular, ν a µ which completes the proof of (1). Corollary 5.4 (Lebesgue Radon Nikodym theorem). We have the following extensions: E (1) Theorem 5.3 remains true if µ is a positive σ-finite measure and ν is a complex measure (where ν a and ν s now are complex measures). (2) If µ and ν are positive σ-finite measures, then Theorem 5.3 still holds with the restriction that the function f is no longer in L 1 (µ). Proof. If µ is σ-finite, then i = for disjoint i S with µ( i ) <. (1) Suppose first that ν is positive with ν() <. 
Then we may apply Theorem 5.3 to each $X_i$. The Lebesgue decompositions of the restrictions of $\nu$ to $X_i$ with respect to the restriction of $\mu$ to $X_i$ add up to a Lebesgue decomposition of $\nu$. We obtain $L^1$-functions $f_i$ on $X_i$. Then $f := \sum_i f_i \chi_{X_i}$ satisfies

61 5.2. ABSOLUTE CONTINUITY AND DECOMPOSITION OF MEASURES 57 ν a (E) = E f dµ and is L1 (µ), since ν() <. If ν is complex valued, we apply this to positive and negative variations of the real and imaginary part of ν. (2) This follows in the same way as (1); we can assume that also ν( i ) <. The function f satisfies i f dµ < for each i. The result fails if we go beyond σ-finiteness. For example, on = R consider the σ-algebra L(R) of Lebesgue measurable sets and let µ be the counting measure and ν = λ the Lebesgue measure on L(R). Then ν µ, but there is no function f satisfying dν = f dµ. If there were such f, then f(x ) > for some x R and < f(x ) = {x } f dµ = ν({x }) =. Proposition 5.5 (Characterization of absolute continuity). Let µ and ν be measures on a measurable space (, S), µ positive and ν complex. Then the following are equivalent: (1) ν µ. (2) For each ɛ > there is a δ > so that ν(e) < ɛ for all E S with µ(e) < δ. Proof. Clearly, (2) implies (1). Assume that (2) does not hold. Then there is ɛ > and there are E i S so that µ(e i ) < 2 i and ν(e i ) ɛ. Let us set F k := i=k E i and F = k=1 F k. Then µ(f k ) 2 k+1 and µ(f ) = lim k µ(f k ) =, by Lemma 1.1. Similarly, ν (F ) = lim k ν (F k ) ɛ >. Thus we do not have ν µ, and hence (1) does not hold, by Lemma 5.2. Theorem 5.6 (Polar decomposition). Let ν be a complex measure on a measurable space (, S). Then there exists a measurable function f on satisfying f(x) = 1 for all x, and such that dν = f d ν. Proof. The Radon Nikodym theorem 5.3 implies that there is a function f L 1 ( ν ) so that dν = f d ν. Let us show that f(x) = 1 for all x. Set E a := {x : f(x) < a} and let E a = E ai be a partition of E a. Then ν(e ai ) = f d ν a ν (E ai ) = a ν (E a ), E ai and hence ν (E a ) a ν (E a ). This implies that ν (E a ) = if a < 1, and therefore f 1 ν -a.e. On the other hand, whenever ν (E) >, 1 f d ν = ν(e) ν (E) ν (E) 1. E We will show that this implies that f 1 ν -a.e. 
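Take an open disk $B_r(c)$ in the complement of the closed unit disk $\overline{B_1(0)}$ in $\mathbb{C}$. It suffices to show that $E := f^{-1}(B_r(c))$ is a $|\nu|$-null set, since $\overline{B_1(0)}^c$ is a countable union of such disks. If $|\nu|(E) > 0$ then, as $|f - c| < r$ on $E$,
$$\Big| \frac{1}{|\nu|(E)} \int_E f \, d|\nu| - c \Big| = \Big| \frac{1}{|\nu|(E)} \int_E (f - c) \, d|\nu| \Big| < r,$$
so the average of $f$ over $E$ lies in $B_r(c)$ and hence outside the closed unit disk, a contradiction. By redefining $f$ on the set $\{x : |f(x)| \ne 1\}$, the statement follows.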

Theorem 5.7 (Hahn decomposition). Let $\nu$ be a signed measure on a measurable space $(X, S)$. Then there exist disjoint sets $P, N \in S$ such that $X = P \cup N$ and
$$\nu^+(E) = \nu(P \cap E) \quad \text{and} \quad \nu^-(E) = -\nu(N \cap E), \quad E \in S.$$

Proof. By Theorem 5.6, $d\nu = f \, d|\nu|$ for a measurable function $f$ with $|f| = 1$. Since $\nu$ is real valued, so is $f$; this is true a.e. and everywhere after redefining $f$. Thus $f(X) \subseteq \{\pm 1\}$. Set $P := \{x : f(x) = 1\}$ and $N := \{x : f(x) = -1\}$. Note that
$$\frac{1 + f(x)}{2} = \begin{cases} f(x) & x \in P, \\ 0 & x \in N, \end{cases}$$
and since $\nu^+ = (|\nu| + \nu)/2$, we have for $E \in S$,
$$\nu^+(E) = \frac{1}{2} \int_E (1 + f) \, d|\nu| = \int_{P \cap E} f \, d|\nu| = \nu(P \cap E).$$
That $\nu^-(E) = -\nu(N \cap E)$ follows from $\nu = \nu^+ - \nu^-$ and from $\nu(E) = \nu(P \cap E) + \nu(N \cap E)$.

As a corollary we obtain that the Jordan decomposition is minimal in the following sense: if $\nu = \nu_1 - \nu_2$ for positive measures $\nu_1$ and $\nu_2$ then $\nu_1 \ge \nu^+$ and $\nu_2 \ge \nu^-$. In fact, as $\nu \le \nu_1$ we have
$$\nu^+(E) = \nu(P \cap E) \le \nu_1(P \cap E) \le \nu_1(E).$$
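On a finite set the Hahn and Jordan decompositions can be written down directly, which may help to see what the theorem says. The sketch below assumes a signed measure given by finitely many real point masses (with all subsets measurable); the function `hahn_jordan` and its return values are hypothetical names used only for this illustration. Points of mass zero are put into $P$, which is one admissible choice since the Hahn decomposition is unique only up to null sets.

```python
def hahn_jordan(weights):
    """Hahn and Jordan decomposition of a signed measure on a finite set.

    weights: dict mapping each point to its real mass nu({x}).
    Returns (P, N, nu, total_variation, nu_plus, nu_minus), where the last
    four entries are functions of a subset E of the points.
    """
    P = {x for x, w in weights.items() if w >= 0}   # positive set
    N = set(weights) - P                            # negative set

    def nu(E):
        return sum(weights[x] for x in E)

    def total_variation(E):
        # For point masses the supremum over partitions is attained by singletons.
        return sum(abs(weights[x]) for x in E)

    def nu_plus(E):
        return nu(P & set(E))

    def nu_minus(E):
        return -nu(N & set(E))

    return P, N, nu, total_variation, nu_plus, nu_minus

weights = {"a": 2.0, "b": -3.0, "c": 0.5, "d": -1.0}
P, N, nu, tv, nu_plus, nu_minus = hahn_jordan(weights)
E = {"a", "b", "c"}
# Compare with the Jordan formulas nu^+ = (|nu| + nu)/2 and nu^- = (|nu| - nu)/2.
print(nu_plus(E), (tv(E) + nu(E)) / 2)
print(nu_minus(E), (tv(E) - nu(E)) / 2)
```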

63 CHAPTER 6 Differentiation and integration 6.1. The Lebesgue differentiation theorem Recall that L 1 loc (Rn ) is the set of measurable functions f : R n C such that K f(x) dx < for all bounded measurable subsets K Rn. For f L 1 loc (Rn ), x R n, and r > we consider the average A r f(x) of f over the open ball B r (x), 1 A r f(x) := f(y) dy = f(y) dy. λ(b r (x)) B r(x) B r(x) We shall use the notation ffl f dx = λ(e) 1 f dx whenever E is bounded and E E measurable, λ(e) >, and f L 1 loc (Rn ). Lemma 6.1. The mapping (, ) R n (r, x) A r f(x) C is continuous. Proof. The functions χ Br(x) converge pointwise to χ Br (x ) on the set R n \{x : x x = r } as (r, x) tends to (r, x ). Thus, χ Br(x) χ Br (x ) λ-a.e. on R n. Moreover, χ Br(x) χ Br +1(x ) if r < r +1/2 and x x < 1/2. By the dominated convergence theorem 3.22, we have f(y) dy f(y) dy, B r(x) B r (x ) and since λ(b r (x)) = λ(b 1 ())r n λ(b 1 ())r n = λ(b r (x )), the statement follows. For f L 1 loc (Rn ) we may define the Hardy Littlewood maximal function Mf by Mf(x) := sup A r f (x) = sup f(y) dy. r> r> B r(x) Then Mf is measurable, since (Mf) 1 ((a, )) = r> (A r f ) 1 ((a, )) is open, by Lemma 6.1. Lemma 6.2. Let C be a collection of open balls in R n, and U = C. If c < λ(u), then there are finitely many disjoint B 1,..., B k C so that k j=1 λ(b j) > 3 n c. Proof. By Theorem 2.9, there is a compact set K U with λ(k) > c. The set K is covered by finitely many balls A 1,..., A l C. Let B 1 be one of the balls A i with maximal radius. Let B 2 be a ball of maximal radius among the balls A i disjoint from B 1. Let B 3 be a ball of maximal radius among the balls A i disjoint from B 1 and B 2, etc., until the collection of A i is exhausted. If A i {B 1,..., B k } then A i B j for some j, and if j is the smallest integer with that property, then the radius of A i is at most that of B j. Consequently, A i B j, where B j is the open ball concentric with B j whose radius is three times that of B j. Then 59

64 6 6. DIFFERENTIATION AND INTEGRATION B 1,..., B k cover K and so c < λ(k) k k λ(bj ) = 3 n λ(b j ). j=1 j=1 Theorem 6.3 (M is weak type (1, 1)). For each f L 1 (R n ) and each a >, we have λ({x : Mf(x) > a}) C f(x) dx, a R n where C is a constant depending only on n. Proof. Set E a := {x : Mf(x) > a} and let x E a. Then there exists r x > so that A rx f (x) > a. The collection of balls {B rx (x)} x Ea covers E a, and by Lemma 6.2, given c < λ(e a ) there exist x 1,..., x k E a so that the balls B j = B rxj (x j ) are disjoint and k j=1 λ(b j) > 3 n c. Thus, c < 3 n k λ(b j ) 3n a j=1 Letting c λ(e a ) yields the result. k j=1 B j f(x) dx 3n a R n f(x) dx. A sublinear mapping T (i.e. T (f + g) T f + T g and T (cf) = c T f for c > ) is called weak type (p, q) for 1 p and 1 q < if T maps L p (µ) into L q, (µ) and T f q, C f p for all f L p (µ). Theorem 6.3 means that the Hardy Littlewood maximal operator M satisfies Mf 1, C f 1 for f L 1 (R n ), so it is weak type (1, 1); see also Corollary Proposition 6.4. If f L 1 loc (Rn ) then lim r A r f(x) = f(x) for λ-a.e. x R n, i.e., lim r (f(y) f(x)) dy = for λ-a.e. x R n. (6.1) Br(x) Proof. It suffices to show that, for each N N, we have lim r A r f(x) = f(x) for λ-a.e. x B N (). As, for x B N () and r 1, the values of A r f(x) depend only on the values of f(y) for y B N+1 (), we may replace f by χ BN+1 ()f and hence assume that f L 1 (R n ). Let ɛ >. By Theorem 4.14, there is a continuous function g with f g 1 ɛ. By continuity of g, for each x R n, A r g(x) g(x) as r. Now g(y) g(x) dy sup g(y) g(x) B r(x) y B r(x) A r f(x) f(x) A r f g (x) + A r g(x) g(x) + g(x) f(x), and taking lim sup r = lim ɛ sup <r<ɛ on both sides we find This implies that satisfies lim sup A r f(x) f(x) M(f g)(x) + g(x) f(x). r E a := {x : lim sup A r f(x) f(x) > a} r E a {x : M(f g)(x) > a/2} {x : g(x) f(x) > a/2}.

65 6.1. THE LEBESGUE DIFFERENTIATION THEOREM 61 It follows from Theorem 6.3 and Chebyshev s inequality 4.31 that 2(C + 1) 2(C + 1) λ(e a ) f(x) g(x) dx ɛ. a R a n As ɛ > was arbitrary, λ(e a ) =. Since lim r A r f(x) = f(x) if and only if lim sup r A r f(x) f(x) =, we have lim r A r f(x) = f(x) if x k=1 E 1/k. This implies the assertion. We will show in the next theorem that (6.1) remains true if we replace the integrand by its absolute value. A Lebesgue point of a function f L 1 loc (Rn ) is a point x R n so that lim r f(y) f(x) dy =. Br(x) Let L f denote the set of all Lebesgue points of f. Theorem 6.5. If f L 1 loc (Rn ) then λ((l f ) c ) =. Proof. Let c C. Applying (6.1) to x f(x) c shows that lim r f(y) c dy = f(x) c Br(x) except on a null set E c. Let D be a countable dense subset of C. Then E = c D E c is a null set. Assume x E. For each ɛ > there is c D so that f(x) c < ɛ, and thus lim sup r f(y) f(x) dy lim sup B r(x) r Since ɛ was arbitrary, the proof is complete. f(y) c dy + ɛ = f(x) c + ɛ < 2ɛ. B r(x) We shall now establish Theorem 6.5 for families of sets more general than {B r (x)} r. A family of Borel sets {E r } r> is said to shrink nicely to x if E r B r (x) for all r >, there is a > so that λ(e r ) > aλ(b r (x)) for all r >. The sets E r need not contain x. Theorem 6.6 (Lebesgue differentiation theorem). Let f L 1 loc (Rn ). Then, for each x L f and each family {E r } r> that shrinks nicely to x, lim f(y) f(x) dy = and lim r Er r Er f(y) dy = f(x). Proof. Since {E r } r> shrinks nicely to x, 1 1 f(y) f(x) dy λ(e r ) E r aλ(b r (x)) B r(x) f(y) f(x) dy as r, by Theorem 6.5. The second equality may be written in the form and thus is a consequence of the first. lim (f(y) f(x)) dy = r Er Corollary 6.7 (Antiderivatives). If f L 1 (R) and F (x) = x f(t) dt, x R, then F (x) = f(x) on every Lebesgue point of f.

66 62 6. DIFFERENTIATION AND INTEGRATION Proof. For E r = [x, x + r), Theorem 6.6 shows that, for x L f, F (x + r) F (x) 1 lim = lim r r r r x+r x f(y) dy = lim r so the right derivative of F at x exists and equals f(x). derivative Derivatives of measures Er f(y) dy = f(x), Similarly for the left The Radon Nikodym theorem provides an abstract notion of derivative of a complex measure with respect to a positive measure. On the measurable space (R n, B(R n )) we can define a pointwise derivative of a complex measure with respect to Lebesgue measure which coincides λ-a.e. with the Radon Nikodym derivative. Theorem 6.8. Let µ be a complex Borel measure on R n with Lebesgue decomposition dµ = dν + f dλ. Then for λ-a.e. x R n, µ(e r ) lim r λ(e r ) = f(x), for every family {E r } r> that shrinks nicely to x. Proof. By the Radon Nikodym theorem 5.3, f L 1 (R n ). So, by Theorem 6.6, it suffices to show that, for λ-a.e. x R n, ν(e r ) lim r λ(e r ) = for every family {E r } r> that shrinks nicely to x. We may assume without loss of generality that ν is positive and E r = B r (x), thanks to ν(e r) ν (E r) λ(e r ) λ(e r ) ν (B r(x)) aλ(b r (x)). Let A be a Borel set such that ν(a) = λ(a c ) =, and set { ν(b r (x)) F k := x A : lim sup r λ(b r (x)) > 1 }. k To complete the proof it is enough to show that λ(f k ) = for all k. Since ν is finite (because µ is finite), ν is regular, by Theorem 2.7. Hence, for given ɛ > there is an open set U A so that ν(u) < ɛ. By definition of F k, if x F k then there is a ball B x := B rx (x) U such that ν(b x ) > k 1 λ(b x ). Set V := x F k B x and choose c < λ(v ). By Lemma 6.2, there exist x 1,..., x j so that B x1,..., B xj are disjoint and c < 3 n j λ(b xi ) 3 n k j ν(b xi ) 3 n kν(v ) 3 n kν(u) < 3 n kɛ. Letting c λ(v ) we may conclude that λ(f k ) =. For a complex Borel measure µ on R n we call (Dµ)(x) = lim r µ(b r (x)) λ(b r (x)) the derivative of µ at x R n, provided that the limit exists. 
Theorem 6.8 tells us that the derivative of a complex Borel measure $\mu$ exists $\lambda$-a.e. and equals the Radon-Nikodym derivative of the absolutely continuous part of $\mu$ with respect to $\lambda$.

67 6.3. THE FUNDAMENTAL THEOREM OF CALCULUS The fundamental theorem of calculus A function f : [a, b] C, a, b R, is said to be absolutely continuous on [a, b], we write f AC([a, b]), if for each ɛ > there is a δ > so that for any n N and any disjoint collection of subintervals (a i, b i ) [a, b] n n (b i a i ) < δ = f(b i ) f(a i ) < ɛ. (6.2) Obviously, f AC([a, b]) is uniformly continuous on [a, b]. Note that AC([a, b]) forms a vector space. Lemma 6.9. Let I = [a, b] and let f L 1 (I). Then is absolutely continuous on I. F (x) := F (a) + x a f(t) dt, x I, Proof. Let µ be the measure on I defined by dµ = f dλ. Since µ λ and hence µ λ by Lemma 5.2, for each ɛ > there is a δ > so that µ (E) < ɛ if λ(e) < δ, by Proposition 5.5. It follows that F is absolutely continuous on I, as F (y) F (x) = µ((x, y)) for a x < y b. Proposition 6.1. For a continuous nondecreasing function f : I = [a, b] R the following are equivalent: (1) f AC(I). (2) f maps sets of measure zero to sets of measure zero. (3) f is differentiable a.e. on I, f L 1 (I), and f(x) f(a) = x a f (t) dt, x I. Property (2) is called the Lusin (N)-property. Proof. (1) (2) Let E I be measurable and λ(e) =. Without loss of generality assume that E (a, b). Let ɛ >. Then there is δ > such that (6.2) holds. There exists an open set V with E V I and λ(v ) < δ, by Theorem 2.9. Let (a i, b i ) denote the connected components of V. Then λ(v ) = (b i a i ) < δ and thus (f(b i ) f(a i )) < ɛ, by (6.2), where we first consider partial sums and then proceed to the limit. Since f(e) [f(a i ), f(b i )] and the latter is a Borel set of measure bounded by ɛ, we we may conclude that λ(f(e)) = (as λ is complete). (2) (3) We define g(x) := x + f(x), x I. Then g has the Lusin (N)-property, since, if f maps an interval J of length l to an interval of length l, then g(j) is an interval of length l + l. We claim that g maps measurable sets E I to measurable sets. Indeed, by Corollary 2.1, E = E E 1 where λ(e ) = and E 1 is a F σ -set. 
In particular, E 1 is a countable union of compact sets and, as g is continuous, so is g(e 1 ). Since g has the Lusin (N)-property, λ(g(e )) = and we may conclude that g(e) = g(e ) g(e 1 ) is measurable. We define µ(e) := λ(g(e)), E I measurable. Then µ is a positive bounded measure on the Lebesgue measurable sets E I, since g is injective and so σ-additivity of λ transfers to µ. Moreover, µ λ, since g has the Lusin (N)-property. By the Radon Nikodym theorem 5.3, there exists h

68 64 6. DIFFERENTIATION AND INTEGRATION L 1 (I) such that dµ = h dλ. Consequently, for E = [a, x] we find g(e) = [g(a), g(x)] and x g(x) g(a) = λ(g(e)) = µ(e) = h dλ = h(t) dt, which gives f(x) f(a) = x a E (h(t) 1) dt, x I. By Corollary 6.7, f = h 1 a.e., and (3) is shown. (3) (1) follows from Lemma 6.9. To any function f : I = [a, b] C we associate the total variation function { n } T f (x) := sup f(x i ) f(x i 1 ) : n N, a = x < < x n = x, x I. In general T f (x) T f (y) if x < y. We say that f is of bounded variation, and write f BV (I), if T f (b) < ; Va b (f) = T f (b) is called the total variation of f. Proposition An absolutely continuous function f : I = [a, b] R has bounded variation. The functions T f, T f + f, and T f f are nondecreasing and absolutely continuous on I. Proof. For ɛ = 1 there is a δ > such that (6.2) holds. Set n := 2(b a)/δ and divide [a, b] into n intervals [x i 1, x i ] of equal length (b a)/n. Since (b a)/n < δ, (6.2) implies that Vx xi i 1 (f) 1 and therefore V b a (f) = n whence f has bounded variation on I. If a = x < < x n = x < y b then T f (y) f(y) f(x) + V xi x i 1 (f) n <, n f(x i ) f(x i 1 ) and hence T f (y) f(y) f(x) + T f (x) and, in particular, T f (y) f(y) f(x) + T f (x) and T f (y) f(x) f(y) + T f (x). Thus T f, T f + f, and T f f are nondecreasing. It remains to show that T f is absolutely continuous on I. For a x < y b, { n } T f (y) T f (x) = sup f(x i ) f(x i 1 ) : n N, x = x < < x n = y. a (6.3) For ɛ > there is a δ > such that (6.2) holds. Let (a j, b j ) be disjoint subintervals of I so that N j=1 (b j a j ) < δ. Applying (6.3) to each (a j, b j ), we get N (T f (b j ) T f (a j )) ɛ, j=1 by (6.2). Thus T f is absolutely continuous on I.
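The statements about $T_f$ and $T_f \pm f$ can be observed numerically. The sketch below is an illustration under assumptions: it uses $f = \sin$ on $[0, 2\pi]$ and a fixed uniform partition, so the computed sums are only lower bounds for $T_f$ in general. Here they reach the total variation $4$ up to rounding because the partition contains the turning points $\pi/2$ and $3\pi/2$. The helper names are hypothetical.

```python
import math

def variation_along(f, points):
    """Sum of |f(x_i) - f(x_{i-1})| along an increasing list of points."""
    return sum(abs(f(b) - f(a)) for a, b in zip(points, points[1:]))

def partition(a, b, n):
    """Uniform partition a = x_0 < ... < x_n = b."""
    return [a + (b - a) * k / n for k in range(n + 1)]

pts = partition(0.0, 2 * math.pi, 400)
# Discrete stand-in for the total variation function T_f at each grid point.
T = [variation_along(math.sin, pts[: k + 1]) for k in range(len(pts))]

print(T[-1])  # close to the total variation of sin on [0, 2*pi]
# T_f + f and T_f - f should be nondecreasing along the grid.
plus = [t + math.sin(x) for t, x in zip(T, pts)]
minus = [t - math.sin(x) for t, x in zip(T, pts)]
print(all(b >= a - 1e-12 for a, b in zip(plus, plus[1:])))
print(all(b >= a - 1e-12 for a, b in zip(minus, minus[1:])))
```

The small tolerance in the monotonicity check only absorbs floating point rounding; in exact arithmetic each increment of $T$ dominates the corresponding increment of $\pm f$.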

Example 6.12 (Cantor function). The Cantor function $f$ from Example 3.5 is not absolutely continuous. In fact, $f(C) = [0, 1]$ and so the Lusin (N)-property fails. $f$ is differentiable a.e., $f' = 0$ on $[0, 1] \setminus C$, but
$$1 = f(1) - f(0) \ne \int_0^1 f'(t) \, dt = 0.$$
However, $f$ has bounded variation with $V_0^1(f) = 1$.

Theorem 6.13 (Fundamental theorem of calculus). For a function $f : I = [a, b] \to \mathbb{C}$ the following are equivalent:
(1) $f \in AC(I)$.
(2) $f(x) = f(a) + \int_a^x g(t) \, dt$ for some $g \in L^1(I)$.
(3) $f$ is differentiable a.e. in $I$, $f' \in L^1(I)$, and $f(x) = f(a) + \int_a^x f'(t) \, dt$.

Proof. (2) $\Rightarrow$ (1) is Lemma 6.9 and (3) $\Rightarrow$ (2) is trivial. (1) $\Rightarrow$ (3) Without loss of generality assume that $f$ is real valued. Write
$$f = \frac{T_f + f}{2} - \frac{T_f - f}{2}.$$
By Proposition 6.11, the functions $f_\pm := (T_f \pm f)/2$ are nondecreasing and absolutely continuous, and by Proposition 6.10, $f_\pm$ satisfy (3). It follows that $f = f_+ - f_-$ satisfies (3).

Corollary 6.14 (Integration by parts). If $f, g \in AC([a, b])$ then $fg \in AC([a, b])$, and
$$\int_a^b f'(x) g(x) \, dx = f(b)g(b) - f(a)g(a) - \int_a^b f(x) g'(x) \, dx.$$

Proof. Let $\epsilon > 0$. Then there is $\delta > 0$ so that for any finite disjoint collection of subintervals $(a_i, b_i) \subseteq [a, b]$ with $\sum_{i=1}^n (b_i - a_i) < \delta$ we have
$$\sum_{i=1}^n |f(b_i) - f(a_i)| < \epsilon \quad \text{and} \quad \sum_{i=1}^n |g(b_i) - g(a_i)| < \epsilon.$$
Let $C := \max\{\|f\|_\infty, \|g\|_\infty\}$. Then
$$|f(b_i)g(b_i) - f(a_i)g(a_i)| \le |f(b_i)| \, |g(b_i) - g(a_i)| + |g(a_i)| \, |f(b_i) - f(a_i)|$$
and thus
$$\sum_{i=1}^n |f(b_i)g(b_i) - f(a_i)g(a_i)| \le 2C\epsilon.$$
Hence $fg \in AC([a, b])$. By Theorem 6.13,
$$f(b)g(b) - f(a)g(a) = \int_a^b (fg)'(x) \, dx$$
and, as $f$, $g$, and $fg$ are differentiable a.e., the desired formula follows from the product rule.

6.4. Rademacher's theorem

Let $A \subseteq \mathbb{R}^n$. Recall that a mapping $f : A \to \mathbb{R}^m$ is said to be Lipschitz if
$$\mathrm{Lip}(f) := \sup_{\substack{x, y \in A \\ x \ne y}} \frac{|f(x) - f(y)|}{|x - y|} < \infty.$$
We say that $f$ is locally Lipschitz if the restriction $f|_K$ to every compact subset $K \subseteq A$ is Lipschitz.

70 66 6. DIFFERENTIATION AND INTEGRATION Theorem 6.15 (Lipschitz extensions). Let A R n and let f : A R m be Lipschitz. Then there exists a Lipschitz extension f : R n R m of f with Lip( f) m Lip(f). Proof. If m = 1 we may define ( ) f(x) := inf f(a) + Lip(f) x a. a A Indeed, if x A then for all a A, f(x) f(x) f(a) + Lip(f) x a and thus f(x) = f(x). For x, y R n, ( ) f(x) inf f(a) + Lip(f)( y a + x y ) = f(y) + Lip(f) x y, a A and symmetrically f(y) f(x) + Lip(f) x y. If f = (f 1,..., f m ) : A R m, then f := ( f 1,..., f m ) is as required, since f(x) f(y) m 2 = f i (x) f i (y) 2 m Lip(f) 2 x y 2. Actually, by Kirszbraun s theorem there is an extension f with Lip( f) = Lip(f); cf. [4]. We shall now prove Rademacher s theorem that a Lipschitz function f : R n R m is differentiable a.e. That is at a.e. x R n there exists a linear mapping T : R n R m such that f(y) f(x) T (x y) lim =. y x x y If such a linear mapping exists, it is obviously unique. We denote it by df(x) and call it the derivative of f at x. Theorem 6.16 (Rademacher). Let f : R n R m be locally Lipschitz. Then f is differentiable a.e. Proof. We may assume without loss of generality that m = 1 and that f is Lipschitz, by Theorem 6.15, since differentiability is a local property. For v R n with v = 1, we consider the directional derivative of f at x, d v f(x) := lim t f(x + tv) f(x) t provided this limit exists. We claim that d v f(x) exists for a.e. x R n. We work with the Dini derivatives d v f(x) and d v f(x). Since f is continuous, d v f(x) := lim sup t f(x + tv) f(x) t = lim k sup < t <1/k t Q is Borel measurable, by Theorem 3.4; the same holds for Consequently, the set d v f(x) := lim inf t f(x + tv) f(x). t f(x + tv) f(x) t E v := {x R n : d v f(x) fails to exist} = {x R n : d v f(x) < d v f(x)} is a Borel set; note that d v f(x), d v f(x) R since f is Lipschitz. For fixed x, v R n with v = 1 then function R t f(x + tv) is Lipschitz, hence absolutely

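continuous, and thus differentiable at a.e. $t \in \mathbb{R}$, by Theorem 6.13. So $H^1(E_v \cap L) = 0$ for each line $L$ whose direction is $v$. By Fubini's theorem 3.27, $E_v$ is a null set.

If we take the standard unit vectors in $\mathbb{R}^n$ for $v$, we may conclude that the gradient $\nabla f(x) := (\partial_1 f(x), \dots, \partial_n f(x))$ exists for a.e. $x \in \mathbb{R}^n$.

We next claim that $d_v f(x) = \nabla f(x) \cdot v$ for a.e. $x \in \mathbb{R}^n$. Let $\varphi \in C_c^\infty(\mathbb{R}^n)$. We have
$$\int_{\mathbb{R}^n} \frac{f(x + tv) - f(x)}{t}\,\varphi(x)\,dx = -\int_{\mathbb{R}^n} f(x)\,\frac{\varphi(x) - \varphi(x - tv)}{t}\,dx.$$
As $\big| \frac{f(x + v/k) - f(x)}{1/k} \big| \le \operatorname{Lip}(f)$, the dominated convergence theorem 3.22 yields
$$\int_{\mathbb{R}^n} d_v f(x)\,\varphi(x)\,dx = -\int_{\mathbb{R}^n} f(x)\,d_v \varphi(x)\,dx = -\int_{\mathbb{R}^n} f(x)\,\nabla\varphi(x) \cdot v\,dx = -\sum_{i=1}^n \int_{\mathbb{R}^n} f(x)\,\partial_i \varphi(x)\,v_i\,dx$$
$$= \sum_{i=1}^n \int_{\mathbb{R}^n} \partial_i f(x)\,\varphi(x)\,v_i\,dx = \int_{\mathbb{R}^n} \varphi(x)\,\nabla f(x) \cdot v\,dx,$$
where we used Fubini's theorem 3.27, the absolute continuity of $f$ on lines, and Corollary 6.14. Since the equality holds for every $\varphi \in C_c^\infty(\mathbb{R}^n)$, we have $d_v f(x) = \nabla f(x) \cdot v$ for a.e. $x \in \mathbb{R}^n$ (a locally integrable function that integrates to zero against every test function vanishes a.e.).

Choose a countable dense subset $\{v_1, v_2, \dots\}$ of $S^{n-1}$. Set
$$E_k := \{x \in \mathbb{R}^n : d_{v_k} f(x) \text{ and } \nabla f(x) \text{ exist and satisfy } d_{v_k} f(x) = \nabla f(x) \cdot v_k\}$$
and $E := \bigcap_{k=1}^\infty E_k$. Then $\lambda(E^c) = 0$. Let us show that $f$ is differentiable at every $x \in E$. Fix $x \in E$. For $v \in S^{n-1}$ and $t \in \mathbb{R} \setminus \{0\}$ consider
$$Q(x, v, t) := \frac{f(x + tv) - f(x)}{t} - \nabla f(x) \cdot v.$$
Then, for $w \in S^{n-1}$,
$$|Q(x, v, t) - Q(x, w, t)| \le \Big| \frac{f(x + tv) - f(x + tw)}{t} \Big| + |\nabla f(x) \cdot (v - w)| \le \operatorname{Lip}(f)\,|v - w| + |\nabla f(x)|\,|v - w| \le (\sqrt{n} + 1) \operatorname{Lip}(f)\,|v - w|. \quad (6.4)$$
Fix $\epsilon > 0$ and choose an integer $N$ sufficiently large such that if $v \in S^{n-1}$ then
$$|v - v_k| \le \frac{\epsilon}{2(\sqrt{n} + 1) \operatorname{Lip}(f)} \quad (6.5)$$
for some $k \in \{1, \dots, N\}$. Since $Q(x, v_k, t) \to 0$ as $t \to 0$, there exists $\delta > 0$ such that
$$|Q(x, v_k, t)| < \epsilon/2 \quad \text{for } 0 < |t| < \delta,\ k = 1, \dots, N. \quad (6.6)$$
By (6.4), (6.5), and (6.6), for each $v \in S^{n-1}$ there exists $k \in \{1, \dots, N\}$ such that
$$|Q(x, v, t)| \le |Q(x, v_k, t)| + |Q(x, v, t) - Q(x, v_k, t)| < \epsilon$$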

if $0 < |t| < \delta$; the same $\delta$ works for all $v \in S^{n-1}$. Let $y \in \mathbb{R}^n$, $y \ne x$. Then $y = x + tv$ with $v = (y - x)/|y - x|$ and $t = |x - y|$, and therefore
$$\frac{f(y) - f(x) - \nabla f(x) \cdot (y - x)}{|y - x|} = Q\Big(x, \frac{y - x}{|y - x|}, |x - y|\Big) \to 0$$
as $y \to x$. So $f$ is differentiable at $x$ with $df(x) = \nabla f(x)$.
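The infimum formula from the proof of Theorem 6.15 can be tried out numerically. The following is a minimal sketch, assuming $m = 1$, $n = 1$, and a finite set $A$ (so that the infimum is a minimum); the sample data and the names `lipschitz_constant` and `extend` are illustrative assumptions, not part of the notes.

```python
def lipschitz_constant(points, values):
    """Lip(f) for a function f given on a finite set A of real numbers."""
    return max(
        abs(values[i] - values[j]) / abs(points[i] - points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )

def extend(points, values):
    """Return the extension x -> min over a in A of f(a) + Lip(f) * |x - a|."""
    lip = lipschitz_constant(points, values)

    def f_tilde(x):
        return min(v + lip * abs(x - a) for a, v in zip(points, values))

    return f_tilde, lip

# Hypothetical sample data: f(0) = 0, f(1) = 2, f(3) = 1.
A = [0.0, 1.0, 3.0]
fA = [0.0, 2.0, 1.0]
f_tilde, lip = extend(A, fA)
print(lip, [f_tilde(x) for x in (-1.0, 0.0, 1.0, 2.0, 3.0, 4.0)])
```

The extension agrees with $f$ on $A$ and keeps the same Lipschitz constant, in line with the case $m = 1$ of the theorem.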

CHAPTER 7

The dual of $L^p$

7.1. The dual of $L^p$

Let $(X, S, \mu)$ be a measure space, and let $1 \le p \le \infty$. A linear functional on $L^p(\mu)$ is a linear mapping $l : L^p(\mu) \to \mathbb{C}$. A linear functional $l$ on $L^p(\mu)$ is continuous if $\|f_k - f\|_p \to 0$ implies $l(f_k) \to l(f)$, or equivalently,
$$|l(f)| \le C \|f\|_p, \quad f \in L^p(\mu),$$
for some constant $C > 0$, i.e., $l$ is bounded. This equivalence holds on any normed space; see Lemma A.1. To see it directly, assume that $f_k \in L^p(\mu)$ are such that
$$\Big| l\Big( \frac{f_k}{\|f_k\|_p} \Big) \Big| = \frac{|l(f_k)|}{\|f_k\|_p} \to \infty.$$
Then $g_k := f_k / \|f_k\|_p \in L^p(\mu)$ satisfies $\|g_k\|_p \le 1$ and
$$\Big\| \frac{g_k}{l(g_k)} \Big\|_p \to 0, \quad\text{whereas}\quad l\Big( \frac{g_k}{l(g_k)} \Big) = 1,$$
so $l$ is not continuous.

The dual of $L^p(\mu)$ is the set of all continuous linear functionals on $L^p(\mu)$; it is denoted by $L^p(\mu)^*$. The space $L^p(\mu)^*$ is a vector space and carries a natural norm, the operator norm,
$$\|l\| := \sup\{|l(f)| : \|f\|_p \le 1\} = \inf\{C : |l(f)| \le C \|f\|_p \text{ for all } f \in L^p(\mu)\}.$$
Let $q$ be the conjugate exponent of $p$. Hölder's inequality 4.2 implies that a function $g \in L^q(\mu)$ defines a continuous linear functional $l_g$ on $L^p(\mu)$ via
$$l_g(f) := \int_X g f\,d\mu. \quad (7.1)$$
We shall see that every continuous linear functional on $L^p(\mu)$ has the form (7.1), if $1 < p < \infty$, and if $p = 1$ provided that $\mu$ is σ-finite. We will use the following result (compare with Proposition 4.4).

Proposition 7.1. Let $1 \le p, q \le \infty$ be conjugate exponents. Suppose that $g : X \to \mathbb{C}$ is measurable and such that
- $fg \in L^1(\mu)$ for all $f \in S_X := \{\text{simple } f : \mu(\{x : f(x) \ne 0\}) < \infty\}$,
- the quantity $M_q(g) := \sup\{ |\int_X fg\,d\mu| : f \in S_X,\ \|f\|_p = 1 \}$ is finite,
- $\{x : g(x) \ne 0\}$ is σ-finite.
Then $g \in L^q(\mu)$ and $M_q(g) = \|g\|_q$.

Proof. We claim that a bounded measurable function $f$ with $\|f\|_p = 1$ that vanishes outside a set $F$ of finite measure satisfies $|\int_X fg\,d\mu| \le M_q(g)$. By Corollary 3.7, there are simple functions $s_i$ converging pointwise to $f$ and satisfying $|s_i| \le |f|$.

74 7 7. THE DUAL OF L P Since s i f χ F and χ F g L 1 (µ), we have fg dµ = lim i s i g dµ M q (g), by the dominated convergence theorem Suppose that q <. By assumption, E := {x : g(x) } = E i where E i E i+1 and µ(e i ) <. By Corollary 3.7, there are simple functions s i converging pointwise to g and satisfying s i g. If we set g i := s i χ Ei, then g i converge pointwise to g, satisfy g i g, and g i vanishes outside of E i. Define { g i q 1 q g i (x) q 1 g(x) 1 g(x) g(x) f i (x) := g(x) =. Then f i p = 1 and by Fatou s lemma 3.17, g q lim inf i q = lim inf f i g i dµ i i lim inf i f i g dµ = lim inf i f i g dµ M q (g), by the first paragraph. Thus, M q (g) = g q by Hölder s inequality 4.2. Assume that q =. For ɛ > set A := {x : g(x) M (g) + ɛ}. If µ(a) > there is a subset B A with < µ(b) <, since {x : g(x) } is σ-finite. Set f(x) := µ(b) 1 χ B (x)g(x)/ g(x) if g(x) and f(x) := otherwise. Then f 1 = 1 and fg dµ = µ(b) 1 B g dµ M (g) + ɛ which contradicts the first paragraph. Thus g = M (g). Theorem 7.2 (Dual of L p ). Let 1 p, q be conjugate exponents. For 1 < p <, the mapping L q (µ) g l g L p (µ), where l g (f) = gf dµ, is an isometric isomorphism. The same is true for p = 1 provided that µ is σ-finite. For p = it is isometric but not surjective. So in all cases l g = g q. (7.2) Proof. Hölder s inequality 4.2 implies that l g L p (µ) if g L q (µ). That l g = g q follows from Proposition 4.4. Let us show surjectivity for 1 p <. Let l L p (µ). Assume first that µ() <. Then, for each E S, χ E L p (µ), and ν(e) := l(χ E ), E S, defines a complex measure. Indeed, if E i S are pairwise disjoint, then k p χ Ei χ Ei, by the dominated convergence theorem 3.22, and hence, by continuity of l, ( ) ( ) ν E i = l χ Ei = l(χ Ei ) = ν(e i ). If µ(e) =, then χ E = in L p (µ), and thus ν(e) =, i.e., ν µ. By the Radon Nikodym theorem 5.3, there exists g L 1 (µ) so that l(χ E ) = ν(e) = g dµ = χ E g dµ, E S. 
We may conclude that $l(f) = \int_X fg\,d\mu$ holds for each simple function $f$ and that $|\int_X fg\,d\mu| \le \|l\|\,\|f\|_p$. Thus, $g \in L^q(\mu)$, by Proposition 7.1. Since $l$ and $l_g$ are

75 7.2. WEAK CONVERGENCE 71 continuous linear functionals on L p (µ) that coincide on the set of simple functions, Proposition 4.13 implies that l(f) = l g (f) for all functions f L p (µ). If µ is σ-finite, there are sets i i+1 so that = i and µ( i ) <. We may identify L p ( i ) with the subspace of L p () of functions that vanish on i c. Then l Lp ( i ) and so, by the preceding argument, there exists g i L q ( i ) with g i q = l L p ( i) l and so that l(f) = l gi (f) for all f L p ( i ). We have g i = g j µ-a.e. on i if i < j. So we may define g on by setting g i = g i. By the monotone convergence theorem 3.14, g q = lim i g i q l, thus g L q (µ). And g satisfies l(f) = l g (f) for all f L p (µ), since fχ i f in L p (µ) and therefore l(f) = lim l(fχ i ) = lim l gi (fχ i ) = lim gf dµ = l g (f). i i i i Finally, suppose that µ is arbitrary and that p > 1 (consequently q < ). By the previous paragraph, for each σ-finite subset E there is a unique g E L q (E) with l(f) = l ge (f) for all f L p (E) and g E q l. If F is σ-finite and F E, then g F = g E µ-a.e. on E and hence l g F q g E q. Then M := sup { g E q : σ-finite E } l. Let E k be a sequence of σ-finite subsets in such that g Ek q M, and set F := k=1 E k. Then F is σ-finite and g F q = M. If G F is σ-finite, then g F q dµ + g G\F q dµ = g G q dµ M q = g F q dµ, whence g G\F = and g G = g F µ-a.e. In particular, if f L p (µ) then the set G := F {x : f(x) } is σ-finite (as {x : f(x) } = {x : f(x) > 1/i}), and thus l(f) = fg G dµ = fg F dµ. So we may take g = g F. Corollary 7.3. If 1 < p < then L p (µ) is reflexive. Proof. Let q be the conjugate exponent. By Theorem 7.2, we have an isometric isomorphism L p (µ) = L q (µ). So if h L p (µ) = L q (µ) then there exists g L p (µ) such that h(f) = gf dµ, f L p (µ) = L q (µ). Consequently, h coincides with the evaluation mapping ev g : f f(g), hence ev : L p (µ) L p (µ) is surjective, i.e., L p (µ) is reflexive. 
The dual space of $L^\infty(\mu)$ is much larger than $L^1(\mu)$, see the following example; its description will not be given here.

Example 7.4. Consider the interval $[0,1]$ with the Lebesgue measure $\lambda$. The mapping $\mathrm{ev}_0 : f \mapsto f(0)$ is a bounded linear functional on the subspace $C([0,1])$ of $L^\infty([0,1])$. By the Hahn-Banach theorem A.2, there exists $l \in L^\infty([0,1])^*$ such that $l(f) = f(0)$ for all $f \in C([0,1])$. Let $f_k \in C([0,1])$ be given by $f_k(x) := \max\{1 - kx, 0\}$. Then $l(f_k) = f_k(0) = 1$ for all $k$ and $f_k(x) \to 0$ for all $x > 0$. So for any $g \in L^1([0,1])$ we have $\int_{[0,1]} f_k g\,d\lambda \to 0$, by the dominated convergence theorem 3.22. Thus $l$ cannot be of the form $l_g$ for any $L^1$-function $g$.

7.2. Weak convergence

Let $(X, S, \mu)$ be a measure space, and let $1 \le p \le \infty$. A sequence of functions $f_k \in L^p(\mu)$ is said to converge weakly to $f \in L^p(\mu)$, and we write $f_k \rightharpoonup f$, if $l(f_k) \to l(f)$ for all $l \in L^p(\mu)^*$.
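The mechanism in Example 7.4 can be observed numerically: the point evaluation of $f_k$ at $0$ stays equal to $1$, while the pairing of $f_k$ with a fixed integrable $g$ tends to $0$. The following is a minimal sketch, assuming the test function $g \equiv 1$ (for which the integral is exactly $1/(2k)$) and a midpoint rule; the helper names are illustrative.

```python
def f_k(k, x):
    """The tent-like function f_k(x) = max(1 - k x, 0) from Example 7.4."""
    return max(1.0 - k * x, 0.0)

def pairing(k, g, n=100_000):
    """Midpoint-rule approximation of the integral of f_k * g over [0, 1]."""
    h = 1.0 / n
    return h * sum(f_k(k, (i + 0.5) * h) * g((i + 0.5) * h) for i in range(n))

def one(x):
    # Assumed test function g = 1; any other integrable g could be passed instead.
    return 1.0

for k in (1, 10, 100):
    print(k, f_k(k, 0.0), pairing(k, one))
```

No $L^1$-function $g$ can therefore reproduce the value $l(f_k) = 1$ for all $k$, which is the point of the example.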

76 72 7. THE DUAL OF L P Obviously, strong convergence implies weak convergence. Proposition 7.5. If f L p (µ) and l(f) = for all l L p (µ), then f = (where we assume that µ is σ-finite in the case p = ). Consequently, weak limits in L p (µ) are unique. Proof. This follows from (7.2), in fact, if q is conjugate to p, then f p = l f = sup fg dµ = sup l g (f) =, g q 1 and thus f =. g q 1 The following is a particular case of the Banach Alaoglu theorem. Theorem 7.6. If 1 < p < then a bounded sequence in L p (µ) has a weakly convergent subsequence. Proof. This follows from a fundamental result of functional analysis which states that a Banach space is reflexive if and only if its closed unit ball is weakly sequentially compact, cf. [2]. We will give a direct proof in the case that is an open subset of R n and µ = λ is the Lebesgue measure. Let f i be a bounded sequence in L p (). By extending each f i by outside we may assume that f i L p (R n ). By Theorem 7.2, we may identify L p (R n ) with L q (R n ), where q is conjugate to p. By Theorem 4.23, there is a dense sequence of functions g j L q (R n ). Consider the sequence of numbers C i1 := f i g 1 dx which is bounded, by Hölder s inequality 4.2. By passing to a subsequence denoted by fi 1 we may assume that C i1 C 1. Repeating this argument with fi 1, we can pass to a further subsequence fi 2 so that fi 2g 2 dx C 2, and inductively we obtain a countable family of subsequences such that for the kth subsequence (and all further subsequences) f k i g k dx C k as i. Then the sequence defined by F j := f j j satisfies Fj g k dx C k as j for all k. If g L q (R n ) and ɛ >, then g g k q ɛ for some k. Thus F j g dx F i g dx F j g g k dx + F i g k g dx + F j g k dx F i g k dx 2ɛ sup F j p + ɛ, j for sufficiently large i and j. Hence the limit lim j Fj g dx exists. Setting l(g) := lim j Fj g dx we obtain a bounded linear functional on L q (R n ). 
By Theorem 7.2, there exists $f \in L^p(\mathbb{R}^n)$ such that $l(g) = \int fg\,dx$ for all $g \in L^q(\mathbb{R}^n)$. The proof is complete.

7.3. Interpolation theorems

We have seen in Proposition 4.7 that
$$L^p(\mu) \cap L^r(\mu) \subseteq L^q(\mu) \subseteq L^p(\mu) + L^r(\mu)$$
provided that $1 \le p < q < r \le \infty$, and the first inclusion is bounded. Now we investigate the question whether a linear operator which is bounded on $L^p(\mu)$ and $L^r(\mu)$ is also bounded on $L^q(\mu)$. We need a preliminary lemma from complex analysis.

Lemma 7.7 (Three lines lemma). Let $S := \{z \in \mathbb{C} : 0 \le \operatorname{Re} z \le 1\}$ and let $f : S \to \mathbb{C}$ be bounded, continuous, and holomorphic in the interior of the strip $S$. If $|f(z)| \le M_0$ for $\operatorname{Re} z = 0$ and $|f(z)| \le M_1$ for $\operatorname{Re} z = 1$, then $|f(z)| \le M_0^{1-t} M_1^t$ for $\operatorname{Re} z = t$ and $0 < t < 1$.

Proof. For $\epsilon > 0$ define
$$f_\epsilon(z) := f(z)\,M_0^{z-1} M_1^{-z} \exp(\epsilon z(z - 1)).$$
Then $f_\epsilon$ satisfies the assumptions with $M_0$ and $M_1$ replaced by $1$. Moreover, $|f_\epsilon(z)| \to 0$ as $|\operatorname{Im} z| \to \infty$ (uniformly for $0 \le \operatorname{Re} z \le 1$). So $|f_\epsilon(z)| \le 1$ for $z$ on the boundary of a rectangle $\{z : 0 \le \operatorname{Re} z \le 1,\ |\operatorname{Im} z| \le A\}$ with $A$ sufficiently large. The maximum principle implies that $|f_\epsilon(z)| \le 1$ for $z \in S$. Thus, for $\operatorname{Re} z = t$,
$$|f(z)|\,M_0^{t-1} M_1^{-t} = \lim_{\epsilon \to 0} |f_\epsilon(z)| \le 1,$$
and the lemma is proved.

We are ready to prove the Riesz-Thorin interpolation theorem which shows that the answer to the above question is yes.

Theorem 7.8 (Riesz-Thorin). Let $(X, S, \mu)$ and $(Y, T, \nu)$ be measure spaces and let $p_0, p_1, q_0, q_1 \in [1, \infty]$. If $q_0 = q_1 = \infty$ we also assume that $\nu$ is σ-finite. Let $p_t$, $q_t$, $0 < t < 1$, be defined by
$$\frac{1}{p_t} = \frac{1-t}{p_0} + \frac{t}{p_1}, \qquad \frac{1}{q_t} = \frac{1-t}{q_0} + \frac{t}{q_1}.$$
If $T : L^{p_0}(\mu) + L^{p_1}(\mu) \to L^{q_0}(\nu) + L^{q_1}(\nu)$ is a linear mapping such that
$$\|Tf\|_{q_0} \le M_0 \|f\|_{p_0} \text{ for all } f \in L^{p_0}(\mu), \qquad \|Tf\|_{q_1} \le M_1 \|f\|_{p_1} \text{ for all } f \in L^{p_1}(\mu),$$
then for all $0 < t < 1$,
$$\|Tf\|_{q_t} \le M_0^{1-t} M_1^t \|f\|_{p_t} \text{ for all } f \in L^{p_t}(\mu). \quad (7.3)$$

Proof. If $p_0 = p_1 = p$, then by Proposition 4.7,
$$\|Tf\|_{q_t} \le \|Tf\|_{q_0}^{1-t} \|Tf\|_{q_1}^{t} \le M_0^{1-t} M_1^t \|f\|_p$$
for all $f \in L^p(\mu)$, and we are done. So we may assume that $p_0 \ne p_1$, and thus $p_t < \infty$, for all $0 < t < 1$.

Let $S_X$ be the class of simple functions $s$ on $X$ with $\mu(\{x : s(x) \ne 0\}) < \infty$, and $S_Y$ the class of simple functions $s$ on $Y$ with $\nu(\{x : s(x) \ne 0\}) < \infty$. We shall show that (7.3) holds for all $f \in S_X$. Since $S_X$ is dense in $L^{p_t}(\mu)$, by Proposition 4.13, we may conclude that $T|_{S_X}$ has a unique extension $\tilde T$ to $L^{p_t}(\mu)$ satisfying the same estimate there. It remains to prove that $\tilde T = T$ on $L^{p_t}(\mu)$.

For $f \in L^{p_t}(\mu)$ choose a sequence $f_n \in S_X$ with $|f_n| \le |f|$ and $f_n \to f$ pointwise; cf. Corollary 3.7. Set $E := \{x : |f(x)| > 1\}$, $g = \chi_E f$, and $g_n = \chi_E f_n$.
If p < p 1 (which we may assume without loss of generality), then g L p (µ) and f g L p1 (µ) (cf. Proposition 4.7) and, by the dominated convergence theorem 3.22, f n f pt, g n g p, and (f n g n ) (f g) p1. It follows that T g n T g q and T (f n g n ) T (f g) q1. By passing the a subsequence we get T g n T g ν-a.e. and T (f n g n ) T (f g) ν-a.e., by Corollary 4.11, and may conclude that T f n T f ν-a.e. By Fatou s lemma 3.17, T f qt lim inf T f n qt lim inf M 1 t M1 f t n pt = M 1 t M1 f t pt and (7.3) is proved. Let us show that (7.3) holds for all f S. By Proposition 7.1, { } T f qt = sup (T f)g dν : g S Y, g q t = 1, Y

78 74 7. THE DUAL OF L P where q t is the conjugate exponent to q t ; the set {y : T f(y) } is σ-finite either since T f L q (ν) L q1 (ν) or, if q = q 1 =, by assumption. We may assume that f and that f pt = 1, by rescaling. Thus in order to show that (7.3) holds for all f S it suffices to prove the following claim. Claim: If f S, f pt = 1, then (T f)g dν M 1 t M1, t for g S Y, g q t = 1. Y Let f = m j=1 a jχ Ej and g = n k=1 b kχ Fk be canonical representations, and write a j = a j e iϕj and b k = b k e iψ k. Define π(z) := 1 z p + z p 1, τ(z) = 1 z q + z q 1, z C, so that π(t) = 1/p t and τ(t) = 1/q t for < t < 1. Fix t and set m f z := a j π(z) π(t) e iϕ j χ Ej ; j=1 note that π(t) >. If τ(t) < 1 set n g z := b k 1 τ(z) 1 τ(t) e iψ k χ Fk, k=1 otherwise, if τ(t) = 1, set g z = g for all z. Assume that τ(t) < 1 (the case τ(t) = 1 follows similarly). Consider the entire function m n Φ(z) := (T f z )g z dν = a j π(z) π(t) bk 1 τ(z) 1 τ(t) e i(ϕ j+ψ k ) (T χ Ej )χ Fk dν Y j=1 k=1 which is bounded on the strip {z C : Re z 1}. By the three lines lemma 7.7, the claim follows if we show that Φ(z) M for Re z = and Φ(z) M 1 for Re z = 1. By Hölder s inequality 4.2, for s R, Φ(is) T f is q g is q M f is p g is q. Since π(is) := 1/p + is(1/p 1 1/p ) and 1 τ(is) = (1 1/q ) + is(1/q 1 1/q ), m f is = a j Re(π(is)) π(t) χ Ej = f Re(π(is)) π(t) = f p t p, and hence f is p p g is = j=1 n k=1 b j Re(1 τ(is)) 1 τ(t) χ Fk = g Re(1 τ(is)) 1 τ(t) = f pt p t = 1 and g is q q = g q t q t Y q t q = f, = 1. Thus, Φ(z) M for Re z =. A similar computation shows Φ(z) M 1 for Re z = 1. The proof is complete. The second fundamental interpolation result is the Marcinkiewicz interpolation theorem. Let T be a mapping from some vector space F of measurable functions on (, S, µ) to the space of measurable functions on (Y, T, ν). Then T is called sublinear if for all f, g F and c >, T (cf) = c T f and T (f +g) T f + T g. Theorem 7.9 (Marcinkiewicz). 
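Let $(X, S, \mu)$ and $(Y, T, \nu)$ be measure spaces and let $p_0, p_1, q_0, q_1 \in [1, \infty]$ satisfy $p_0 \le q_0$, $p_1 \le q_1$, and $q_0 \ne q_1$. Let $p_t$, $q_t$, $0 < t < 1$, be defined by
$$\frac{1}{p_t} = \frac{1-t}{p_0} + \frac{t}{p_1}, \qquad \frac{1}{q_t} = \frac{1-t}{q_0} + \frac{t}{q_1}. \quad (7.4)$$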

79 7.3. INTERPOLATION THEOREMS 75 If T is a sublinear mapping on L p (µ)+l p1 (µ) to the space of measurable functions on Y such that T f q, M f p, for all f L p (µ), (7.5) T f q1, M 1 f p1, for all f L p1 (µ), then for all < t < 1, T f qt M t f pt, for all f L pt (µ), (7.6) where M t depends only on M i, p i, q i, t, for i =, 1. In other words, if the sublinear mapping T is weak type (p, q ) and (p 1, q 1 ) then T is strong type (p t, q t ), i.e., T maps L pt (µ) to L qt (ν) and T f qt C f pt holds for all f L pt (µ). In the proof we make use of the following simple lemma. Lemma 7.1. Let f be measurable and let A >. For E A = {x : f(x) > A} set h A := fχ E c A + A(sgn f)χ EA and g A = f h A. Then d ga (α) = d f (α + A) and d ha (α) = d f (α) if α < A and d ha (α) = if α A. Proof. Note that g A = (sgn f)( f A)χ EA and thus g A (x) > α if and only if f(x) > α + A. This implies d ga (α) = d f (α + A). The second statement is obvious. Proof of Theorem 7.9. Assume that p = p 1 = p (and hence p ) and (say) q < q 1 <. Then (7.5) implies ( M f ) p q, ( M1 f ) q1 p d T f (β) dt f (β) β β and, by Proposition 4.29 (and Remark 4.3), with A = f p and q = q t, T f q q = q = q A qm q f q p = β q 1 d T f (β) dβ β q 1 d T f (β) dβ + q A q q1 qm f q qm1 p + q q q 1 q f q p A β q 1 d T f (β) dβ β q q 1 dβ + qm q1 1 f q1 p A β q q1 1 dβ which implies the statement. If q 1 = then T f M 1 f p and thus d T f (β) = if β > M 1 f p. So it suffices to repeat the computation with A = M 1 f p. Let us now consider the case p < p 1 and q < and q 1 <. Let p = p t, q = q t and f L p (µ). Then, with the notation of Lemma 7.1, g A p dµ = p α p 1 d ga (α) dα = p α p 1 d f (α + A) dα = p (α A) p 1 d f (α) dα p α p 1 d f (α) dα, A A h A p1 dµ = p 1 α p1 1 d ha (α) dα = p 1 α p1 1 d f (α) dα, by Proposition 4.29 (and Remark 4.3). Moreover, T f q dν = q β q 1 d T f (β) dβ = 2 q q A β q 1 d T f (2β) dβ.

80 76 7. THE DUAL OF L P Since T is sublinear, d T f (2β) d T ga (β) + d T ha (β) for all β, A >, by Lemma Let us apply this for A = β r, where by (7.4). By assumption (7.5), and thus where T f q q 2 q q 2 q q r := p (q q) q (p p) = p 1(q 1 q) q 1 (p 1 p), β q d T ga (β) ( T g A q, ) q (M g A p ) q, β q1 d T ha (β) ( T h A q1, ) q1 (M 1 h A p1 ) q1, 2 q qm q pq/p = 1 i= β q 1( d T ga (β) + d T ha (β) ) dβ β q 1( (M g A p /β) q + (M 1 h A p1 /β) q1) dβ + 2 q qm q1 1 pq1/p1 1 2 q qm qi i pqi/pi i β q q 1( β r β q q1 1( r β ) q/p α p 1 d f (α) dα dβ ( qi/p i ϕ i (α, β) dα) dβ, ϕ i (α, β) := χ i (α, β)α pi 1 d f (α)β (q qi 1)pi/qi, χ := χ {(α,β):α>βr }, χ 1 := χ {(α,β):α<βr }. α p1 1 d f (α) dα) q1/p 1 dβ Since q i /p i 1, Minkowski s integral inequality 4.5 gives ( ) qi/p i ( ( pi/q i ) qi/p i. ϕ i (α, β) dα dβ ϕ i (α, β) dβ) qi/pi dα If q 1 > q, then q q > and r >, and α > β r if and only if α 1/r > β, whence ( p/q ϕ (α, β) dβ) q/p dα = ( α 1/r β q q 1 dβ) p/q α p 1 d f (α) dα = (q q ) p/q α p 1+p(q q)/(qr) d f (α) dα = (q q ) p/q α p 1 d f (α) dα = q q p/q p 1 f p p. If q 1 < q, then q q < and r <, and α > β r if and only if α 1/r < β, whence ( p/q ϕ (α, β) dβ) q/p dα ( ) p/q α = β q q 1 p dβ 1 d f (α) dα α 1/r = (q q) p/q α p 1+p(q q)/(qr) d f (α) dα

81 7.3. INTERPOLATION THEOREMS 77 = (q q) p/q α p 1 d f (α) dα = q q p/q p 1 f p p. Similarly, ( p1/q 1 ϕ 1 (α, β) dβ) q1/p1 dα q q1 p1/q1 p 1 f p p. So for all f L p (µ) with f p = 1, T f q 2q 1/q( 1 i= M qi i (p ) i/p) qi/pi 1/q =: Mp. q q i Since T is sublinear, in particular, T (cf) = c T f if c >, (7.6) follows. In the remaining cases q = or q 1 = we indicate how to modify the arguments. If p 1 = q 1 = (hence p q < ), use A = β/m 1. Then T h A M 1 h A β and thus d T ha (β) =. If p < p 1 < and q < q 1 =, use A = (β/b) r with B = M 1 (p 1 f p p/p) 1/p1 and r = p 1 /(p 1 p). Similarly, if p < p 1 < and q 1 < q =, use A = (β/b) r with B chosen such that d T ga (β) =. Let us apply the Marcinkiewicz interpolation theorem 7.9 to the Hardy Littlewood maximal operator M defined by Mf(x) = sup f(y) dy, f L 1 loc(r n ). r> B r(x) Corollary There is a constant C > such that, for 1 < p <, Mf p C p p 1 f p, f L p (R n ). (7.7) Proof. Clearly, Mf f for f L (R n ), and by Theorem 6.3, Mf 1, C f 1 for f L 1 (R n ). Obviously, M is sublinear. Then (7.7) follows from the Marcinkiewicz interpolation theorem 7.9; the constant Cp/(p 1) results from an inspection of the proof of Theorem 7.9.
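The interpolated exponents in (7.4) are easy to tabulate. The following is a minimal sketch that works with reciprocal exponents, so that $p = \infty$ is represented by the reciprocal $0$; the function names are illustrative assumptions. It covers the endpoint pairs $(1, 1)$ and $(\infty, \infty)$ used for the maximal operator above, and the pairs $(1, \infty)$ and $(2, 2)$ that are relevant for the Fourier transform in Chapter 8.

```python
from fractions import Fraction

def interpolate(recip0, recip1, t):
    """Reciprocal exponent 1/p_t = (1 - t)/p_0 + t/p_1; pass 0 for p = infinity."""
    t = Fraction(t)
    return (1 - t) * Fraction(recip0) + t * Fraction(recip1)

def exponent(recip):
    """Convert a reciprocal exponent back to p, with 1/p = 0 meaning p = infinity."""
    return float("inf") if recip == 0 else 1 / recip

t = Fraction(1, 2)

# Maximal operator: endpoints (p_0, q_0) = (1, 1) and (p_1, q_1) = (inf, inf).
p_max = exponent(interpolate(1, 0, t))

# Fourier transform: endpoints (p_0, q_0) = (1, inf) and (p_1, q_1) = (2, 2).
recip_p = interpolate(1, Fraction(1, 2), t)
recip_q = interpolate(0, Fraction(1, 2), t)
print(p_max, exponent(recip_p), exponent(recip_q), recip_p + recip_q)
```

In the first case $p_t = 1/(1 - t)$ runs through all of $(1, \infty)$ as $t$ runs through $(0, 1)$; in the second case $1/p_t + 1/q_t = 1$ for every $t$, so $q_t$ is the conjugate exponent of $p_t$.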


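CHAPTER 8

The Fourier transform

8.1. The Fourier transform on $L^1$

For a function $f \in L^1(\mathbb{R}^n)$ the Fourier transform $\hat f$ is defined by
$$\hat f(\xi) := \int_{\mathbb{R}^n} f(x)\,e^{-2\pi i \xi \cdot x}\,dx, \quad \xi \in \mathbb{R}^n, \quad (8.1)$$
where $\xi \cdot x := \xi_1 x_1 + \dots + \xi_n x_n$; we shall also write $\mathcal{F} f = \hat f$. It follows from Theorem 3.37 that $\hat f$ is continuous on $\mathbb{R}^n$. Moreover, as
$$|\hat f(\xi)| \le \int_{\mathbb{R}^n} |f(x)|\,dx = \|f\|_1,$$
$\hat f$ is bounded and satisfies
$$\|\hat f\|_\infty \le \|f\|_1. \quad (8.2)$$
Note that we have equality in (8.2) if $f \ge 0$: in this case $\hat f(0) = \int_{\mathbb{R}^n} f(x)\,dx = \|f\|_1$, and hence $\|\hat f\|_\infty = \|f\|_1$.

Next we collect elementary properties of the Fourier transform. For $y, \eta \in \mathbb{R}^n$ we consider the translation operator
$$T_y f(x) := f(x - y), \quad x \in \mathbb{R}^n,$$
cf. (4.6), and the modulation operator
$$M_\eta f(x) := e^{2\pi i \eta \cdot x} f(x), \quad x \in \mathbb{R}^n. \quad (8.3)$$
We have the commutation relations
$$T_y M_\eta = e^{-2\pi i \eta \cdot y} M_\eta T_y.$$

Recall that $C_0(\mathbb{R}^n)$ denotes the space of all continuous functions $f : \mathbb{R}^n \to \mathbb{C}$ so that $f(x) \to 0$ as $|x| \to \infty$. Note that $C_0(\mathbb{R}^n)$ is the closure of $C_c(\mathbb{R}^n)$ with respect to $\|\cdot\|_\infty$. Indeed, if $f_i \in C_c(\mathbb{R}^n)$ converge uniformly to $f \in C(\mathbb{R}^n)$, then for each $\epsilon > 0$ there is $i \in \mathbb{N}$ such that $\|f_i - f\|_\infty < \epsilon$, and hence $|f(x)| < \epsilon$ if $x \notin \operatorname{supp} f_i$, i.e., $f \in C_0(\mathbb{R}^n)$. Conversely, for $f \in C_0(\mathbb{R}^n)$ and each positive integer $i$ consider the compact set $K_i := \{x : |f(x)| \ge 1/i\}$. Choose $g_i \in C_c(\mathbb{R}^n)$ so that $0 \le g_i \le 1$ and $g_i|_{K_i} = 1$. Then $f_i := f g_i \in C_c(\mathbb{R}^n)$ satisfies $\|f_i - f\|_\infty = \|f(g_i - 1)\|_\infty \le 1/i$.

Lemma 8.1. Let $f, g \in L^1(\mathbb{R}^n)$, $y, \eta \in \mathbb{R}^n$, and $a > 0$. Then:
(1) $(T_y f)^\wedge = M_{-y} \hat f$ and $(M_\eta f)^\wedge = T_\eta \hat f$.
(2) $(f(ax))^\wedge(\xi) = a^{-n} \hat f(a^{-1}\xi)$ and $(f(-x))^\wedge(\xi) = \hat f(-\xi)$.
(3) $(f * g)^\wedge = \hat f\,\hat g$.
(4) If $x \mapsto x^\alpha f(x)$ is in $L^1(\mathbb{R}^n)$ for all $|\alpha| \le k$, then $\hat f \in C^k(\mathbb{R}^n)$ and $\partial^\alpha \hat f = ((-2\pi i x)^\alpha f(x))^\wedge$.
(5) If $f \in C^k(\mathbb{R}^n)$, $\partial^\alpha f \in L^1(\mathbb{R}^n)$ for all $|\alpha| \le k$, and $\partial^\alpha f \in C_0(\mathbb{R}^n)$ for all $|\alpha| \le k - 1$, then $(\partial^\alpha f)^\wedge(\xi) = (2\pi i \xi)^\alpha \hat f(\xi)$.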

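(6) $\int \hat f g\,dx = \int f \hat g\,dx$.

Proof. (1) We have
$$(T_y f)^\wedge(\xi) = \int_{\mathbb{R}^n} f(x - y)\,e^{-2\pi i \xi \cdot x}\,dx = \int_{\mathbb{R}^n} f(x)\,e^{-2\pi i \xi \cdot (x + y)}\,dx = e^{-2\pi i \xi \cdot y} \hat f(\xi)$$
and
$$(M_\eta f)^\wedge(\xi) = \int_{\mathbb{R}^n} f(x)\,e^{-2\pi i (\xi - \eta) \cdot x}\,dx = \hat f(\xi - \eta) = T_\eta \hat f(\xi).$$
(2) Both assertions follow from
$$(f(ax))^\wedge(\xi) = \int_{\mathbb{R}^n} f(ax)\,e^{-2\pi i \xi \cdot x}\,dx = |a|^{-n} \int_{\mathbb{R}^n} f(x)\,e^{-2\pi i a^{-1} \xi \cdot x}\,dx = |a|^{-n} \hat f(a^{-1}\xi),$$
where either $a > 0$ or $a = -1$.
(3) By Young's inequality 4.15, $f * g \in L^1(\mathbb{R}^n)$ and so, by Fubini's theorem 3.27,
$$(f * g)^\wedge(\xi) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f(x - y) g(y)\,e^{-2\pi i \xi \cdot x}\,dy\,dx = \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f(x - y)\,e^{-2\pi i \xi \cdot (x - y)} g(y)\,e^{-2\pi i \xi \cdot y}\,dx\,dy = \hat f(\xi) \int_{\mathbb{R}^n} g(y)\,e^{-2\pi i \xi \cdot y}\,dy = \hat f(\xi)\,\hat g(\xi).$$
(4) By Theorem 3.38,
$$\partial^\alpha \hat f(\xi) = \int_{\mathbb{R}^n} (-2\pi i x)^\alpha f(x)\,e^{-2\pi i \xi \cdot x}\,dx = ((-2\pi i x)^\alpha f(x))^\wedge(\xi).$$
(5) By partial integration, cf. Corollary 6.14,
$$(\partial^\alpha f)^\wedge(\xi) = \int_{\mathbb{R}^n} \partial^\alpha f(x)\,e^{-2\pi i \xi \cdot x}\,dx = (2\pi i \xi)^\alpha \int_{\mathbb{R}^n} f(x)\,e^{-2\pi i \xi \cdot x}\,dx = (2\pi i \xi)^\alpha \hat f(\xi).$$
(6) Both integrals equal $\iint f(x) g(\xi)\,e^{-2\pi i x \cdot \xi}\,dx\,d\xi$, by Fubini's theorem 3.27. The proof is complete.

Let $\mathcal{S}(\mathbb{R}^n)$ denote the Schwartz space of rapidly decreasing functions:
$$\mathcal{S}(\mathbb{R}^n) := \{f \in C^\infty(\mathbb{R}^n) : \|f\|_{k,\alpha} < \infty \text{ for all } k \in \mathbb{N},\ \alpha \in \mathbb{N}^n\},$$
where
$$\|f\|_{k,\alpha} := \sup_{x \in \mathbb{R}^n} (1 + |x|)^k |\partial^\alpha f(x)|.$$

Lemma 8.2. We have:
(1) If $f \in \mathcal{S}(\mathbb{R}^n)$ then $\partial^\alpha f \in L^p(\mathbb{R}^n)$ for all $\alpha \in \mathbb{N}^n$ and all $1 \le p \le \infty$.
(2) Let $f \in C^\infty(\mathbb{R}^n)$. Then $f \in \mathcal{S}(\mathbb{R}^n)$ if and only if $x^\beta \partial^\alpha f(x)$ is bounded for all $\alpha, \beta$ if and only if $\partial^\alpha (x^\beta f(x))$ is bounded for all $\alpha, \beta$.
(3) $\mathcal{S}(\mathbb{R}^n)$ is a Fréchet space with the topology defined by the seminorms $\|\cdot\|_{k,\alpha}$.

Proof. (1) If $f \in \mathcal{S}(\mathbb{R}^n)$ then $|\partial^\alpha f(x)| \le C(k)(1 + |x|)^{-k}$ for all $k$, and $(1 + |x|)^{-k} \in L^p(\mathbb{R}^n)$ if $k > n/p$, cf. (3.7).
(2) Clearly, $|x^\beta| \le (1 + |x|)^k$ if $|\beta| \le k$. On the other hand, $\sum_{i=1}^n |x_i|^k$ is strictly positive on the unit sphere $|x| = 1$, thus it has a positive minimum $m$ there. We may conclude that $\sum_{i=1}^n |x_i|^k \ge m |x|^k$, by homogeneity of both sides. Then
$$(1 + |x|)^k \le 2^k \max\{1, |x|^k\} \le 2^k (1 + |x|^k)$$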

85 8.1. THE FOURIER TRANSFORM ON L k( 1 + m 1 n x i k) 2 k m 1 β k x β. The first equivalence follows. The second equivalence is an easy consequence of the Leibniz formula. (3) We must show completeness. Let f m be a Cauchy sequence in S(R n ), i.e., for all k, α, f m f l k,α as m, l. Then for each α, the sequence α f m converges uniformly to a continuous function f α. Denoting e j the standard unit vectors in R n, we have and letting m we obtain f m (x + te j ) f m (x) = f (x + te j ) f (x) = t t j f m (x + se j ) ds, f ej (x + se j ) ds, and hence f ej = j f. By induction, we find that f α = α f for all α, thus f := f C (R n ). Let us show that f S(R n ). Since f m (being Cauchy) is bounded in S(R n ), we have f m α,k C α,k for all m, thus α f m (x) C α,k (1 + x ) k for all x and all m. Letting m implies α f(x) C α,k (1 + x ) k for all x, i.e., f α,k C α,k. Finally, we check that f m converges to f in S(R n ). For fixed α and k, set g m (x) := (1 + x ) k α f m (x) and g(x) := (1 + x ) k α f(x). Then g m is a Cauchy sequence with respect to which converges uniformly to g, since g m g pointwise and the limit is unique. That is f m f α,k = g m g as required. Proposition 8.3. The Fourier transform maps S(R n ) continuously into itself. Proof. If f S(R n ) then x α β f(x) L 1 (R n ) C (R n ) for all α, β, by Lemma 8.2. Thus, by Lemma 8.1, f C (R n ) and ξ α β ξ f(ξ) = ( 1) β (2πi) β ξ α [x β f(x)] (ξ) = ( 1) β (2πi) β α [ α x (x β f(x))] (ξ). Consequently, ξ α β ξ f(ξ) (2π) β α R n α x (x β f(x)) dx (2π) β α R n (1 + x ) n 1 dx sup x R n (1 + x ) n+1 α x (x β f(x)) which implies the statement in view of Lemma 8.2. Lemma 8.4 (Riemann Lebesgue). F L 1 (R n ) C (R n ). Proof. The Fourier transform maps functions in Cc (R n ) S(R n ) to functions in S(R n ) C (R n ). By Theorem 4.2, Cc (R n ) is dense in L 1 (R n ), and if f k f 1 then f k f, by (8.2). This implies the statement, since C (R n ) is closed with respect to. 
At this point we compute the Fourier transform of a Gaussian function; this is a preparation for the Fourier inversion formula.
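Before turning to the proof, the closed form stated in Lemma 8.5 below, $\hat f(\xi) = a^{-1/2} e^{-\pi \xi^2 / a}$ for $n = 1$, can be checked against a direct numerical evaluation of (8.1). The following is a minimal sketch, assuming a trapezoidal rule on a truncated interval; the truncation width, the step count, and the function names are illustrative choices.

```python
import math

def gaussian_hat(xi, a, half_width=8.0, n=4000):
    """Trapezoidal approximation of the integral of exp(-pi a x^2) exp(-2 pi i xi x) dx.

    The integrand is even in x, so the imaginary part cancels and only the
    cosine part is summed.
    """
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(-math.pi * a * x * x) * math.cos(2.0 * math.pi * xi * x)
    return h * total

def closed_form(xi, a):
    """The value a^(-1/2) exp(-pi xi^2 / a) claimed in Lemma 8.5 for n = 1."""
    return a ** -0.5 * math.exp(-math.pi * xi * xi / a)

for a in (0.5, 1.0, 2.0):
    for xi in (0.0, 0.3, 1.0):
        print(a, xi, gaussian_hat(xi, a), closed_form(xi, a))
```

For $a = 1$ the Gaussian is its own Fourier transform; a narrow Gaussian (large $a$) is mapped to a wide one and vice versa.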

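Lemma 8.5 (Fourier transform of the Gaussian). For $f(x) = e^{-\pi a |x|^2}$, where $a > 0$, we have
$$\hat f(\xi) = a^{-n/2} e^{-\pi |\xi|^2 / a}.$$

Proof. First suppose that $n = 1$. By Lemma 8.1,
$$(\hat f)'(\xi) = (-2\pi i x\,e^{-\pi a x^2})^\wedge(\xi) = i a^{-1} (f')^\wedge(\xi) = i a^{-1}\,2\pi i \xi\,\hat f(\xi) = -2\pi a^{-1} \xi \hat f(\xi),$$
hence $\partial_\xi (e^{\pi \xi^2 / a} \hat f(\xi)) = 0$, and so $e^{\pi \xi^2 / a} \hat f(\xi)$ is constant. Thus
$$e^{\pi \xi^2 / a} \hat f(\xi) = \hat f(0) = \int_{\mathbb{R}} e^{-\pi a x^2}\,dx = a^{-1/2},$$
by Example 3.35. The case $n = 1$ and Fubini's theorem 3.27 imply the general case,
$$\hat f(\xi) = \int_{\mathbb{R}^n} e^{-\pi a |x|^2} e^{-2\pi i \xi \cdot x}\,dx = \prod_{j=1}^n \int_{\mathbb{R}} e^{-\pi a x_j^2} e^{-2\pi i \xi_j x_j}\,dx_j = a^{-n/2} e^{-\pi |\xi|^2 / a}.$$

Let us turn to inversion of the Fourier transform. For $f \in L^1(\mathbb{R}^n)$, we define
$$f^\vee(x) := \hat f(-x) = \int_{\mathbb{R}^n} f(\xi)\,e^{2\pi i \xi \cdot x}\,d\xi, \quad x \in \mathbb{R}^n.$$

Theorem 8.6 (Fourier inversion theorem). If $f \in L^1(\mathbb{R}^n)$ and $\hat f \in L^1(\mathbb{R}^n)$, then $f$ coincides a.e. with a continuous function $f_0$, and we have
$$(\hat f)^\vee = (f^\vee)^\wedge = f_0.$$

Proof. For $t > 0$ and $x \in \mathbb{R}^n$, set
$$\psi(\xi) := e^{2\pi i \xi \cdot x - \pi t^2 |\xi|^2} = M_x e^{-\pi t^2 |\xi|^2}.$$
By Lemmas 8.1 and 8.5,
$$\hat\psi(y) = T_x (t^{-n} e^{-\pi |y|^2 / t^2}) = t^{-n} e^{-\pi |x - y|^2 / t^2} = \varphi_t(x - y),$$
for $\varphi(x) = e^{-\pi |x|^2}$, cf. (4.7). By Lemma 8.1,
$$\int_{\mathbb{R}^n} e^{-\pi t^2 |\xi|^2} e^{2\pi i \xi \cdot x} \hat f(\xi)\,d\xi = \int_{\mathbb{R}^n} \hat f(\xi) \psi(\xi)\,d\xi = \int_{\mathbb{R}^n} f(y) \hat\psi(y)\,dy = f * \varphi_t(x),$$
which converges to $f$ in $L^1(\mathbb{R}^n)$ as $t \to 0$ (cf. Proposition 4.18). On the other hand, since $\hat f \in L^1(\mathbb{R}^n)$,
$$\lim_{t \to 0} \int_{\mathbb{R}^n} e^{-\pi t^2 |\xi|^2} e^{2\pi i \xi \cdot x} \hat f(\xi)\,d\xi = \int_{\mathbb{R}^n} e^{2\pi i \xi \cdot x} \hat f(\xi)\,d\xi = (\hat f)^\vee(x),$$
by the dominated convergence theorem 3.22. It follows that $f = (\hat f)^\vee$ a.e. and analogously $f = (f^\vee)^\wedge$ a.e. Being Fourier transforms of $L^1$-functions, $(\hat f)^\vee$ and $(f^\vee)^\wedge$ are continuous.

Corollary 8.7. If $f \in L^1(\mathbb{R}^n)$ and $\hat f = 0$, then $f = 0$ a.e.

Corollary 8.8. $\mathcal{F} : \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}(\mathbb{R}^n)$ is an isomorphism.

Proof. By Proposition 8.3, $\mathcal{F}$ maps $\mathcal{S}(\mathbb{R}^n)$ continuously into itself, and so does the mapping $f \mapsto f^\vee$, because $f^\vee(x) = \hat f(-x)$. By Theorem 8.6, these mappings are inverse to each other.

The Fourier transform of an $L^1$-function need not be $L^1$ as illustrated by the following example.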

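Example 8.9 (The sinc function). Clearly, the characteristic function of the interval $[-a, a]$ is in $L^1(\mathbb{R})$. Its Fourier transform
$$\hat\chi_{[-a,a]}(\xi) = \int_{-a}^{a} e^{-2\pi i x \xi}\,dx = -\frac{e^{-2\pi i a \xi}}{2\pi i \xi} + \frac{e^{2\pi i a \xi}}{2\pi i \xi} = \frac{\sin(2\pi a \xi)}{\pi \xi}$$
however is not an element of $L^1(\mathbb{R})$. In particular, the Fourier transform of the rectangular function $\chi_{[-1/2,1/2]}$ is the (normalized) sinc function $\operatorname{sinc}(x) = \sin(\pi x)/(\pi x)$.

By the lemma of Riemann-Lebesgue 8.4, the Fourier transform is a bounded linear operator $\mathcal{F} : L^1(\mathbb{R}^n) \to C_0(\mathbb{R}^n)$. It is injective, but not surjective.

Proposition 8.10. The bounded linear operator $\mathcal{F} : L^1(\mathbb{R}^n) \to C_0(\mathbb{R}^n)$ is injective, but not surjective.

Proof. Assume $f, g \in L^1(\mathbb{R}^n)$ and $\hat f = \hat g$. Then $f - g \in L^1(\mathbb{R}^n)$ and $(f - g)^\wedge = \hat f - \hat g = 0$. Thus Corollary 8.7 implies $f = g$ a.e.

Let us show that $\mathcal{F} : L^1(\mathbb{R}^n) \to C_0(\mathbb{R}^n)$ is not surjective. For simplicity let $n = 1$. It is more convenient to show that the inverse Fourier transform $(\cdot)^\vee : L^1(\mathbb{R}) \to C_0(\mathbb{R})$ is not surjective. The assertion is then an immediate consequence: if $g \in C_0(\mathbb{R}) \setminus (L^1(\mathbb{R}))^\vee$ then $g(-x) \in C_0(\mathbb{R}) \setminus \mathcal{F} L^1(\mathbb{R})$.

Assume that $(\cdot)^\vee : L^1(\mathbb{R}) \to C_0(\mathbb{R})$ is surjective. By the open mapping theorem A.3, there is a constant $C > 0$ such that
$$\|f\|_1 \le C \|f^\vee\|_\infty \quad \text{for all } f \in L^1(\mathbb{R}). \quad (8.4)$$
For $\epsilon > 0$ let $g_\epsilon(x) := \epsilon^{-1/2} e^{-\pi x^2 / \epsilon}$ and $f_\epsilon := g_\epsilon * \chi_{[-1,1]}$. Then $f_\epsilon \in L^1(\mathbb{R})$, by (4.3), and $f_\epsilon \in C_0(\mathbb{R})$, by a simple computation. By Lemmas 8.1 and 8.5,
$$\hat f_\epsilon(\xi) = \hat g_\epsilon(\xi)\,\hat\chi_{[-1,1]}(\xi) = e^{-\pi \epsilon \xi^2}\,\hat\chi_{[-1,1]}(\xi) \to \hat\chi_{[-1,1]}(\xi)$$
pointwise as $\epsilon \to 0$; in particular $\hat f_\epsilon \in L^1(\mathbb{R})$, and $(\hat f_\epsilon)^\vee = f_\epsilon$ by Theorem 8.6. Thus by (8.4), applied to $\hat f_\epsilon$, and Example 3.35,
$$\|\hat f_\epsilon\|_1 \le C \|f_\epsilon\|_\infty = C \|g_\epsilon * \chi_{[-1,1]}\|_\infty \le C \|g_\epsilon\|_1 = C.$$
So, by Fatou's lemma 3.17,
$$\int_{\mathbb{R}} |\hat\chi_{[-1,1]}|\,d\xi = \int_{\mathbb{R}} \lim_{k \to \infty} |\hat f_{1/k}|\,d\xi \le \liminf_{k \to \infty} \int_{\mathbb{R}} |\hat f_{1/k}|\,d\xi \le C,$$
a contradiction; see Example 8.9.

8.2. The Fourier transform on $L^2$

In the previous section we have seen that the Fourier transform is a bounded linear operator (cf. (8.2) and Lemma 8.4)
$$\mathcal{F} : L^1(\mathbb{R}^n) \to C_0(\mathbb{R}^n).$$
If we abandon the requirement that $\mathcal{F}$ be defined pointwise by (8.1), it can be extended to other spaces.

Theorem 8.11 (Plancherel).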
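If $f \in L^1(\mathbb{R}^n) \cap L^2(\mathbb{R}^n)$, then $\hat f \in L^2(\mathbb{R}^n)$, and $\mathcal{F}|_{L^1(\mathbb{R}^n) \cap L^2(\mathbb{R}^n)}$ extends uniquely to an isometric isomorphism on $L^2(\mathbb{R}^n)$.

Proof. Let
$$\mathcal{F}^1(\mathbb{R}^n) := \{f \in L^1(\mathbb{R}^n) : \hat f \in L^1(\mathbb{R}^n)\}. \quad (8.5)$$
Then $\mathcal{F}^1(\mathbb{R}^n) \subseteq L^2(\mathbb{R}^n)$, since $\hat f \in L^1(\mathbb{R}^n)$ implies $f \in L^\infty(\mathbb{R}^n)$ (cf. (8.2)) and thus $f \in L^2(\mathbb{R}^n)$, by Proposition 4.7. Moreover, $\mathcal{F}^1(\mathbb{R}^n)$ is dense in $L^2(\mathbb{R}^n)$, because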

88 84 8. THE FOURIER TRANSFORM S(R n ) F 1 (R n ) and S(R n ) is dense in L 2 (R n ), by Theorem 4.2. Let f, g F 1 (R n ), and set h := ĝ. By Theorem 8.6, ĥ(ξ) = ĝ(x)e 2πiξ x dx = R n ĝ(x)e 2πiξ x dx = g(ξ), R n and hence, by Lemma 8.1, f(x)g(x) dx = f(x)ĥ(x) dx = f(x)h(x) dx = R n Rn R n f(x)ĝ(x) dx, i.e., F F 1 (R n ) preserves the L 2 -inner product. In particular, f 2 = f 2. (8.6) Since F (F 1 (R n )) = F 1 (R n ), by Theorem 8.6, F F 1 (R n ) extends by continuity to an isometric isomorphism F on L 2 (R n ). It remains to check that F = F on L 1 (R n ) L 2 (R n ). Let f L 1 (R n ) L 2 (R n ) and ϕ(x) := e π x 2. Then f ϕ ɛ L 1 (R n ), by Young s inequality (4.15), and (f ϕ ɛ ) (ξ) = f(ξ)e πɛ2 ξ 2, by Lemmas 8.1 and 8.5, and so (f ϕ ɛ ) L 1 (R n ), since f is bounded. That is f ϕ ɛ F 1 (R n ). By Proposition 4.18, f ϕ ɛ converges to f in L 1 (R n ) and in L 2 (R n ). We may infer (f ϕ ɛ ) f, by (8.2), and (f ϕ ɛ ) F f 2, by (8.6). By Corollary 4.11, there is a subsequence (f ϕ ɛk ) that converges pointwise a.e. to f as well as to F f. Therefore, F f = F f a.e. We denote by f = F f also the Fourier transform of functions f L 2 (R n ). Corollary 8.12 (Parseval s theorem). If f, g L 2 (R n ) then f, g = f, ĝ, i.e., F : L 2 (R n ) L 2 (R n ) is unitary. R n Proof. This follows from f 2 = f 2 by polarization, 2 f, g = f + g 2 2 i f + ig 2 2 (1 i) f 2 2 (1 i) g 2 2. The Fourier transform f of a function f L 2 (R n ) is not given by the formula (8.1); the integral in (8.1) may not exist. However, f is the L 2 -limit of the functions (χ Br()f) (ξ) = f(x)e 2πiξ x dx B r() as r. Here χ Br()f L 2 (B r ()) L 1 (B r ()), by Proposition 4.9, and so the integral exists. By the monotone convergence theorem 3.14, χ Br()f f 2 as r and hence (χ Br()f) f 2, by Theorem By the same argument f is the L 2 -limit of the Fourier transform of every sequence of functions f m L 1 (R n ) L 2 (R n ) that converges to f in L 2 (R n ). 
By Corollary 4.11 there is a subsequence that converges a.e., and so for $f \in L^1(\mathbb{R}^n) \cap L^2(\mathbb{R}^n)$ the integral in (8.1) coincides a.e. with the extension provided by Theorem 8.11. For instance, by Example 8.9,
$$\int_{[-r,r]} \hat\chi_{[-a,a]}(\xi)\,e^{2\pi i \xi x}\,d\xi = \int_{[-r,r]} \frac{\sin(2\pi a \xi)}{\pi \xi}\,e^{2\pi i \xi x}\,d\xi$$
converges to $\chi_{[-a,a]}$ in $L^2(\mathbb{R})$ as $r \to \infty$.

Corollary 8.13. The inversion formula $f = (\hat f)^\vee$ continues to hold on $L^2(\mathbb{R}^n)$.

Proof. By Theorem 8.11 the definition $f^\vee(x) := \hat f(-x)$ makes sense for $f \in L^2(\mathbb{R}^n)$. Since $f = (\hat f)^\vee$ holds on $\mathcal{F}^1(\mathbb{R}^n)$ (cf. (8.5)), by Theorem 8.6, and since $\mathcal{F}^1(\mathbb{R}^n)$ is dense in $L^2(\mathbb{R}^n)$, we can conclude the assertion from Theorem 8.11 (which clearly holds also for $\hat f$ replaced by $f^\vee$).

By Plancherel's theorem 8.11, the Fourier transform is a linear mapping $L^1(\mathbb{R}^n) + L^2(\mathbb{R}^n) \to L^\infty(\mathbb{R}^n) + L^2(\mathbb{R}^n)$ satisfying $\|\hat f\|_\infty \le \|f\|_1$ for $f \in L^1(\mathbb{R}^n)$ and $\|\hat f\|_2 = \|f\|_2$ for $f \in L^2(\mathbb{R}^n)$. By the Riesz-Thorin interpolation theorem 7.8, we get the following result for intermediate $L^p$-spaces.

Theorem 8.14 (Hausdorff-Young inequality). Let $1 \le p \le 2$ and let $q$ be the conjugate exponent to $p$. If $f \in L^p(\mathbb{R}^n)$ then $\hat f \in L^q(\mathbb{R}^n)$ and $\|\hat f\|_q \le \|f\|_p$.

Proof. Apply the Riesz-Thorin interpolation theorem 7.8.

In Lemma 8.5 we have seen by means of a Gaussian function that the Fourier transform maps a sharp peak to a broadly spread peak. This is a general property of the Fourier transform that is called the uncertainty principle.

Theorem 8.15 (Heisenberg's uncertainty principle). If $f \in \mathcal{S}(\mathbb{R}^n)$, then
$$\|f\|_2^2 \le 4\pi\,\|(x_j - y_j) f(x)\|_2\,\|(\xi_j - \eta_j) \hat f(\xi)\|_2$$
for all $y, \eta \in \mathbb{R}^n$, $j = 1, \dots, n$. Thus $f$ and $\hat f$ cannot both be sharply localized about single points.

Proof. Replacing $f$ by $M_{-\eta_j e_j} T_{-y_j e_j} f$, where $e_j$ is the $j$th standard unit vector in $\mathbb{R}^n$, we may assume that $y = \eta = 0$, in view of Lemma 8.1. Integration by parts (cf. Corollary 6.14), Hölder's inequality 4.2, and (8.6) yield
$$\|f\|_2^2 = \int_{\mathbb{R}^n} f(x) \overline{f(x)}\,\partial_{x_j} x_j\,dx = -\int_{\mathbb{R}^n} \big( \partial_j f(x) \overline{f(x)} + f(x) \partial_j \overline{f(x)} \big) x_j\,dx \le 2\,\|x_j f(x)\|_2\,\|\partial_j f\|_2 = 4\pi\,\|x_j f(x)\|_2\,\|\xi_j \hat f(\xi)\|_2,$$
where in the last step we again used Lemma 8.1 and (8.6).

8.3. Paley-Wiener theorems

As seen in Lemma 8.1, the smoothness of a function is connected to the decay of its Fourier transform at infinity (and vice versa). We shall see below that in the extreme case, when $f$ is compactly supported on $\mathbb{R}$, its Fourier transform $\hat f$ extends to an entire function.
Theorems that relate decay properties of a function (or distribution) at infinity with analyticity of its Fourier transform are called Paley–Wiener theorems. We will investigate two such theorems.

The Fourier transform $\widehat{f}$ of a function $f$ on $\mathbb{R}$ is by definition a function on $\mathbb{R}$. Often $\widehat{f}$ admits a holomorphic extension to some region in $\mathbb{C}$, which is not too surprising, since $e^{2\pi i t z}$ is an entire function of $z$ for every real $t$. Let us formally consider the integral that defines the inverse Fourier transform
\[ f(z) = \int_{-\infty}^{\infty} F(t)\, e^{2\pi i t z}\, dt \tag{8.7} \]

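and allow $z$ to be a complex number. In general, this integral may not be well-defined. We shall consider two situations which ensure the existence of this integral.

First we assume that $F$ is supported on $\mathbb{R}_+ := \{x \in \mathbb{R} : x > 0\}$ and $z$ lies in the upper half-plane $\mathbb{H} := \{z \in \mathbb{C} : \operatorname{Im} z > 0\}$. For $F \in L^2(\mathbb{R}_+)$ and $z \in \mathbb{H}$, the integral
\[ f(z) = \int_0^\infty F(t)\, e^{2\pi i t z}\, dt, \quad z \in \mathbb{H}, \tag{8.8} \]
exists as a Lebesgue integral, since $t \mapsto |e^{2\pi i t z}| = e^{-2\pi t \operatorname{Im} z}$ is in $L^2(\mathbb{R}_+)$ for each $z \in \mathbb{H}$.

Theorem 8.16 (Paley–Wiener I). If $f$ is of the form (8.8), then $f$ is holomorphic in $\mathbb{H}$ and
\[ \sup_{y > 0} \int_{-\infty}^{\infty} |f(x + iy)|^2\, dx = C < \infty. \tag{8.9} \]
Conversely, if $f$ is holomorphic in $\mathbb{H}$ and satisfies (8.9), then there exists $F \in L^2(\mathbb{R}_+)$ such that $f$ has the representation (8.8) and
\[ \int_0^\infty |F(t)|^2\, dt = C. \tag{8.10} \]

Proof. Assume that $F \in L^2(\mathbb{R}_+)$ and that $f$ is given by (8.8). By Theorem 3.39 (applied to each half-plane $\{z : \operatorname{Im} z > \delta\}$, $\delta > 0$), $f$ is holomorphic in $\mathbb{H}$. For fixed $y > 0$,
\[ f(x + iy) = \int_0^\infty F(t)\, e^{-2\pi t y}\, e^{2\pi i t x}\, dt, \]
and Plancherel's theorem 8.11 yields
\[ \int_{-\infty}^{\infty} |f(x + iy)|^2\, dx = \int_0^\infty |F(t)|^2 e^{-4\pi t y}\, dt \le \int_0^\infty |F(t)|^2\, dt; \tag{8.11} \]
we may consider $F$ as a function in $L^2(\mathbb{R})$ by extending it by $0$ on $(-\infty, 0]$. This shows (8.9).

Now let $f$ be holomorphic in $\mathbb{H}$ and satisfy (8.9). Fix $y > 0$, $\alpha > 0$, and let $\gamma_\alpha$ denote the rectangular path with vertices $\pm\alpha + i$ and $\pm\alpha + iy$. By Cauchy's theorem, for all $t \in \mathbb{R}$,
\[ \int_{\gamma_\alpha} f(\zeta)\, e^{-2\pi i t \zeta}\, d\zeta = 0. \tag{8.12} \]
Let $\Phi(\beta)$, $\beta \in \mathbb{R}$, be the integral of $f(\zeta) e^{-2\pi i t \zeta}$ along the line segment from $\beta + i$ to $\beta + iy$. If $I$ denotes the real interval with endpoints $1$ and $y$, then by Hölder's inequality 4.2,
\[ |\Phi(\beta)|^2 = \Big| \int_I f(\beta + is)\, e^{-2\pi i t (\beta + is)}\, ds \Big|^2 \le \int_I |f(\beta + is)|^2\, ds \int_I e^{4\pi t s}\, ds =: \Psi(\beta) \int_I e^{4\pi t s}\, ds. \tag{8.13} \]
By (8.9) and Fubini's theorem 3.27,
\[ \int_{-\infty}^{\infty} \Psi(\beta)\, d\beta = \int_I \int_{-\infty}^{\infty} |f(\beta + is)|^2\, d\beta\, ds \le C \lambda(I) = C\, |1 - y|. \]
It follows that there is a sequence $\alpha_k \to \infty$ such that $\Psi(\pm\alpha_k) \to 0$. Hence, in view of (8.13),
\[ \Phi(\pm\alpha_k) \to 0, \tag{8.14} \]
for all $t$, and $\alpha_k$ is independent of $t$.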

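Let us consider
\[ g_k(y, t) := \int_{-\alpha_k}^{\alpha_k} f(x + iy)\, e^{-2\pi i t x}\, dx. \]
Then (8.12) and (8.14) imply
\[ e^{2\pi t y} g_k(y, t) - e^{2\pi t} g_k(1, t) \to 0 \quad \text{as } k \to \infty. \tag{8.15} \]
If $f_y(x) := f(x + iy)$, then $f_y \in L^2(\mathbb{R})$ by (8.9). By Plancherel's theorem 8.11,
\[ \| g_k(y, \cdot) - \widehat{f_y} \|_2 \to 0 \quad \text{as } k \to \infty. \]
By Corollary 4.11, there is a subsequence of $(g_k(y, t))_k$ which converges to $\widehat{f_y}(t)$ for a.e. $t$. Thus, if we define
\[ F(t) := e^{2\pi t}\, \widehat{f_1}(t), \quad t \in \mathbb{R}, \]
then (8.15) implies that, for each $y > 0$,
\[ F(t) = e^{2\pi t y}\, \widehat{f_y}(t) \quad \text{for a.e. } t \in \mathbb{R}. \]
Applying Plancherel's theorem 8.11 gives
\[ \int_{-\infty}^{\infty} e^{-4\pi t y} |F(t)|^2\, dt = \int_{-\infty}^{\infty} |f_y(t)|^2\, dt \le C, \]
for all $y > 0$, by (8.9). Letting $y \to \infty$ implies $F(t) = 0$ for a.e. $t < 0$, and letting $y \to 0$ gives
\[ \int_0^\infty |F(t)|^2\, dt \le C. \tag{8.16} \]
This implies that $\widehat{f_y}(t) = e^{-2\pi t y} F(t)$ is in $L^1(\mathbb{R})$ for each $y > 0$ (by the Cauchy–Schwarz inequality, since $F$ vanishes a.e. on $(-\infty, 0)$). Thus, by Corollary 8.13 (and the arguments preceding it),
\[ f_y(x) = \int_{-\infty}^{\infty} \widehat{f_y}(t)\, e^{2\pi i t x}\, dt, \]
that is
\[ f(z) = \int_0^\infty F(t)\, e^{-2\pi t y}\, e^{2\pi i t x}\, dt = \int_0^\infty F(t)\, e^{2\pi i t z}\, dt, \quad z \in \mathbb{H}. \]
Finally, (8.10) follows from (8.16) and (8.11).

Thanks to (8.9), the dominated convergence theorem 3.22 implies
\[ \lim_{y \to 0^+} \int_{-\infty}^{\infty} |f(x + iy) - \check{F}(x)|^2\, dx = 0. \tag{8.17} \]

The theorem describes the structure of the Hardy space $H^2(\mathbb{H})$ of the upper half-plane, i.e.,
\[ H^2(\mathbb{H}) := \{ f : f \text{ holomorphic on } \mathbb{H},\ \|f\|_{H^2(\mathbb{H})} < \infty \}, \]
which is a Hilbert space with norm given by
\[ \|f\|_{H^2(\mathbb{H})} := \sup_{y > 0} \Big( \int_{-\infty}^{\infty} |f(x + iy)|^2\, dx \Big)^{1/2}. \]
Indeed, the above theorem implies the following corollary.

Corollary 8.17. The mapping
\[ F \mapsto f(z) = \int_0^\infty F(t)\, e^{2\pi i t z}\, dt \]
yields an isomorphism between $L^2(\mathbb{R}_+)$ and $H^2(\mathbb{H})$.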
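As a quick sanity check of Theorem 8.16, the following worked example evaluates both sides for one explicit choice of $F$. The choice $F(t) = e^{-2\pi t}$ on $\mathbb{R}_+$ is ours and purely illustrative; it is not taken from the text.

```latex
% Illustrative example (our choice of F, not from the notes):
% F(t) = e^{-2\pi t} for t > 0, so F \in L^2(\mathbb{R}_+).
\[
  f(z) = \int_0^\infty e^{-2\pi t}\, e^{2\pi i t z}\, dt
       = \frac{1}{2\pi (1 - iz)}, \qquad \operatorname{Im} z > -1,
\]
% which is holomorphic on a half-plane containing \mathbb{H}.
% For z = x + iy with y > 0 we have |1 - iz|^2 = x^2 + (1+y)^2, hence
\[
  \int_{-\infty}^{\infty} |f(x+iy)|^2\, dx
    = \frac{1}{4\pi^2} \int_{-\infty}^{\infty} \frac{dx}{x^2 + (1+y)^2}
    = \frac{1}{4\pi (1+y)} .
\]
% The same value comes out of the right-hand side of Plancherel's identity:
\[
  \int_0^\infty |F(t)|^2 e^{-4\pi t y}\, dt
    = \int_0^\infty e^{-4\pi t (1+y)}\, dt
    = \frac{1}{4\pi (1+y)} .
\]
% Taking the supremum over y > 0 (approached as y \to 0^+):
\[
  \sup_{y>0} \int_{-\infty}^{\infty} |f(x+iy)|^2\, dx
    = \frac{1}{4\pi}
    = \int_0^\infty |F(t)|^2\, dt .
\]
```

In this example the supremum in (8.9) is not attained for any $y > 0$; it is only approached as $y \to 0^+$, which matches the way the constant $C$ is recovered in the proof by letting $y \to 0$.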

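Another way to make sense of the integral (8.7) is to require that $F$ is compactly supported. If $0 < A < \infty$ and $F \in L^2([-A, A])$, then clearly
\[ f(z) = \int_{-A}^{A} F(t)\, e^{2\pi i t z}\, dt, \quad z \in \mathbb{C}, \tag{8.18} \]
is well-defined.

Theorem 8.18 (Paley–Wiener II). If $f$ is of the form (8.18), then $f$ is entire and there exists $C > 0$ such that
\[ |f(z)| \le C e^{2\pi A |z|}, \quad z \in \mathbb{C}, \tag{8.19} \]
and $f|_{\mathbb{R}} \in L^2(\mathbb{R})$. Conversely, if $f$ is an entire function satisfying (8.19) for some positive constants $A$ and $C$, and $f|_{\mathbb{R}} \in L^2(\mathbb{R})$, then there exists $F \in L^2([-A, A])$ such that $f$ has the representation (8.18).

Entire functions $f$ satisfying (8.19) are said to be of exponential type.

Proof. If $f$ is of the form (8.18), then $f$ is entire by Theorem 3.39, and
\[ |f(z)| \le \int_{-A}^{A} |F(t)|\, e^{-2\pi t \operatorname{Im} z}\, dt \le e^{2\pi A |\operatorname{Im} z|} \int_{-A}^{A} |F(t)|\, dt, \]
which implies (8.19). By Plancherel's theorem 8.11, $f|_{\mathbb{R}} \in L^2(\mathbb{R})$.

Assume that $f$ is an entire function satisfying (8.19) for some positive constants $A$ and $C$, and $f|_{\mathbb{R}} \in L^2(\mathbb{R})$. Define $f_\epsilon(x) := f(x) e^{-2\pi \epsilon |x|}$, for $\epsilon > 0$ and $x \in \mathbb{R}$. We claim that
\[ \lim_{\epsilon \to 0} \int_{-\infty}^{\infty} f_\epsilon(x)\, e^{-2\pi i t x}\, dx = 0 \quad \text{for } t \in \mathbb{R} \setminus [-A, A]. \tag{8.20} \]
This claim will imply the theorem as follows. By the dominated convergence theorem 3.22, $\|f_\epsilon - f|_{\mathbb{R}}\|_2 \to 0$ as $\epsilon \to 0$, and so by Plancherel's theorem 8.11, $\|\widehat{f_\epsilon} - \widehat{f|_{\mathbb{R}}}\|_2 \to 0$. Then, by (8.20) and Corollary 4.11, $F := \widehat{f|_{\mathbb{R}}}$ vanishes a.e. outside $[-A, A]$. By Corollary 8.13, the representation (8.18) holds for a.e. real $z$, and hence for all $z \in \mathbb{C}$, because both sides of (8.18) are entire functions.

Let us prove (8.20). For real $\alpha$ let $\gamma_\alpha$ be the ray defined by $\gamma_\alpha(s) := s e^{i\alpha}$, $s \in [0, \infty)$. Define
\[ \Phi_\alpha(w) := \int_{\gamma_\alpha} f(z)\, e^{-2\pi w z}\, dz = e^{i\alpha} \int_0^\infty f(s e^{i\alpha})\, e^{-2\pi w s e^{i\alpha}}\, ds, \]
for $w \in \Pi_\alpha := \{ w \in \mathbb{C} : \operatorname{Re}(w e^{i\alpha}) > A \}$. By (8.19),
\[ |f(s e^{i\alpha})\, e^{-2\pi w s e^{i\alpha}}| \le C e^{2\pi A s}\, e^{-2\pi s \operatorname{Re}(w e^{i\alpha})} = C e^{-2\pi s (\operatorname{Re}(w e^{i\alpha}) - A)}, \]
and so, by Theorem 3.39, $\Phi_\alpha$ is holomorphic on the half-plane $\Pi_\alpha$. More is true for $\alpha = 0$ and $\alpha = \pi$. Since $f|_{\mathbb{R}} \in L^2(\mathbb{R})$,
\[ \Phi_0(w) = \int_0^\infty f(s)\, e^{-2\pi w s}\, ds \]
is holomorphic in $\{ w \in \mathbb{C} : \operatorname{Re} w > 0 \}$ and
\[ \Phi_\pi(w) = -\int_0^\infty f(-s)\, e^{2\pi w s}\, ds = -\int_{-\infty}^0 f(s)\, e^{-2\pi w s}\, ds \]
is holomorphic in $\{ w \in \mathbb{C} : \operatorname{Re} w < 0 \}$. Now, for $t \in \mathbb{R}$,
\[ \int_{-\infty}^{\infty} f_\epsilon(x)\, e^{-2\pi i t x}\, dx = \Phi_0(\epsilon + it) - \Phi_\pi(-\epsilon + it). \tag{8.21} \]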

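We will show that any two of the functions $\Phi_\alpha$ coincide on the intersection of their domains of definition (i.e., they are analytic continuations of each other). Then
\[ \Phi_0(\epsilon + it) - \Phi_\pi(-\epsilon + it) = \begin{cases} \Phi_{\pi/2}(\epsilon + it) - \Phi_{\pi/2}(-\epsilon + it) & \text{if } t < -A, \\ \Phi_{-\pi/2}(\epsilon + it) - \Phi_{-\pi/2}(-\epsilon + it) & \text{if } t > A, \end{cases} \]
evidently tends to $0$ as $\epsilon \to 0$, and (8.20) is proved.

Suppose that $0 < \beta - \alpha < \pi$. If $w = |w| e^{-i(\alpha + \beta)/2}$, then
\[ \operatorname{Re}(w e^{i\alpha}) = |w| \operatorname{Re}(e^{i(\alpha - \beta)/2}) = |w| \cos\frac{\alpha - \beta}{2} =: |w|\, \eta > 0, \]
\[ \operatorname{Re}(w e^{i\beta}) = |w| \operatorname{Re}(e^{i(\beta - \alpha)/2}) = |w| \cos\frac{\beta - \alpha}{2} = |w|\, \eta. \]
Thus, $w \in \Pi_\alpha \cap \Pi_\beta$ provided that $|w| > A/\eta$. Consider the path integral
\[ \int_\gamma f(z)\, e^{-2\pi w z}\, dz, \quad \gamma(t) = r e^{it}, \quad t \in [\alpha, \beta]. \tag{8.22} \]
Since
\[ \operatorname{Re}(w \gamma(t)) = |w|\, r \operatorname{Re} e^{i(t - (\alpha + \beta)/2)} \ge |w|\, r\, \eta \]
and so, by (8.19),
\[ |f(\gamma(t))\, e^{-2\pi w \gamma(t)}| \le C e^{2\pi r (A - |w| \eta)}, \]
the path integral (8.22) tends to $0$ as $r \to \infty$ if $|w| > A/\eta$. Thus, Cauchy's theorem implies that $\Phi_\alpha(w) = \Phi_\beta(w)$ if $w = |w| e^{-i(\alpha + \beta)/2}$ and $|w| > A/\eta$. By the identity theorem for holomorphic functions, $\Phi_\alpha = \Phi_\beta$ on the intersection of their domains of definition.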
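To see Theorem 8.18 at work, here is a short worked example. The choice $F = \chi_{[-A,A]}$ is an assumption made for illustration only; it mirrors the function of Example 8.9, and the computation is a sketch rather than part of the proof.

```latex
% Illustrative example (our choice): F = \chi_{[-A,A]} \in L^2([-A,A]).
\[
  f(z) = \int_{-A}^{A} e^{2\pi i t z}\, dt
       = \frac{e^{2\pi i A z} - e^{-2\pi i A z}}{2\pi i z}
       = \frac{\sin(2\pi A z)}{\pi z}, \qquad f(0) = 2A,
\]
% an entire function. The estimate from the proof gives (8.19) with C = 2A:
\[
  |f(z)| \le \int_{-A}^{A} e^{-2\pi t \operatorname{Im} z}\, dt
         \le 2A\, e^{2\pi A |\operatorname{Im} z|}
         \le 2A\, e^{2\pi A |z|} .
\]
% By Plancherel's theorem, the restriction to the real line is in L^2:
\[
  \int_{-\infty}^{\infty} \Big| \frac{\sin(2\pi A x)}{\pi x} \Big|^2 dx
    = \int_{-A}^{A} 1\, dt = 2A .
\]
% Along the imaginary axis the growth rate A is actually attained:
\[
  f(iy) = \frac{\sinh(2\pi A y)}{\pi y}
        \sim \frac{e^{2\pi A y}}{2\pi y} \qquad (y \to \infty),
\]
% so e^{-2\pi A' y} f(iy) \to \infty for every A' < A.
```

The last line shows that, for this $f$, the constant $A$ in (8.19) cannot be replaced by any smaller $A' < A$. This is consistent with the theorem: otherwise $F$ would have to vanish a.e. outside $[-A', A']$, which it does not.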