Analysis I. 1 Measure theory. Piotr Haj lasz. 1.1 σ-algebra.

Analysis I Piotr Haj lasz 1 Measure theory 1.1 σ-algebra. Definition. Let be a set. A collection M of subsets of is σ-algebra if M has the following properties 1. M; 2. A M = \ A M; 3. A 1, A 2, A 3,... M = A i M. The pair (, M) is called measurable space and elements of M are called measurable sets. xercise. Let (, M) be a measurable space. Show that 1. M; 2. A, B M = A \ B M; 3. M is closed under finite unions, finite intersections and countable intersections. xamples. 1. 2, the family of all subsets of is a σ-algebra. 2. M = {, } is a σ-algebra. 3. If is a fixed set, then M = {,,, \ } is a σ-algebra. 1

Proposition 1 If {M i } i I is a family of σ-algebras, then is a σ-algebra. A simple proof is left to the reader. M = i I M i Let R be a family of subsets of. By σ(r) we will denote the intersection of all σ-algebras that contain R. Note that there is at least one σ-algebra that contains R, namely 2. 1. σ(r) is a σ-algebra that contains R 2. σ(r) is the smallest σ-algebra that contains R in the sense that if M is a σ-algebra that contains R, then σ(r) M. We say that σ(r) is a σ-algebra generated by R. xample. If R = {}, then σ(r) = {,,, \ }. In the sequel we will need the following result. Proposition 2 The family M of subsets of a set is a σ-algebra if and only if it satisfies the following three properties 1. M; 2. If A, B M, then A \ B M. 3. If A 1, A 2, A 3,... M are pairwise disjoint, then A i M. We leave the proof as an exercise cf. the proof of Theorem 3(d). 1.2 Borel sets. Let be a metric space (or more generally a topological space). By B() we will denote the σ-algebra generated by the family of all open sets in. lements of B() are called Borel sets and B() is called σ-algebra of Borel sets. The following properties are obvious. 1. If U 1, U 2, U 3,... are open, then U i is a Borel set; 2. All closed sets are Borel; 3. If F 1, F 2, F 3,... are closed, then F i is a Borel set. 2

1.3 Measure. Definition. Let (, M) be a measurable space. A measure (called also positive measure) is a function µ : M [0, ] such that 1. µ( ) = 0; 2. µ is countably additive i.e. if A 1, A 2, A 3,... M are pairwise disjoint, then ( ) µ A i = µ(a i ). The triple (, M, µ) is called measure space. If µ() <, then µ is called finite measure. If µ() = 1, then µ is called probability or probability measure. If we can write = A i, where A i M and µ(a i ) < for all i = 1, 2, 3,..., then we say that µ is σ-finite. xample. If is an arbitrary set, and µ : 2 [0, ] is defined by µ() = m if is finite and has m elements, µ() = if is infinite, then µ is a measure. It is called counting measure. Theorem 3 (lementary properties of measures) Let (, M, µ) be a measure space. Then (a) If the sets A 1, A 2,..., A n M are pairwise disjoint then µ(a 1... A n ) = µ(a 1 ) +... + µ(a n ). (b) If A, B M, A B and µ(b) <, then µ(b \ A) = µ(b) µ(a). (c) If A, B M, A B, then µ(a) µ(b). (d) If A 1, A 2, A 3,... M, then ( ) µ A i µ(a i ). (e) If A 1, A 2, A 3,... M, µ(a i ) = 0, for i = 1, 2, 3,... then µ( A i) = 0. 3

(f) If A 1, A 2, A 3,... M, A 1 A 2 A 3... then ( ) µ A i = lim µ(a i ). i (g) If A 1, A 2, A 3,... M, A 1 A 2 A 3... and µ(a 1 ) <, then ( ) µ A i = lim µ(a i ). i Proof. (a) The sets A 1, A 2,..., A n,,,,... are pairwise disjoint. Hence µ(a 1... A n ) = µ(a 1... A n...) = µ(a 1 ) +... + µ(a n ) + µ( ) + µ( ) + µ( ) +... = µ(a 1 ) +... + µ(a n ). (b) Since B = A (B \ A) and the sets A, B \ A are disjoint we have and the claim easily follows. (c) The claim follows from (1). (d) We have A = A 1 A 2 A 3... µ(b) = µ(a) + µ(b \ A) (1) = A }{{} 1 (A 2 \ A 1 ) (A }{{} 3 \ (A 1 A 2 )) (A }{{} 4 \ (A 1 A 2 A 3 ))... }{{} B 1 B 2 B 3 B 4 The sets B i are pairwise disjoint and B i A i. Hence (e) It easily follows from (d). (f) Since A i A i+1, we have µ(a) = µ(b i ) µ(a i ). A = A 1 A 2 A 3... = A }{{} 1 (A 2 \ A 1 ) (A }{{} 3 \ A 2 ) (A }{{} 4 \ A 3 )... }{{} B 1 B 2 B 3 B 4 4

The sets B i are pairwise disjoint and hence µ(a) = µ(b i ) = lim i (µ(b 1 ) +... + µ(b i )) = lim i µ(b 1... B i ) = lim i µ(a i ). The limit exist because µ(a i ) µ(a i+1 ) (since A i A i+1 ). (g) It suffice to apply (f) to the sets A 1 \ A i. The proof is complete. xercise. Provide a detailed proof of (g). Where do we use the assumption that µ(a 1 ) <? Show an example that the conclusion of (g) need not be true without the assumption that µ(a 1 ) <. 1.4 Outer measure and Carathéodory construction. It is quite difficult to construct a measure with desired properties, but it is much easier to construct so called outer measure which has less restrictive properties. The Carathéodory theorem shows then how to extract a measure from an outer measure. Definition. Let be a set. A function is called outer measure if 1. µ ( ) = 0; 2. A B = µ (A) µ (B); 3. for all sets A 1, A 2, A 3,... µ : 2 [0, ] ( ) µ A i µ (A i ). We call a set µ -measurable if A µ (A) = µ (A ) + µ (A \ ). (2) Condition (2) is called Carathéodory condition. Since the inequality µ (A) µ (A ) + µ (A \ ) is always satisfied, in order to verify measurability of a set it suffices to show that µ (A) µ (A ) + µ (A \ ). 5

Proposition 4 All sets with µ () = 0 are µ measurable. Proof. If µ () = 0, then for an arbitrary set A we have because A. µ (A) µ (A \ ) = µ (A \ ) + µ (A ), }{{} 0 Let M be the class of all µ -measurable sets. Theorem 5 (Carathéodory) M is a σ-algebra and µ : M [0, ] is a measure. Proof. We will split the proof into several steps. Step I: M This is obvious. Step II:, F M = F M. Let A. We have µ (A) = µ (A ) + µ (A \ ) = µ (A ) + µ ((A \ ) F ) + µ ((A \ ) \ F ). Since A = A ( F ) and (A \ ) F = (A ( F )) \, we have µ (A) = µ (A ( F ) ) + µ (A ( F ) \ ) + µ (A \ ( F )) = µ (A ( F )) + µ (A \ ( F )). Step III:, F M = \ F M. It follows from the symmetry of the condition (2) that \ M. Since \ F = \ (( \ ) F ) the claim follows. Step IV: If the sets 1, 2, 3,... M are pairwise disjoint, then for every A µ (A i ) = µ (A i ). If, F M are pairwise disjoint, the for every A µ (A ( F )) = µ (A ( F ) ) + µ (A ( F ) \ ) = µ (A ) + µ (A F ). 6

This, Step II and the induction argument implies that for every n n n µ (A i ) = µ (A i ). (3) Hence and thus in the limit µ (A µ (A i ) i ) n µ (A i ) µ (A i ). Since the opposite inequality is a property of an outer measure, the claim follows. Step V: If the sets 1, 2, 3,... M are pairwise disjoint, then i M. Step II and the induction argument implies that for every n n i M. Hence (3) gives µ (A) = µ (A ) n i + µ (A \ ) n i ) n µ (A i ) + µ (A \ i, and in the limit ) ) ) µ (A) µ (A i ) + µ (A \ i = µ (A i + µ (A \ i. Since the opposite inequality is a property of an outer measure the claim follows. Step VI: Final step. It follows from Steps I, III, V and Proposition 2 that M is a σ-algebra and then Step IV with A = implies that µ restricted to M is a measure. The proof is complete. We say that a measure µ : M [0, ] is complete if every subset of a set of measure zero is measurable (and hence has measure zero). Therefore it follows from Proposition 4 that the measure described by the Carathéodory theorem is complete. Let (, d) be a metric space. For, F we define dist (, F ) = inf d(x, y), x,y F diam = sup d(x, y). x,y Definition. An outer measure µ defined on subsets of a metric space is called metric outer measure if µ ( F ) = µ () + µ (F ) whenever dist (, F ) > 0. 7

Theorem 6 If µ is a metric outer measure, then all Borel sets are µ -measurable i.e. B() M. Proof. Since M is a σ-algebra, it suffices to show that all open sets belong to M. Let G be open. It suffices to show that for every A µ (A) µ (A G) + µ (A \ G) (4) (because the opposite inequality is obvious). otherwise (4) is obviously satisfied. For each positive integer n define We can assume that µ (A) < as G n = {x G : dist (x, \ G) > 1 n }. Then dist (G n, \ G) 1 n > 0. (5) Let Clearly and D n = G n+1 \ G n = {x G : G \ G n = dist (D i, D j ) 1 i + 1 1 j 1 n + 1 < dist (x, \ G) 1 n }. D i (6) i=n > 0 provided i + 2 j. Since the mutual distances between the sets D 1, D 3, D 5,..., D 2n 1 are positive we have µ (A D 1 ) + µ (A D 3 ) +... µ (A D 2n 1 ) = µ (A (D 1 D 3... D 2n 1 )) µ (A) and similarly µ (A D 2 ) + µ (A D 4 ) +... + µ (A D 2n ) µ (A). Hence Now (6) yields and hence µ (A D i ) 2µ (A) <. µ (A (G \ G n )) µ (A D i ) i=n µ (A (G \ G n )) 0 as n. 8

Inequality (5) gives µ (A G n ) + µ (A \ G) = µ ((A G n ) (A \ G)) µ (A) and thus µ (A G) + µ (A \ G) µ (A G n ) + µ (A (G \ G n )) + µ (A \ G) µ (A) + µ (A (G \ G n )) and after passing to the limit µ (A G) + µ (A \ G) µ (A). The proof is complete. We will use outer measures to prove the following important result. Theorem 7 Let be a metric space and µ a measure in B(). Suppose that is a union of countably many open sets of finite measure. Then µ() = inf µ(u) = U U open sup µ(c). (7) C C closed Before proving the theorem let s discuss two of its corollaries. The first one shows that in a situation described by the above theorem in order to prove that two measures are equal it suffices to compare them on the class of open sets. Corollary 8 If µ is as in Theorem 7 and ν is another measure on B() that satisfies ν(u) = µ(u) for all open sets U then ν() = µ() for all B(). The second corollary describes an important class of measures satisfying assumptions of theorem 7. Definition. Let be a metric space and µ a measure defined on the σ-algebra of Borel sets. We say that µ is a Radon measure if µ(k) < for all compact sets and µ() = inf µ(u) = U U open sup µ(k) for all B(). (8) K K compact Corollary 9 If is a locally compact and separable metric space 1 and µ is a measure in B() such that µ(k) < for every compact set K, then is a union of countably many open sets of finite measure and µ is a Radon measure. 1 Being locally compact means that ever point has a neighborhood whose closure is compact. IR n is locally compact, but also IR n \ {0} is locally compact. 9

Proof. It follows from locally compactness of that is a union of a family of open sets with compact closures. Since is separable, every covering of by open sets has countable subcovering and hence we can write = U i, U i compact. This proves the first part of the corollary because µ(u i ) µ(u i ) <. Now (8) follows from (7) and Theorem 3(f) because ( ) n C = C U i. n=1 } {{ } compact and hence µ(c) = for every closed set C. The proof is complete Proof of Theorem 7. For we set sup µ(k) K C K compact µ () = inf µ(u). U U open It is easy to see that µ is a metric outer measure. Hence µ restricted to the class of Borel sets is a measure. Clearly and µ(u) = µ (U) for all open sets U, (9) µ() µ () for all B(). (10) We can assume that is a union of an increasing sequence of open sets with finite measure. Indeed, if is the union of open sets U i of finite measure, then the sets V n = U 1 U 2... U n satisfy = V n, V n V n+1, µ(v n ) <. n=1 Now inequality (10) implies that for all B() µ(v n \ ) µ (V n \ ) and µ(v n ) µ (V n ), (11) because both V n \ and V n are Borel. Actually we have equalities in both inequalities of (11) because otherwise we would have a sharp inequality µ(v n ) = µ(v n \ ) + µ(v n ) < µ (V n \ ) + µ (V n ) = µ (V n ), 10

which contradicts (9). Hence in particular µ(v n ) = µ (V n ). Since µ(v n ) µ() and µ (V n ) µ () as n by Theorem 3(f) we conclude that µ() = µ () and thus µ() = inf µ(u) (12) U U open by the definition of µ. To prove that the measure of a set B() can be approximated by measures of closed subsets of observe that V n \ has finite measure and hence it follows from (12) that there is an open set G n such that V n \ G n, µ(g n \ (V n \ )) < ε 2 n. The set G = n=1 G n is open and C = \ G is closed. Now it suffices to observe that \ C = G n G n \ (V n \ ) n=1 and hence µ( \ C) < ε. The proof is complete. n=1 1.5 Hausdorff measure. Let ω s = π s/2 /Γ(1 + s 2 ), s 0. If s = n is a positive integer, then ω n is volume of the unit ball in IR n. Let be a metric space. For ε > 0 and we define H s ε() = inf ω s 2 s (diam A i ) s where the infimum is taken over all possible coverings A i with diam A i < ε. Since the function ε H s ε() is nonincreasing, the limit exists. H s is called Hausdorff measure. H s () = lim ε 0 H s ε() It is easy to see that if s = 0, H 0 is the counting measure. Theorem 10 H s is a metric outer measure. 11

Proof. Clearly H s : 2 [0, ], H s ( ) = 0, A B = H s (A) H s (B). In order to show that H s is an outer measure it remains to show that ( ) H s n n=1 H s ( n ). n=1 We can assume that the right hand side is finite. Hence Hε( s n ) < for every ε > 0. n=1 Fix δ > 0. We can find a covering of each set n such that Hence n=1 H s ε( n ) ω s s s n A ni, H s ε( n ) ω s 2 s diam A ni < ε (diam A ni ) s δ 2 n ( ) (diam A ni ) s δ Hε s n δ, i,n=1 because {A ni } i,n=1 forms a covering of n=1 n. Passing to the limit with δ 0 yields ( ) H s ( n ) Hε( s n ) Hε s n. n=1 n=1 Now passing to the limit with ε 0 yields the result. We are left with the verification of the metric condition. Let dist (, F ) > 0. It suffices to show that for all ε < dist (, F ). Let n=1 n=1 H s ε( F ) = H s ε() + H s ε(f ) (13) F A i, be a covering of F. We can assume that diam A i < ε A i ( F ) (14) 12

for all i, as otherwise we could remove A i from the covering. Since diam A i < dist (, F ) it follows from (14) that A i has a nonempty intersection with exactly one set or F. Accordingly, the family {A i } i splits into two disjoint subfamilies {B j } j = all the sets A i such that A i ; Hence ω s 2 s {C j } j = all the sets A i such that A i F. (diam A i ) s = ω s (diam B 2 s j ) s + ω s (diam C 2 s j ) s j j H s ε() + H s ε(f ). Since {A i } i was an arbitrary covering of F such that diam A i < ε, we conclude upon taking the infimum that H s ε( F ) H s ε() + H s ε(f ). Since the opposite inequality is obvious (13) follows. xercise. Show that if H s () <, then H t () = 0 for all t > s; H s () > 0, then H t () = for all 0 < t < s. Definition. The Hausdorff dimension is defined as follows. If H s () > 0 for all s 0, then dim H () =. Otherwise we define dim H () = inf{s 0 : H s () = 0}. It follows from the exercise that there is s [0, ] such that H t () = 0 for t > s and H t () = for 0 < t < s. Hausdorff dimension of equals s. 1.6 Lebesgue measure. For a closed interval we define its volume by and for A IR n we define P = [a 1, b 1 ]... [a n, b n ] IR n (15) P = (b 1 a 1 )... (b n a n ) L n(a) = inf P i 13

where the infimum is taken over all coverings A by intervals as in (15). Arguments similar to those in the proof that the Hausdorff measure is an outer metric measure give the following result Theorem 11 L n is a metric outer measure. Definition. L n is called outer Lebesgue measure. L n-measurable sets are called Lebesgue measurable. L n restricted to Lebesgue measurable sets is called Lebesgue measure and is denoted by L n. Corollary 12 All Borel sets are Lebesgue measurable. All sets with L n(a) = 0 are Lebesgue measurable. Then L n (A) = 0. A set has Lebesgue measure zero if and only it it can be covered by a sequence of intervals with whose sum of volumes can be arbitrarily small. Note that this is the same as the sets of measure zero in the setting of Riemann integral. Theorem 13 If P is a closed interval as in (15), then P i L n (P ) = P. In what follows we will often write A to denote the Lebesgue measure of a Lebesgue measurable set A. Proof. In order to prove the theorem we have to show that if P P i, then P P i. We will need the following quite obvious special case of this fact. Lemma 14 If P k P i is a finite covering of a closed interval P by closed intervals P i, then P k P i. Proof. This lemma follows from the theory of Riemann integral. 2 Indeed, volume of the interval equals the integral of the characteristic function. Hence inequality χ P k χ P i implies k k P = χ P χ Pi = P i. IR n IR n 2 This lemma can be proved in a more elementary way without using the Riemann integral, but the proof would be longer. 14

The proof is complete. Now we can complete the proof of the theorem. Observe that each interval P i is contained in a slightly bigger open interval Pi ε such that P ε i = P i + ε 2 i, where Pi ε is the closure of Pi ε. Now from the covering P P i ε finite subcovering P (compactness) and hence the lemma yields k j=1 P ε i j we can select a P k j=1 P ε i j P ε i = P i + ε. Since ε > 0 was arbitrary the theorem follows. Proposition 15 If P = (a 1, b 1 )... (a n, b n ) is a bounded interval, then L n ( P ) = 0. L n (P ) = (b 1 a 1 )... (b n a n ). If P is an unbounded interval in IR n and has nonempty interior (i.e. each side has positive length), then L n (P ) =. Proof. The first claim follows from the observation that P can be approximated from inside and from outside by closed intervals that differs arbitrarily small in volume. The second claim follows from the first one L n ( P ) = L n (P ) L n (int P ) = 0, and the lats claim follows from the fact that P contains closed intervals of arbitrarily large measure. IR k IR n, k < n is a subset generated by the first k coordinates. Corollary 16 If k < n, then L n (IR k ) = 0. The next result shows that we can compute Lebesgue measure of an open set by representing it as a union of cubes. We call a cube dyadic if its sidelength equals 2 k for some k Z. 15

Theorem 17 An arbitrary open set in IR n is a union of closed dyadic cubes with pairwise disjoint interiors. Hence Lebesgue measure of the open set equals the sum of the measures of these cubes. Proof. Instead of a formal proof we will show a picture which explains how to represent any open set as a union of cubes with sidelength 2 k, k = 0, 1, 2, 3,... The proof is complete Compare the following result with Theorem 7. Theorem 18 For an arbitrary set IR n L n() = inf L n (U), where the infimum is taken over all open sets U such that U. Proof. L n = inf i P i and each interval P i is contained in an open set (interval) of slightly bigger measure. very Borel set B is Lebesgue measurable. very set with L () = 0 is Lebesgue measurable. Hence sets of the form B are Lebesgue measurable. These are all Lebesgue measurable sets. More precisely we have the following characterization. 16

Definition. Let be a metric space. By a G δ set we mean a set of the form A = G i, where the sets G i are open and by F σ we mean a set of the form B = F i, where the sets F i are closed. Clearly all G δ and F σ sets are Borel. Theorem 19 Let A IR n. Then the following conditions are equivalent 1. A is Lebesgue measurable; 2. For every ε > 0 there is an open set G such that A G and L n(g \ A) < ε; 3. There is a G δ set H such that A H and L n(h \ A) = 0; 4. For every ε > 0 there is a closed set F such that F A and L n(a \ F ) < ε; 5. There is a F σ set M such that M A and L n(a \ M) = 0; 6. For every ε > 0 there is an open set G and a closed set F such that F A G and L n (G \ F ) < ε. Proof. (1) (2) very measurable set can be represented as a union of sets with finite measure A = A i, L n (A i ) <. It follows from Theorem 18 that for every i there is an open set G i such that A i G i and L n (G i \ A i ) < ε/2 i. Hence A G = G i, L n (G \ A) < ε. (2) (3) We define H = U i, where U i are open sets such that A U i, L n(u i \A) < 1/i. (3) (1) This implication is obvious. (1) (4) (5) this equivalence follows from the equivalence of the conditions (1), (2) and (3) applied to the set IR n \ A. (1) (6) If A is Lebesgue measurable then the existence of the sets F and G follows from the conditions (2) and (4). (6) (1) Take closed and open sets F i, G i such that F i A G i, L n (G i \ F i ) < 1/i. Then the set H = G i is G δ and L n(h \ A) = 0. The Lebesgue measure has an important property of being invariant under translations i.e. if A is a translation of a Lebesgue measurable set B, then A is Lebesgue measurable and L n (A) = L n (B). This property follows immediately from the definition of the Lebesgue measure. We will show not that this property implies that there is a set which is not Lebesgue measurable. Theorem 20 (Vitali) There is a set IR n which is not Lebesgue measurable. 3 3 The reader will easily check that the same proof applies to any translation invariant measure µ such that the measure of the unit cube if positive and finite; see also Theorem 21. 17

Proof. We will prove the theorem for n = 1 only, but the same argument works for an arbitrary n. The idea of the proof is to find a countable family of pairwise disjoint and isometric (by translation)sets whose union contains (0, 1) and is contained in ( 1, 2). If the sets in the family are isometric to a set V, then V cannot be Lebesgue measurable, because measurability of V would imply that a number between 1 and 3 (measure of ) be equal to infinite sum of equal numbers (equal to measure of V ) which is impossible. It should not be surprising that the construction of the set V has to involve axiom of choice. For x, y (0, 1) we write x y if x y is a rational number. Clearly is an equivalence relation and hence (0, 1) is the union of a family F of pairwise disjoint sets [x] = {y (0, 1) : x y}. It follows from the axiom of choice that there is a set V (0, 1) which contains exactly one element from each set in the family F. Let = V a, where V a = V + a = {x + a : x V }. It is easy to see that a Q ( 1,1) V a V b = if a, b Q, a b; (0, 1) ( 1, 2). Suppose that V is measurable. Then each V a is measurable, L 1 (V ) = L 1 (V a ) and is measurable. If L 1 (V ) > 0, then L 1 () = which is impossible and if L 1 (V ) = 0, then L 1 () = 0 which is also impossible. That proves that the set V cannot be measurable. The Lebesgue measure is translation invariant and L n ([0, 1] n ) = 1. It turns out that the above two properties uniquely determine Lebesgue measure. This proves that the Lebesgue measure is in a sense the only natural way of measuring length, area, volume of general sets in one, two, three,... dimensional uclidean spaces. Theorem 21 If µ is a measure on B(IR n ) such that µ(a + ) = µ() for all a IR n, B(IR n ) and µ([0, 1] n ) = 1, then µ() = L n () on B(IR n ). Proof. According to Corollary 8 it suffices to prove that µ(u) = L n (U) for all open sets U. Consider an open cube Q k = (0, 2 k ) n, where k is a positive integer. There are 2 kn pairwise disjoint open cubes contained in the unit cube [0, 1] n, each being a translation of Q k. Since the measure is invariant under translations we conclude that µ(q k ) 2 kn. 18

It is easy to see that we can cover the boundary Q by translations of Q k in a way that the sum of µ-measures of the cubes covering Q goes to zero as k. This proves that µ( Q) = 0. Now [0, 1] n can be covered by 2 kn cubes each being a translation of the closed cube Q k. Since the measure is invariant under translations we conclude that and hence µ(q k ) 2 kn µ(q k ) = µ(q k ) = 2 kn = L n (Q k ). (16) As we know (Theorem 17) an arbitrary open set U is a union of cubes with pairwise disjoint interiors and sidelength 2 k (k depends on the cube). Cubes are not disjoint, but meet along the boundary which has µ-measure zero. Hence the µ(u) equals the sum of the µ-measures of the cubes which according to (16) equals L n (U). The proof is complete. Theorem 22 H n = L n on B(IR n ). Sketch of the proof. Observe that the Hausdorff measure H n in IR n has the property that H n ( + a) = H n () for all B(IR n ). Therefore in order to prove that H n = L n on B(IR n ) it suffices to prove that H n ([0, 1] n ) = 1 This innocently looking result is however very difficult and we will not prove it here. xercise. Using Theorem 22 prove that H n = L n on the class of all subsets of IR n. If s < n, then H s is the s-dimensional measure in IR n. Not that s is not required to be an integer. If k < n is an integer and M IR n is a k-dimensional manifold, then one can prove that H k coincides on M with the natural k-dimensional Lebesgue measure on M. 4 xamples. If C [0, 1] is the standard ternary Cantor set, then one can prove that dim H (C) = log 2/ log 3 = 0.6309... Moreover H s (C) = 1 for s = log 2/ log 3. The Van Koch curve is defined as the limiting curve for the following construction 4 We have already seen how in the course of Advanced Calculus to measure subsets of M in the setting of the Riemann integral. This construction can easily be generalized to the Lebesgue measure on M. We will discuss it with more details later. 19

The Van Koch curve, denoted by K is homeomorphic to the circle S 1, but dim H K = log 4/ log 3 = 1, 261... Proposition 23 A set IR n has Lebesgue measure zero if and only if for every ε > there is a family of balls B(x i, r i )} such that B(x i, r i ), ri n < ε. Proof. ach bass B(x i, r i ) is contained in a cube with sidelength 2r i and hence can be covered by a sequence of cubes whose sum of volumes is (2r i) n < 2 n ε. If a set IR n has Lebesgue measure zero, then for every ε > 0 there is an open set U such that U and L n (U) < ε. The set U can be represented as union of cubes with pairwise disjoint interiors (Theorem 17). each cube Q of sidelength l is contained in a ball of radius r = nl and hence r n = n n/2 Q. Therefore can be covered by a sequence of balls {B(x i, r i )} such that rn i < n n/2 ε. Definition. We say that a function between metric spaces f : (, d) (Y, ρ) is Lipschitz continuous if there is a constant L > 0 such that ρ(f(x), f(y)) Ld(x, y) for all x, y. In this case we say that f is L-Lipschitz continuous and call L Lipschitz constant if f. Proposition 24 If f : IR n IR n is Lipschitz continuous and IR n has Lebesgue measure zero, then f() has Lebesgue measure zero too. Proof. It follows from Proposition 23 and the fact that the L-Lipschitz image of a ball of radius r is contained in a ball of radius Lr. xercise. Let f : IR n IR n be a homeomorphism. Prove that f(a) is Borel if and only if A is Borel. Although the theory of Lebesgue measure seems to work very well something very important is missing: we haven t proved so far that if Q is a unit cube whose sides are not parallel to coordinate axes, then its measure equals 1. Indeed we considered only intervals with sides parallel to coordinate directions. Fortunately this important fact is contained in a result that we will prove now. 20

Theorem 25 If L : IR n IR n is a nondegenerate linear transformation with the matrix A (det A 0), then L() IR n is a Borel (Lebesgue measurable) set if and only if IR n is Borel (Lebesgue measurable) set. Moreover L n (L()) = det A L n (). (17) Proof. Since L is a homeomorphism it preserves the class of Borel sets (see xercise). Since both L and L 1 are Lipschitz continuous L preserves the class of sets of measure zero (Proposition 24). Therefore L preserves the class of Lebesgue measurable sets (because each Lebesgue measurable set is the union of a Borel set and a set of measure zero). It follows from Proposition 24 that the formula (17) is satisfied for sets of measure zero. Therefore it remains to prove (24) for Borel sets. Define a new measure µ() = L n (L()). Clearly µ is a measure on B(IR n ) that is translation invariant. Let µ([0, 1] n ) = a > 0. Hence the measure a 1 µ satisfies all the assumptions of Theorem 21 and therefore a 1 µ() = L n (), so we have L n (L()) = al n () for all B(IR n ). It remains to prove that a = det A. Clearly a = a(a) : GL (n) (0, ) is a function defined on the class of invertible matrices. We have a(a 1 A 2 ) = a(a 1 )a(a 2 ). (18) Indeed, if A 1, A 2 are matrices representing linear transformations L 1 and L 2, then Also a(a 1 A 2 )L n () = L n (L 1 (L 2 ()) = a(l 1 )L n (L 2 ()) = a(l 1 )a(l 2 )L n (). a(s n I) = s n, s > 0. (19) Here I is the identity matrix. This property is quite obvious and we leave a formal proof to the reader. Now it remains to prove the following fact. Lemma 26 Let a : GL (n) (0, ) be a function with properties (18) and (19). Then a(a) = det A. Proof. Let A i (s) be the diagonal matrix with s on the ith place and s on all other places on the diagonal. Since A i (s) 2 = s 2 I we have a(a i (s)) 2 = a(a i (s) 2 ) = s 2n and hance a(a i (s)) = s n = det A i (s). For k l let B kl (s) = [a ij ] be the matrix such that a kl = s, a ii = 1, i = 1, 2,... and all other entries equal zero. Multiplication by the matrix B kl (s) from the right (left) is equivalent to adding kth column (lth row) multiplied by s to lth column (kth row). It is well known from linear algebra (and easy to prove) that applying such operations to any nonsingular matrix A it can be transformed to a matrix of the form ti or 21

A n (t). Since multiplication by B kl (s) does not change determinant, t = det A 1/n. It remains to prove that a(b kl (s)) = 1. Since B kl ( s) = A k (1)B kl (s)a k (1) we have that a(b kl ( s)) = a(b kl (s)). On the other hand B kl (s)b kl ( s) = I and hence a(b kl (s)) 2 = 1, so a(b kl (s)) = 1. The proof of the lemma and hence the proof of the theorem is complete. 2 Integration 2.1 Measurable functions. Definition. Let (, M) be a measurable space and Y a metric space. We say that a function f : Y is measurable if f 1 (U) M for every open set U Y. If M is a measurable set, then we say that f : Y is measurable if f 1 (U) M for every open U Y. Clearly, f : Y is measurable if it can be extended to a measurable function f : Y, but there is also another point of view. M = {A : A M} = {B : B M} is a σ-algebra and measurability of f : Y is equivalent to measurability of f with respect to M. Theorem 27 Let (, M) be a measurable space, f : Y a measurable function and g : Y Z a continuous function. Then the function g f : Z is measurable. Proof. For every open set U Z we have (g f) 1 (U) = f 1 (g 1 (U)) M. }{{} open in Y The proof is complete. Theorem 28 Let (, M) be a measurable space and Y a metric space. If the functions u, v : IR are measurable and Φ : IR 2 Y is continuous, then the function is measurable. h(x) = Φ(u(x), v(x)) : Y 22

Proof. Put f(x) = (u(x), v(x)), f : IR 2. According to Theorem 27 it suffices to prove that f is measurable. Let R = (a, b) (c, d) be an open rectangle. Then f 1 (R) = f 1 ((a, b) (c, d)) = u 1 ((a, b)) u 1 ((c, d)) M. very open set U IR 2 is a countable union of open rectangles U = R i and hence f 1 (U) = f 1 (R i ) M. The proof is complete. This theorem has many applications. (a) If f = u + iv : C, where u, v : IR are measurable, then f is measurable. Indeed, we take Φ(u, v) = u + iv. (b) If f = u + iv : C is measurable, then the functions u, v, f : IR are measurable. Indeed, we take Φ 1 (u + iv) = u, Φ 2 (u + iv) = v, Φ 3 (u + iv) = u + iv. (c) If f, g : C are measurable, then the functions f +g, fg : C are measurable. Indeed, we take Φ 1 : C C C, Φ 1 (z 1, z 2 ) = z 1 + z 2 and Φ 2 : C C C, Φ 2 (z 1, z 2 ) = z 1 z 2. (d) A set is measurable if and only if the characteristic function if { 1 if x, χ (x) = 0 if x, is measurable. Indeed, this follows from the fact that for any open set U IR if 0 U, 1 U, (χ ) 1 if 0 U, 1 U, (U) = \ if 0 U, 1 U, if 0 U, 1 U, 23

(e) If f : C is a measurable function, then there is a measurable function α : C such that α = 1 and f = α f. The set = {x : f(x) = 0} is measurable because = f 1 ( B(0, 1 i ) }{{} ). disc of radius 1/i Hence the function h(x) = f(x)+χ (x) is measurable and h(x) 0 for all x. Therefore h is a measurable function as a function h : C \ {0}. Since ϕ : C \ {0} C, ϕ(z) = z/ z is continuous we conclude that α(x) = ϕ(f(x) + χ (x)) is measurable. If x, then α(x) = 1 and if x, then α(x) = f(x)/ f(x). Therefore f(x) = α(x) f(x) for all x. Definition. Let, Y be two metric spaces. We say that f : Y is a Borel mapping if it is measurable with respect to the σ-algebra of Borel sets in i.e. f 1 (U) B() for every open set U Y. Clearly every continuous mapping is Borel. Theorem 29 Let (, M) be a measurable space and Y a metric space. Let also f : Y be a measurable mapping. (a) If Y is a Borel set, then f 1 () M. (b) If Z is another metric space and g : Y Z is a Borel mapping, then g f : Z is measurable. Proof. (a) Let R = {A Y f 1 (A) M}. Clearly R contains all open sets. If we prove that R is a σ-algebra, we will have that B(Y ) R which is the claim. Y R Indeed, f 1 (Y ) = M. If A R, then Y \ A R. Indeed, f 1 (Y \ A) = \ f 1 (A) M. 24

If A 1, A 2,... R, then A i R. Indeed, f 1 ( A i) = f 1 (A i ) R. The above three properties imply that R is a σ-algebra. (b) If U Z is open, then g 1 (U) Y is Borel and hence (g f) 1 (U) = f 1 (g 1 (U)) M by (a). Definition. Let (, M) be a measurable space. Let IR = IR {, }. We say that a function f : IR is measurable if f 1 (U) is measurable for every open set U IR and if the sets f 1 (+ ), f 1 ( ) are measurable. Similarly for M we define measurable functions f : IR. xercise. Let (, M) be a measurable space. Prove that a function f : IR is measurable if and only if f 1 ((a, ]) M for all a IR. and Let {a n } be a sequence in IR. For k = 1, 2,... put We call β upper limit of {a n } and write b k = sup{a k, a k+1, a k+2,...} (20) β = inf{b 1, b 2,...}. (21) β = lim sup a n n Clearly {b k } is a decreasing sequence, so β = lim k b k. It is easy to see that there is a subsequence {a ni } of {a n } such that lim i a ni = β. Moreover β is the largest possible value for limits of all subsequences of {a n }. The lower limit is defined by lim inf n a n = lim sup( a n ). (22) n quivalently the lower limit can be defined by interchanging sup and inf in (20) and (21). The lower limit equals to the lowest possible value for limits of subsequences of {a n }. For a sequence of functions f n : IR we define new functions sup n f n, lim sup n f n : IR by ( ( sup n lim sup n ) f n (x) = sup f n (x) for every x, n ) f n (x) = lim sup f n (x) for every x. n 25

Similarly we can define functions inf n f n and lim inf n f n. If f(x) = lim n f n (x) for every x, then we say that f is a pointwise limit of {f n }. Theorem 30 Suppose that f n : IR is a sequence of measurable functions. Then the functions sup n f n, inf n f n, lim sup n f n, lim inf n f n are measurable. Proof. Since ( ) 1 sup f n ((a, ]) = n n=1 f 1 n ((a, ]) M, measurability of sup n f n follows from the exercise. By a similar argument we prove that the function inf n f n is measurable. The two facts imply measurability of lim sup f n because lim sup f n = inf {sup f i }. n k 1 Now measurability of lim inf n f n is an obvious consequence of the equality lim inf n i k f n = lim sup( f n ). n The proof is complete. Corollary 31 The limit of every pointwise convergent sequence of functions f n : C (or f n : IR) is measurable. Corollary 32 If f, g : IR are measurable functions, then the functions max{f, g}, min{f, g} are measurable. In particular the fuctions are measurable. f + = max{f, 0}, f = min{f, 0} The functions f + and f are called positive part and negative part of f. Note that f = f + f, f = f + + f. Corollary 31 has the following generalization to the case of metric space valued mappings. Theorem 33 Let f n : Y be a sequence of measurable mappings with values in a separable metric space. If f n (x) f(x) for every x, then f : Y is measurable. 26

Proof. Let {y i } be a countable dense set. Then the functions d i : Y IR, d i (y) = d(y i, y) are continuous. Hence d i f n : IR are measurable. Since d i f n d i f, the functions d i f : IR are measurable too by Corollary 31. In particular the sets {x : d(y i, f(x)) d(y i, F )} = (d i f) 1 ([d(y i, F ), )) are measurable. To prove measurability of f it suffices to prove measurability of f 1 (F ) for every closed set F and it follows from the equality f 1 (F ) = {x : d(y i, f(x)) d(y i, F )}. (23) To prove this equality observe that if x f 1 (F ), then f(x) F and hence d(y i, f(x)) d(y i, F ) for every i, so x belongs to the right hand side of (23). On the other hand if x f 1 (F ), then f(x) F and hence there is i such that d(y i, f(x)) < d(y i, F ) (because {y i } is a dense subset of Y and so we can find y i arbitrarily close to f(x)), therefore x cannot belong to the right hand side of (23). Definition. Let (, M) be a measurable space. A measurable function s : C is called simple function if it has only a finite number of values. Simple functions can be represented in the form n s = α i χ Ai, where α i α j for i j and A i are disjoint measurable sets. Theorem 34 If f : [0, ] is measurable, then there is a sequence of simple functions s n on such that 1. 0 s 1 s 2... f; 2. lim n s n (x) = f(x) for every x. If in addition f is bounded, then s n converges to f uniformly. Proof. It is easy to see that the sequence has all properties that we need. { n if f(x) n, s n (x) = k k if f(x) < k+1 n, 2 k 2 n 2 n 27

2.2 Integral In the integration theory we will assume that 0 = 0. Definition. If s : [0, ) is a simple function of the form s = n α i χ Ai, where α i α j for i j and A i are pairwise disjoint measurable sets, then for any M we define n s dµ = α i µ(a i ). According to our assumption 0 = 0, so 0 µ(a i ) = 0, even if µ(a i ) =. If f : [0, ] is measurable, and M, we define the Lebesgue integral of f over by f dµ = sup s dµ, where the supremum is over all simple functions s such that 0 s f. (a) If 0 f g, then f dµ g dµ. (b) If A B and f 0, then f dµ f dµ. A B (c) If f 0 and c is a constant 0 c <, then f dµ = c f dµ. (d) If f(x) = 0 for all x, then f dµ = 0, even if µ() =. (e) If µ() = 0, then f dµ = 0, even if f(x) = for all x. (f) If f 0, then f dµ = χ f dµ. In the proof of the next two theorems we will need the following lemma. Lemma 35 Let s and t be nonnegative measurable simple functions on. Then the function ϕ() = s dµ, M defines a measure in M. Moreover (s + t) dµ = s dµ + t dµ. (24) 28

Proof. Since ϕ( ) = 0, it remains to prove that ϕ is countably additive. If 1, 2,... are disjoint measurable sets, then ( ) ( n ) n ϕ k = α i µ k A i = µ( k A i ) k=1 = k=1 n α i µ( k A i ) = k=1 k=1 This completes the proof of the first part. Let now α i k=1 ϕ( k ). s = k α i χ Ai, t = m β j χ Bj j=1 be a representation of the simple functions as in the definition, then for ij = A i B j we have s + t dµ = (α i + β j )µ( ij ) = ij s dµ + ij t dµ. ij Since is a union of disjoint sets ij we conclude (24) from the first part of the lemma. The following theorem is one of the most important results of the theory of integration. Theorem 36 (Lebesgue monotone convergence theorem) Let {f n } be a sequence of measurable functions on such that: 1. 0 f 1 f 2... for every x ; 2. lim n f n (x) = f(x) for every x. Then f is measurable and lim f n dµ = f dµ n Proof. Measurability of f follows from Corollary 31. The sequence f n dµ is increasing and hence it has a limit α = lim n f n dµ. Since f n f, we have f n dµ f dµ and hence α f dµ. Let 0 s f be a simple function. Fix a constant 0 < c < 1 and define n = {x : f n (x) cs(x)}, n = 1, 2, 3,... It is easy to see that 1 2... and n=1 n = (because f n (x) f(x) for every x.) Hence f n dµ f n dµ c n s dµ. n (25) 29

Since ϕ() = s dµ is a measure and 1 2... we conclude (Theorem 3(f)) that ( ) s dµ = ϕ( n ) ϕ n = ϕ() = s dµ. n Thus after passing to the limit in (25) we obtain α c s dµ and hence f dµ α s dµ, n=1 because 0 < c < 1 was arbitrary. Taking supremum over all simple functions 0 s f yields f dµ α f dµ, i.e. f dµ = α and the theorem follows. Theorem 37 If f n : [0, ] is measurable, for n = 1, 2, 3,... and then f is measurable and f(x) = f n (x) for all x, n=1 f dµ = f n dµ. Proof. The function f is a limit of measurable functions, so it is measurable. Let {s i } and {t i } be increasing sequences of simple nonnegative function such that s i f 1, t i f 2 (Theorem 34). Then s i + t i f 1 + f 2 and hence (24) combined with the monotone convergence theorem gives f 1 + f 2 dµ = f 1 dµ + f 2 dµ. Applying an induction argument we obtain (f 1 + f 2 +... + f N ) dµ = N n=1 f n dµ. Since g N = f 1 +... + f N is an increasing sequence of measurable functions and g N f the theorem follows from the monotone convergence theorem. Theorem 38 (Fatou s lemma) If f n : [0, ] is measurable, for n = 1, 2, 3,..., then ( ) lim inf f n dµ lim inf f n dµ. n n 30

xercise. Construct a sequence of measurable functions f n : IR [0, ] such that the inequality in Fatou s lemma is sharp. Proof of the theorem. Let g n = inf{f n, f n+1, f n+2,...}. Then g n f n and hence g n dµ f n dµ. (26) Since 0 g 1 g 2..., all the functions g n are measurable (Theorem 30) and g n lim inf k f n, inequality (26) combined with the monotone convergence theorem yields the result. Theorem 39 Let f : [0, ] be measurable. Then ϕ() = f dµ for M defines a measure in M. Moreover g dϕ = for every measurable function g : [0, ]. gf dµ (27) Proof. Clearly ϕ( ) = 0. To prove that ϕ is a measure it remains to prove countable additivity. Let 1, 2,... M be disjoint sets and = n=1 n. Then obviously χ f = χ n f n=1 and ϕ() = = χ f dµ = χ n f dµ n=1 χ n f dµ = n=1 n=1 ϕ( n ), by Theorem 37. To prove (27) observe that it is true if g = χ, M. Hence it is also true for simple functions and the general case follows from the monotone convergence theorem combined with Theorem 34. Definition. Let (, M, µ) be a measure space. The space L 1 (µ) consists of all measurable functions f : C for which f 1 = f dµ <. lements of L1 (µ) are called Lebesgue integrable functions or summable functions. 31

If f = u + iv, then the functions u +, u, v +, v are measurable and Lebesgue integrable as bounded by f. For a set M we define ( ) f dµ = u + dµ u dµ + i v + dµ v dµ. If f : IR is measurable and each of the integrals f + dµ, f dµ is finite, then we also define f dµ = f + dµ f dµ. Theorem 40 Suppose that f, g L 1 (µ). If α, β C, then αf + βg L 1 (µ) and (αf + βg) dµ = α f dµ + β g dµ. We leave the proof as a simple exercise. This theorem says that L 1 (µ) is a linear space and f f dµ is a linear functional in L 1 (µ). Similarly we define L 1 (µ, IR n ) as the class of all measurable mappings f = (f 1, f 2,..., f n ) : IR n such that f dµ < and then we set ( ) f dµ = f 1 dµ,..., f n dµ. Theorem 41 If f L 1 (µ, IR n ), then f dµ f dµ. Proof. Let α = (α 1,..., α n ) IR n be a unit vector such that f dµ = α, f dµ. Then n f dµ = α, f dµ = α i f i dµ by Schwarz inequality. = n α i f i dµ = α, f dµ f dµ Theorem 42 (Lebesgue s dominated convergence theorem) Suppose {f n } is a sequence of complex valued functions on such that the limit f(x) = lim n f n (x) exists for every x. If there is a function g L 1 (µ) such that f n (x) g(x) for every x and all n = 1, 2,..., 32

then f L 1 (µ), and lim f f n dµ = 0 n lim f n dµ = f dµ. n Proof. Clearly f is measurable and f g. Hence f L 1 (µ). It is easy to see that 2g f f n 0 and hence Fatou s lemma yields 2g dµ lim inf (2g f f n ) dµ n ( ) = 2g + lim inf f f n dµ n = 2g lim sup f f n dµ. n The integral 2g dµ is finite and hence lim sup f n f dµ 0. n This immediately implies Now so The proof is complete. f n dµ lim f n f dµ = 0. n lim n f dµ f n dµ = f n f dµ 0, f dµ. 2.3 Almost everywhere Definition. If there is a set N M of measure zero, µ(n) = 0 such that a property P (x) is satisfied for all x \ N, then we say that the property P (x) is satisfied almost everywhere (a.e.) For example f n f a.e. means that there is a set N M, µ(n) = 0 such that f n (x) f(x) for all x \ N. 33

If f = g a.e., then we write f g. It is easy to see that is an equivalence relation (f g, g h f h follows from the fact that the union of two sets of measure zero is a set of measure zero). Note that if f g, then f dµ = g dµ for every M. Therefore from the point of view of the integration theory, functions which are equal a.e. cannot be distinguished. Moreover if f dµ = g dµ for all M, then f = g a.e. This follows from part (b) of Theorem 43 below applied to f g. Thus two functions f and g cannot be distinguished in the integration theory if and only if they are equal a.e. Therefore it is natural to identify such functions. Theorem 43 (a) Suppose f : [0, ] is measurable, M and f dµ = 0, then f = 0 a.e. in. (b) Suppose f L 1 (µ) and f dµ = 0 for every M. Then f = 0 a.e. in. Proof. (a) Suppose A n = {x : f(x) 1/n}. Then 1 n µ(a n) f dµ f dµ = 0. A n Hence µ(a n ) = 0 and therefore the set {x : f(x) > 0} = n=1 A n has measure zero. (b) Let = u + iv. Define = {x : u(x) 0}. Then 0 = real part of f dµ = u dµ = u + dµ, and hence u + = 0 a.e. by (a). Similarly u, v +, v = 0 a.e. Accordingly, f = 0 a.e. Recall that a measure µ is complete if every subset of a set of measure zero is measurable (and hence has measure zero). All the measures obtained through the Carathéodory construction are complete (Proposition 4), e.g. the Hausforff measure and the Lebesgue measure are complete. However, we can assume that any measure is complete, because we can add all subsets of sets of measure zero to the σ-algebra. More precisely one can prove. Theorem 44 Let (, M, µ) be a measure space, let M be the collection of all for which there exist sets A, B M such that A B and µ(b \ A) = 0 and define µ() = µ(a) in this situation. Then M is a σ-algebra and µ is a complete measure on M. 34

For a proof, see Rudin page 28. Therefore, whenever needed, we can assume that the measure is complete. Remark. If µ is a measure on B(IR n ) and there is an uncountable set B(IR n ) such that µ() = 0, then µ cannot be complete, because one can prove that the cardinality of all Borel sets in B(IR n ) is the same as the cardinality of real numbers and the cardinality of all subsets of is the same as the cardinality of all subsets of real numbers which is bigger than the cardinality of real numbers. Hence in order to make the measure µ complete we have to enlarge B(IR n ) to a bigger σ-algebra by including also sets which are not Borel. This is the case of the Lebesgue measure. The σ-algebra of Lebesgue measurable sets is larger, than the σ-algebra of Borel sets. Definition. Suppose now that µ is a complete measure. If a function with values in a metric space Y is defined only at almost all points of, i.e. f : \ N Y, µ(n) = 0, then we say that f is measurable if there is y 0 Y such that f(x) = { f(x) if x \ N, y 0 if x N, is measurable. In such a situation we say that f : Y is a measurable function defined a.e. Note that it follows from the assumption that µ is complete that f can be defined in N in an arbitrary way and still the resulting function f is measurable and f f. In particular functions in L 1 (µ) need to be defined a.e. only. L 1 (µ) is a linear space. Since we identify functions that are equal a.e. we identify L 1 (µ) with the quotient space L 1 (µ)/ (check that L 1 (µ)/ is a linear space). Therefore f L 1 (µ) satisfies f = 0 in L 1 (µ) if and only if f = 0 a.e., if and only if f 1 = f dµ = 0 (see Theorem 43(a)). In all the previously discussed convergence theorems (Theorem 36, Theorem 37, Theorem 38, Theorem 42) we can replace the requirement of everywhere convergence by the a.e. convergence. For example the Lebesgue dominated convergence theorem can be formulated as follows. Theorem 45 (Lebesgue dominated convergence theorem) Suppose that {f n } is a sequence of complex valued functino on defined a.e. If the limit f(x) = lim n f n (x) exists a.e. and if there is a function g L 1 (µ) such that f n (x) g(x) a.e., for all n = 1, 2, 3,... then f L 1 (µ), lim f f n dµ = 0 n 35

and lim f n dµ = f dµ. n Not surprisingly, this theorem is an easy consequence of the previous version of the Lebesgue dominated convergence theorem. We leave details to the reader. Recall that if Y is a metric space, then a mapping f : IR n Y is Lebesgue measurable if f 1 (U) is a Lebesgue measurable set for every open U Y. Since the σ-algebra of Lebesgue measurable sets is larger, than that of Borel sets, the mapping f need not be Borel. I turns out, however, that f equals a.e. to a Borel mapping, at least when Y is separable. Theorem 46 If Y is a separable metric space and f : IR n Y is Lebesgue measurable, then there is a Borel mapping g : IR n Y such that f = g a.e. Proof. Since Y is separable, there is a countable family of open sets {U i } in Y such that every open set in Y is a union of open sets from the family. Indeed, it suffices to take the family of balls with rational length radii and centers in a countable dense set in Y. For i = 1, 2,... let M i IR n be a F σ set such that M i f 1 (U i ), L n (f 1 (U i ) \ M i ) = 0, see Theorem 19. Clearly the set = (f 1 (U i ) \ M i ) has Lebesgue measure zero and hence there is a G δ set H such that H, L n (H) = 0 (Theorem 19). Fix y 0 Y and define { f(x) if x H, g(x) = y 0 if x H. Clearly f = g a.e. and it remains to prove that the mapping g is Borel. It is easy to see that 5 if y 0 U i, then g 1 (U i ) = M i \ H and if y 0 U i, then g 1 (U i ) = M i H. In each case g 1 (U i ) is a Borel set. Since every open set U Y can be represented as U = j=1 U i j, the set g 1 (U) = g 1 (U ij ) is Borel. 5 A good picture is useful here. j=1 36

2.4 Theorems of Lusin and gorov Theorem 47 (Lusin) Let be a metric space and µ a measure in B() such that is a union of countably many open sets of finite measure. If f : Y is a Borel mapping with values in a separable metric space, then for every ε > 0 there is a closed set F such that µ( \ F ) < ε and f F is continuous. Proof. Let {U i } be a countable family of open sets such that every open set in Y is a union of open sets from the family. It follows from Theorem 7 that for every i there is an open set V i such that 6 Let f 1 (U i ) V i, µ(v i \ f 1 (U i )) < ε2 i 1. = V i \ f 1 (U i ). Clearly µ() < ε/2. We will prove that the mapping g = f \ is continuous. Observe that 7 g 1 (U i ) = V i ( \ ). (28) Indeed, g 1 (U i ) V i ( \ ), but on the other hand so V i ( \ ) V i ( \ (V i \ f 1 (U i ))) = f 1 (U i ), V i ( \ ) f 1 (U i ) ( \ ) = g 1 (U i ). In order to prove that g is continuous it suffices to prove that for every open set U Y there is an open set V such that 8 g 1 (U) = V ( \ ). Let U = j=1 U i j. Then (28) gives g 1 (U) = ( j=1 V i j ) ( \ ) which proves continuity of g in \. The set need not be closed, but it follows from Theorem 7 that there is an open set G such that G and µ(g \ ) < ε/2. Now f F is continuous, where F = \ G, F is closed and µ( \ F ) = µ(g) = µ() + µ(g \ ) < ε. As a corollary we will prove the following characterization of Lebesgue measurable functions also known as the Lusin theorem. Let IR n be a Lebesgue measurable set. Recall that f : IR is Lebesgue measurable if f 1 (U) is Lebesgue measurable for every open U IR (f 1 (U) is a subset of ). 6 Actually to prove this fact we need first to split f 1 (U i ) into sets of finite measure and then apply Theorem 7 to each such set. 7 It is easy to see if you make an appropriate picture, but a formal proof is enclosed below. 8 Because the class of open sets in the metric space \ is exactly the class of sets of the form V ( \ ), where V is open. 37

Theorem 48 (Lusin) Let IR n be Lebesgue measurable. Then a function f : IR is Lebesgue measurable if and only if for every ε > 0 there is a closed set F such that f F is continuous and L n ( \ F ) < ε. Proof. Since f equals a.e. to a Borel function necessity follows from Theorem 47. To prove sufficiency it suffices to show that for every a IR the set a = {x : f(x) a} is Lebesgue measurable (this is a variant of the exercise from page 25.) Let F be a closed set such that f F is continuous and L n ( \ F ) < ε. Then the set F = {x : (f F )(x) a} = a F is closed and L n( a \ F ) L n ( \ F ) < ε, hence measurability of a follows from Theorem 19. In the next theorem µ is a measure on an arbitrary set (not necessarily a metric space), but it is crucial that µ() <. Theorem 49 (gorov) Let µ be a finite measure on a set. Let f n : Y be a sequence of measurable functions with values in a separable metric space. If f n f a.e., then f is measurable and for every ε > 0 there is a measurable set such that µ( \ ) < ε and f n f uniformly on. Proof. Measurability of f follows from Theorem 33. Let d be the metric in Y. Define C i,j = {x : d(f n (x), f(x)) 2 i }. n=j We will prove that this set is measurable. First observe that the mapping F = (f n, f) : Y Y is measurable. Indeed, if {U i } is a collection of open sets in Y such that every open set in Y is a countable union of open sets from the family, it easily follows that every open set in Y Y is a countable union of open sets of the form U i U j. Therefore to prove measurability of F it suffices to observe that each set F 1 (U i U j ) = f 1 n (U i ) f 1 (U j ) is measurable. Since the function d : Y Y IR is continuous, the composed map d F : IR is measurable and hence {x : d(f n (x), f(x)) 2 i } = F 1 {(y 1, y 2 ) Y Y : d(y 1, y 2 ) 2 i } = F 1 (d 1 ([2 i, )) = (d F ) 1 ([2 i, )) is measurable. That proves measurability of the sets C i,j. Now observe that C i,j is a decreasing sequence of sets with respect to j, µ(c i,1 ) µ() < and hence µ(c i,j ) µ( C i,k ) = 0 as j (29) k=1 38

by Theorem 3(g). Indeed, this intersection has measure zero because if x k=1 C i,k, then there is a subsequence f nj (x) such that d(f nj (x), f(x)) 2 i for all j, which can be true only for x in a set of measure zero. Now it follows from (29) that for every i there is an integer J(i) such that µ(c i,j(i) ) < ε2 i. It is easy to see that µ( C i,j(i)) < ε and that f n converges uniformly to f on = \ C i,j(i), because for every i d(f n (x), f(x)) < 2 i for all n J(i) and all x. Again if is a metric space and µ is a finite measure of B() we can assume that the set in the above theorem is closed, see Theorem 7. xercise. Show an example that if µ() =, then the claim of gorov s theorem may be false. 2.5 Convergence in measure Definition. We say that a sequence of measurable functions f n : IR converges in measure to a measurable function f : IR if for every ε > 0 lim µ({x : f n(x) f(x) ε}) = 0. n We denote convergence in measure by f n f n and f are defined a.e. µ f. As usual we assume that the functions Theorem 50 (Lebesgue) If µ() < and a sequence of measurable functions f n : IR converges to f : IR a.e., then it converges in measure, f n µ f. Proof. Given ε > 0, let i = {x : f i (x) f(x) ε}. The sequence of sets i=n i is a decreasing sequence of sets of finite measure and hence ( ) ( ) µ i µ i = 0. i=n n=1 i=n The last set has measure zero, because if x i=n i for every n, then x belongs to infinitely many i s so there is a subsequence f ni such that f ni (x) f(x) ε which is true only on a set of measure zero. Hence µ({x : f n (x) f(x) ε}) = µ( n ) µ( i ) 0, i=n which proves convergence in measure. 39

Theorem 51 (Riesz) If f n µ f, then there is a subsequence fni such that f ni f a.e. Remark. In this theorem measure of the space can be infinite, µ() =. Proof. For every i there is n i such that µ({x : f ni (x) f(x) 1 i }) 1 2 i. We can assume that n 1 < n 2 < n 3... Let F k = \ {x : f ni (x) f(x) 1 i }. i=k Clearly µ( \ F k ) i=k 2 i = 2 k+1 and hence µ( \ k=1 F k) = 0. We will prove that f ni f pointwise on k=1 F k. To this end it suffices to show that f ni converges to f pointwise on every set F k, but we will actually show a stronger fact that f ni f uniformly on F k. Indeed, for all x F k, x does not belong to the set i=k {x : f n i (x) f(x) 1/i}, so for every j k and all x F k f nj (x) f(x) 1 j which proves uniform convergence of f nj to f on F k Remark. We actually proved not only convergence a.e. but also uniform convergence on subsets of whose complement has arbitrary small measure. Note that Lebesgue s theorem combined with this stronger conclusion of Riesz theorem implies gorov s theorem for sequences of real valued functions. xercise. Generalize the Lebesgue and the Riesz theorems to the case of sequences of measurable mappings with values into separable metric spaces. 2.6 Absolute continuity of the integral The following result is known as the absolute continuity of the integral. Theorem 52 Let f L 1 (µ). Then for every ε > 0 there is δ > 0 such that f dµ < ε whenever µ() < δ. Proof. For if not, there would exist ε 0 > 0 and sets n M such that µ( n ) 2 n and n f dµ ε 0. The sequence of sets A k = n=k n is decreasing and ( ) µ A k = 0, (30) k=1 40

because µ( k=1 A k) µ(a n ) i=n µ( i) 2 n+1 for every n. Since ϕ() = f dµ is a measure on M (Theorem 39) and ϕ(a 1) = A 1 f dµ f dµ <, Theorem 3(g) implies that ( ) ϕ(a k ) ϕ A k = f dµ = 0 k=1 k=1 A k by (30). This is, however, a clear contradiction, because ϕ(a k ) ϕ( k ) = k f dµ ε 0. 2.7 Riesz representation theorem Definition. If f : IR is a continuous function on a metric space, then supp f = {x : f(x) 0} is called the support of f and C c () = C c (, IR) denotes the linear space of continuous functions on with compact support. We say that a linear functional I : C c () IR is positive if I(f) 0 whenever f 0. Let be a locally compact metric space and µ a measure on B() such that µ() < for every compact set K, i.e. µ is a Radon measure, see Corollary 9. Then I(f) = f dµ is a positive linear functional on C c (). Surprisingly, every positive linear functional on C c () can be represented as an integral with respect to a Radon measure. Theorem 53 (Riesz representation theorem) Let be a locally compact metric space 9 and let I : C c () IR be a positive linear functional. Then there is a unique Radon measure µ such that I(f) = f dµ. (31) Proof. For an open set G denote by Γ(G) the class of continuous functions 0 f 1 with compact support in G. Lemma 54 Let G 1, G 2,..., G k be open sets. If f Γ(G 1... G k ), then there exist functions f i Γ(G i ), i = 1, 2,... such that f = f 1 +... + f k. 9 This theorem is true for a more general class of locally compact Hausdorff topological spaces. 41

Proof. Since is locally compact, there is a compact set such that supp f int, G 1... G k. The sets H i = G i form an open covering of the compact space. If we can find continuous nonnegative functions ϕ 1,..., ϕ k in such that then the functions ϕ = ϕ 1 +... + ϕ k > 0 in, supp ϕ i H i, f i (x) = { f(x)ϕi (x)/ϕ(x) if x, 0 if x will have desired properties. To construct functions ϕ i observe that there is an open covering of by sets D i, i = 1, 2,..., k such that D i H i and then we define ϕ i (x) = dist (, \ D i ). Let now I : C c () IR be a positive linear functional. Obviously (because I(g f) 0). I(0) = 0 and I(f) I(g) for f g xercise. Prove that if ψ : [0, ) [0, ) is an increasing function such that ψ(s + t) = ψ(s) + ψ(t), then ψ(t) = tψ(1). It follows from the exercise that Now for B() we define where I(tf) = ti(f) for all t 0 and f C c (). (32) µ() = λ(g) = inf λ(g), (33) G open sup I(f). f Γ(G) In order to prove that µ is a measure on B() it suffices to prove that for any open sets G 1, G 2,... λ( G n ) λ(g n ) (34) and n=1 n=1 λ(g 1 G 2 ) = λ(g 1 ) + λ(g 2 ) provided G 1 G 2 =. (35) Indeed, the two properties easily imply that the function µ defined on the class of all subsets of by (33) is a metric outer measure. 42

Let {G n } n=1 be a sequence of open sets and L < λ( n=1 G n). Then there is a function f Γ( n=1 G n) such that I(f) > L. Since supp f is compact, f Γ( N n=1 G n) for some finite N. According to the lemma f = f 1 +... + f N, f i Γ(G i ), so and (34) easily follows. L < I(f) = N I(f n ) n=1 λ(g n ) If f Γ(G 1 ), g Γ(G 2 ), G 1 G 2 =, then f + g Γ(G 1 G 2 ). Indeed, the condition G 1 G 2 = implies that f + g 1. Hence n=1 I(f) + I(g) = I(f + g) λ(g 1 G 2 ) and (35) follows upon taking the supremum over f and g. We proved that µ is a measure in B(). The measure µ is finite on compact sets. Indeed, if K G G and G is compact, then there is a function h C c () such that h = 1 on G and therefore Clearly and µ(g) = µ() = µ(k) sup I(f) I(h) <. f Γ(G) sup I(f) f Γ(G) inf µ(g) G open for every open G for every B(). Let g C c (), g 0. Since I g (f) = I(fg) is a positive functional it follows from what we have already proved that µ g (G) = sup I(fg) f Γ(G) for every open G and µ g () = inf G open µ g(g) is a measure in B() that is finite on compact sets. Lemma 55 I(g) = µ g (). for every B() Proof. For f Γ(), fg g and hence I(fg) I(g). On the other hand if f Γ(), f = 1 on supp g, then fg = g and hence I(fg) = I(g). Therefore µ g () = sup I(fg) = I(g). f Γ() 43

Lemma 56 (inf g)µ() µ g () (sup g)µ() for every B(). Proof. If supp f G, then it follows from (32) that (inf G Taking the supremum over f Γ(G) yields g)i(f) I(fg) (sup g)i(f). G (inf g)µ(g) µ g(g) (sup g)µ(g) G and then the lemma follows after taking the infimum over all open sets G such that G. This is because continuity of g implies G inf (inf g) = inf g and inf G open G (sup G open G g) = sup g. Lemma 57 µ g () = g dµ for all B(). Proof. The measure µ is finite on compact sets and hence every Borel set is a union of sets of finite measure (Corollary 9). Therefore it suffices to prove the equality for sets of finite measure µ() <. Given ε > 0 let s = n k=1 a kχ k be a simple function such that s g s + ε (Theorem 34). Here the sets k are pairwise disjoint and 10 n k=1 k =. and Then s dµ = (s + ε) dµ = We proved that (g ε) dµ n a k µ( k ) k=1 n k=1 n µ g ( k ) = µ g () k=1 n (a k + ε)µ( k ) k=1 n µ g ( k ) = µ g (). k=1 s dµ µ g () ( inf k g)µ( k ) n k=1 ( sup k (s + ε) dµ g)µ( k ) (g + ε) dµ. 10 i.e. we include the set where a k = 0. For example if s = χ, then we write s = 1 χ + 0 χ \. 44

Since ε > 0 was arbitrary and µ() <, the lemma easily follows. Comparing Lemma 55 and Lemma 57 we obtain that I(g) = g dµ which proves (31) for g 0. The general case follows from the fact that every function in C c () is a difference of two nonnegative functions. We are left with the proof of uniqueness of the measure. Suppose that I(f) = f dµ = f dν for all f C c (), where µ and ν are measures on that are finite on compact sets. According to Corollary 9 and Corollary 8 it suffices to prove that the measures µ and ν agree on the class of open sets. Let G be an open set and K G a compact set. Let f Γ(G) be such that f = 1 on K. Then µ(g) f dµ = I(f) = f dν ν(k). G Taking supremum over compact sets K G and applying Corollary 9 we obtain that µ(g) ν(g). Similarly we prove the opposite inequality. The proof of the theorem is complete. G 3 Integration on product spaces This should be a word for word copy of Chapter 8 from Rudin s book. There is no need to rewrite the whole chapter, so I will not include anything here. 4 Vitali covering lemma, doubling and Hausdorff measures 4.1 Doubling measures If σ > 0 and B is a ball in a metric space, then by σb we will denote a ball concentric with B and with the radius σ times that of B. Definition. Let be a metric space and µ a measure on B(). We say that µ is a doubling measure if there is a constant C d > 0 such that for every ball B, 0 < µ(b) < and µ(2b) C d µ(b). The Lebesgue measure is an example of a doubling measure with C d = 2 n. It turns out that general doubling measures inherit many properties of the Lebesgue measure. 45

Definition. We say that a metric space is a doubling-metric space if there is a constant M > 0 such that every ball B can be covered by no more than M balls of half the radius. xamples. IR n is metric-doubling. If is a metric-doubling space, then every subset of is metric-doubling. In particular every subset of IR n is metric-doubling. If is an infinite discrete space, then is not metric-doubling. Proposition 58 If a metric space is complete and metric-doubling, then it is locally compact. Proof. It suffices to prove that any closed ball is compact. It easily follows from the metric-doubling condition that each closed ball is totally bounded. Since it is complete (as a closed subset of a complete metric space) it is compact. One can also prove this theorem more directly by mimicking the proof that the interval [0, 1] is compact. Proposition 59 If µ is a doubling measure on a metric space, then is metricdoubling. Proof. We will prove that satisfies the metric-doubling condition with M = C 4 d. Let {x i } i I B(x, r) be a maximal r/2-separated set, i.e. i,j I d(x i, x j ) r 2 y B(x,r) i I d(x i, y) < r 2. From the second condition we have B(x, r) i I B(x i, r/2) and it suffices to show that the number of elements in I is bounded by Cd 4, i.e. #I C4 d. The balls B(x i, r/4) are pairwise disjoint and B(x i, r/4) B(x, 2r) B(x i, 4r). We have µ(b(x, 2r)) i I µ(b(x i, r/4)) C 4 d µ(b(x i, 4r)) C 4 µ(b(x, 2r))(#I) i I d and hence #I C 4 d. Theorem 60 (Volberg-Konyagin) If is a complete metric-doubling metric space, then there is doubling measure on. This theorem is very difficult and we will not prove it. Note that according to the previous proposition metric-doubling condition is necessary for the existence of a doubling measure, so Volberg-Konyagin s theorem is very sharp. The uclidean space is metric doubling and hence any subset of the uclidean space is metric-doubling. Therefore we have 46

Corollary 61 If IR n is closed, then there is a doubling measure on. Proposition 62 Let µ be a doubling measure on a metric space. Then there is a constant C > 0 such that ( ) s µ(b(x, r)) r µ(b(x 0, r 0 )) C whenever x B(x 0, r 0 ), r r 0 and s = log C d / log 2. r 0 Remark. For the Lebesgue measure in IR n, C d = 2 n, log C d / log 2 = n and hence L n (B(x, r))/l(b(0, 1)) Cr n which is the sharp lower bound estimate for the Lebesgue measure of the ball. That shows that the exponent s in the above lemma cannot be any smaller. Proof. Let x B(x 0, r 0 ), r r 0. Let k be the largest integer such that 2 k 1 r 2r 0. Hence 2 k r > 2r 0 and thus B(x 0, r 0 ) B(x, 2 k r). We have µ(b(x 0, r 0 )) µ(b(x, 2 k r)) C k d µ(b(x, r)), Since 2 k 1 r 2r 0, r/r 0 4 1 2 k and hence ( ) s ( r r = r 0 r 0 ) log Cd / log 2 µ(b(x, r)) µ(b(x 0, r 0 ) C k d. (36) (4 1 ) log C d/ log 2 (2 k ) log C d/ log 2 = AC k }{{} A d. This estimate together with (36) yields the claim. 4.2 Covering lemmas Theorem 63 (5r-covering lemma) Let B be a family of balls in a metric space such that sup{diam B : B B} <. Then there is a subfamily of pairwise disjoint balls B B such that B 5B. B B B B If the metric space is separable, then the family B is countable and we can arrange it as a sequence B = {B i }, so B 5B i. B B 47

Remark. Here B can be either a family of open balls or closed balls. In both cases proof is the same. Proof. Let sup{diam B : diameter of the balls B B} = R <. Divide the family B according to the F j = {B B : R 2 j < diam B R 2 j 1 }. Clearly B = j=1 F j. Define B 1 F 1 to be the maximal family of pairwise disjoint balls. Suppose the families B 1,..., B j 1 are already defined. Then we define B j to be the maximal family of pairwise disjoint balls in F j {B : B B = for all B j 1 B i }. Next we define B = j=1 B j. Observe that every ball B F j intersects with a ball in j B j. Suppose that B B 1, B 1 j B i. Then diam B and hence B 5B 1. The proof is complete. R 2 = 2 R j 1 2 2 diam B j 1 Remark. We have actually proved more: every ball B B is contained in 5B 1 for some ball B 1 B such that B 1 B. This fact will be used in the proof of the next result. Definition. We say that a family F of closed balls covers a set A in the Vitali sense if for every a A there is a ball B F of arbitrarily small radius such that a B. In other words a A inf{diam B : a B F} = 0. Theorem 64 (Vitali covering theorem) Let µ be a doubling measure on and A a measurable set. Let F be a family of closed balls that covers A in the Vitali sense. Then there is a subfamily G F of pairwise disjoint balls such that µ ( A \ B G B ) = 0. (37) Proof. Let F F be a subfamily of all balls with diameters less than 1. Observe that the family F also covers A in the Vitali sense. According to the 5r-covering lemma there is a countable subfamily G F of pairwise disjoint balls such that A B 5B. B F B G 48

We will prove that the family G satisfies (37). To this end it suffices to prove that for any ball Q of radius bigger than 1 µ ( Q (A \ B G B) ) = 0. Observe that Q ( A \ B G B ) = Q ( A \ B G B 10Q Indeed, if B is not contained in 10Q, then 11 B (Q A) =. The family of balls B such that B G and B 10Q is countable and hence we can name the balls by B 1, B 2, B 3,... We have µ(5b i ) C µ(b i ) Cµ(10Q) <. The first inequality follows from the doubling condition and the second inequality from the fact that the balls B i are pairwise disjoint and contained in 10Q. The above inequality implies that ( µ i=n and it remains to prove that Q ( A \ 5B i ) B G B 10Q µ(5b i ) 0 i=n ) B Q ( A \ B ). as N N ) B i 5B i. (38) The first inclusion is obvious. To prove the second one let a Q (A \ N B i). The set N B i is closed and a N B i, so there is a ball B a of sufficiently small radius r < 1 such that a B a F and B a i=n N B i =. (39) On the other hand it follows from the remark stated after the proof of the 5r-covering lemma that there is B G such that B a B, B a 5B. Since diam B < 1 and a Q, it follows that B 10Q and hence B = B j for some j. Now (39) implies that j > N and hence a 5B j 5B i which proves (38). i=n 11 Because diam B < 1 and the radius of Q is bigger than 1. 49

Corollary 65 For an open set U IR n and every ε > 0 there is a sequence of pairwise disjoint closed balls B 1, B 2, B 3,... contained in U and of radii less than ε such that ( ) L n U \ B i = 0. xercise. Show a direct proof of the corollary without using the Vitali covering lemma. 4.3 Doubling measures and the Hausdorff measure We will snow now how to use the covering lemmas to prove important results about Hausdorff measures. Theorem 66 For any Borel set A IR n and any 0 < δ L n (A) = H n δ (A) = H n (A). Proof. Step 1. If L n (Z) = 0, then Hδ n (Z) = 0. Indeed, for every ε > 0, Z can be covered by cubes Q i, each of diameter less than δ such that i Q i < ε. Since 12 (diam Q i ) n = C(n) Q i we have that and hence Hδ n (Z) = 0. Step 2. H n δ (A) L n(a). Hδ n (Z) ω n (diam Q 2 n i ) n = ω n 2 C(n) n i i Q i < ε ω n 2 n C(n) Let A Q i, where the cubes Q i have pairwise disjoint interiors and Q i L n (A) + ε. Fix i. It follows from Corollary 65 that there are pairwise disjoint balls B j Q i, j = 1, 2, 3,... such that L n (Q i \ B j ) = 0. j=1 12 By C(n) we denote a constant which depends on n only. 50

Now applying Step 1 w have Hδ n (Q i ) = Hδ n ( B i ) Therefore = Hδ n (B i ) ω n ri n L n (B i ) = L n ( B i ) = L n (Q i ). Hδ n (A) Hδ n ( Q i ) Hδ n (Q i ) Since ε > 0 was arbitrary, the claim follows. Step 3. L n (A) H n δ (A). We will need the following fact. L n (Q i ) L n (A) + ε. Lemma 67 (Isodiametric inequality) For any Lebesgue measurable set A IR n L n (A) ω n ( diam A 2 We will prove this lemma in Section 6. Observe that if A is a ball, then we have the equality. Therefore the lemma says that among all sets of fixed diameter, ball has the largest volume. Now we can complete the proof of Step 3. Let A C i, diam C i < δ. Then ( ) n diam Ci L n (A) L n (C i ) ω n = ω n (diam C 2 2 n i ) n. ) n Taking the infimum over all such coverings we obtain L n (A) H n δ (A). Step 4. General case. We proved in steps 2 and 3 that L n (A) = Hδ n (A) for all δ. Hence L n (A) = lim Hδ n (A) = H n (A). δ 0 The proof is complete. Now we will compare doubling measures with Hausdorff measures in metric spaces. Recall that in a given metric space there is 0 s such that H t () = for t < s and H t () = 0 for t > s. This unique number s is called Hausdorff dimension of and is denoted by s = dim H. 51

Theorem 68 Let µ be a doubling measure on a metric space, µ(2b) C d µ(b) for every ball B. Let s = log C d / log 2. Then dim H s. Proof. It suffices to prove that H s (B) < for every ball B. Indeed, then H t (B) = 0 for t > s and hence H t () = 0 for t > s. This, however, implies that dim H s. Let B = B(x 0, R). According to Proposition 62 there is a constant C > 0 such that for every x B and r < R, µ(b(x, r)) Cr s. Fix 0 < δ < R. Let {x i } B be a maximal δ/2-separated set, i.e. d(x i, x j ) δ/2 for i j and for every x B there is i I such that d(x, x i ) < δ/2. Therefore the balls B(x i, δ/4) are disjoint, contained in B(x, 2R) and B i I B(x i, r/2). We estimate the number of elements in I. µ(b(x, 2R)) i I µ(b(x i, δ/4)) i I C(δ/4) s = (#I)C(δ/4) s. Hence #I µ(b(x, 2R))C 1 4 s δ s = C δ s. }{{} does not depend on δ Now B i I B(x i, δ/2) is a covering of the ball by sets of diameters less than or equal to δ, and hence Hδ(B) s ω s (diam B(x 2 s i, δ/2)) s ω s 2 s δs (#I) ω s 2 s δs C δ s = ω s 2 s C i I Hence H s (B) = lim δ 0 H s δ(b) ω s 2 s C < Definition. We say that a Borel measure µ on a metric space is s-regular, if there is a constant C 1 such that for all x and all 0 < r < diam. C 1 r s µ(b(x, r)) Cr s Clearly, every s-regular measure is doubling. Theorem 69 Let µ be an s-regular measure on. Then there are constants C 1, C 2 > 0 such that C 1 µ() H s () C 2 µ() for every Borel set. In particular H s is s-regular. 52

Proof. First we will prove that there is a constant C 1 such that C 1 µ() H s () for every Borel set. Obviously we can assume that H s () <. Since every bounded set is contained in a ball of radius diam A, then µ(a) C(diam A) s by the s-regularity of the measure. For every ε > 0 there is a covering i such that ω s 2 s (diam i ) s H s () + ε. We have ( ) µ() µ i = C 2s ω s ω s 2 s µ( i ) C (diam i ) s (diam i ) s C 2s ω s (H s () + ε). Since ε > 0 was arbitrary we conclude that C 1 µ() H s (), where C 1 = C 1 ω s /2 s. The proof of the opposite inequality is more difficult. Lemma 70 If µ is an s-regular measure on a metric space, then there is a constant M > 0 such that for every ball B of radius R and any r < R, the ball B can be covered by no more than M(R/r) s balls of radius r. Proof. Let {x i } i I B be a maximal r-separated set, i.e. d(x i, x j ) r for i j and for every x B there is i I such that x B(x i, r). Hence B i I B(x i, r) and B(x i, r/2) B(x j, r/2) = for i j. Since B(x i, r/2) 2B and µ(b(x i, r/2)) C 1 (r/2) s we have C(2R) s µ(2b) (#I)C 1 (r/2) s. Hence #I C 2 4 s (R/r) s and hence the lemma follows with M = C 2 4 s. The lemma implies that Hδ s(b) C µ(b) where B is a ball of radius R. Indeed, we can cover B by M(R/r) s balls of radius r and we can take r so small that 2r < δ. Hence Hδ(B) s ω ( ) s s R 2 M (2r) s = ω s s MR s C µ(b). r Since the constant C is independent of δ we conclude that H s (B) C µ(b). We will now prove a stronger inequality which says that for an arbitrary open set U H s (U) C µ(u). This will imply that H s () C µ() for any Borel set. Indeed, both measures µ and H s have the property that is a union of a countably family of open sets of finite 53

measure. Therefore both measures have the property that the measure of any Borel set equals infimum of measures of open sets that contain given Borel set (Theorem 7). It follows from the proof of Lemma 70 that is separable, so according to the 5r-covering lemma we can find disjoint balls B i U such that U 5B i. Hence H s (U) H s (5B i ) C µ(5b i ) C µ(b i ) C µ(u). The proof is complete. 5 Differentiation 5.1 Lebesgue differentiation theorem Definition. We say that f L 1 loc (µ), if for every x there is r > 0 such that f dµ <. B(x,r) by The integral average of f over a measurable set of finite measure will be denote f = µ() 1 f dµ = f dµ. Theorem 71 (Lebesgue differentiation theorem) If µ is a doubling measure on a metric space and f L 1 loc (µ), then f dµ = f(x) for µ-a.e. x. (40) lim r 0 B(x,r) Proof. We can assume that f 0. Since is separable, it is a countable union of balls such that f is integrable on each such ball. Therefore, it suffices to prove that if f dµ <, then (40) is satisfied for µ-a.e. x B. B then If for a measurable set A B we have lim inf f dµ t for all x A, r 0 B(x,r) A f dµ tµ(a). Indeed, given ε > 0, let U be an open set such that A U, µ(u) µ(a) + ε. For every x A there is arbitrarily small r > 0 such that B(x, r) U and f dµ t + ε. B(x,r) 54

Such closed balls form a covering of U in the Vitali sense and hence we can select pairwise disjoint balls B 1, B 2, B 3,... from the covering such that µ(u \ B i ) = 0, B i f dµ t + ε. We have A f dµ f dµ = U f dµ (t + ε) µ(b i ) B i = (t + ε)µ(u) (t + ε)(µ(a) + ε). Since ε > 0 was arbitrary, we conclude f dµ tµ(a) Similarly, if A B is a measurable set such that lim sup r 0 f dµ t for all x A, then Let Then B(x,r) A s,t = {x B : lim inf r 0 B(x,r) A A f dµ tµ(a). (41) f dµ s < t lim sup f dµ}. r 0 B(x,r) tµ(a s,t ) f dµ sµ(a s,t ) A s,t and hence µ(a s,t ) = 0. This easily implies that lim inf f dµ = lim sup f dµ for µ-a.e. x B, r 0 r 0 B(x,r) B(x,r) i.e. the limit lim r 0 f dµ exists for µ-a.e. x B. Denote this limit by B(x,r) g(x) = lim f dµ r 0 B(x,r) whenever the limit exists. We have to prove that f(x) = g(x) for µ-a.e. x B. Fix a measurable set F B and ε > 0. Let A n = {x F : (1 + ε) n g(x) < (1 + ε) n+1 }, n Z. 55

Observe that lim sup f dµ = g(x) (1 + ε) n for all x A n, r 0 B(x,r) and hence A n f dµ (1 + ε) n µ(a n ) by (41). We have Similarly we prove that F g dµ = g dµ (1 + ε) n+1 µ(a n ) n= A n n= (1 + ε) f dµ = (1 + ε) f dµ. A n F F n= g dµ (1 + ε) 1 F f dµ. Since ε > 0 was arbitrary we conclude that for every measurable set F B f dµ = g dµ and hence f = g µ-a.e. F Among all representatives of f in the class of functions that are equal f a.e. we select the following one f(x) := lim sup f dµ. (42) r 0 B(x,r) It follows from the Lebesgue differentiation theorem that the right hand side of (42) which is defined everywhere equals f a.e. Definition. Let f L 1 loc (µ). We say that x is a Lebesgue point of f if f(y) f(x) dµ(y) = 0, lim r 0 where f(x) is defined by (42). B(x,r) F Theorem 72 Let µ be a doubling measure and f L 1 loc (µ). Then µ-a.e. point x is a Lebesgue point of f. Proof. For every c IR lim r 0 B(x,r) f(y) c dµ(y) = f(x) c µ-a.e. (43) 56

Let = {x : (43) holds for all c Q}. Since Q is countable, µ( \ ) = 0. Let x, be such that f(x) <, and let Q c n f(x). Then it follows from (43) that f(y) f(x) dµ = f(x) f(x) = 0. lim r 0 B(x,r) Definition. We say that a family F of measurable sets in is regular at x if there is a constant C > 0 such that for every S F there is a ball B(x, r S ) such that S B(x, r S ), µ(b(x, r S )) Cµ(S), and for every ε > 0 there is a set S F with µ(s) < ε. Theorem 73 Let µ be a doubling measure and f L 1 loc (µ). If x is a Lebesgue point of f and F is a regular family at x, then lim S F µ(s) 0 S f dµ = f(x). Proof. For S F let r s be defined as above. Observe that if µ(s) 0, then r s 0 by Proposition 62. We have f dµ f(x) f(y) f(x) dµ(y) µ(s) 1 f(y) f(x) dµ(y) S S B(x,r S ) = µ(b(x, r S)) f(y) f(x) dµ(y) C f(y) f(x) dµ(y) 0 µ(s) B(x,r S ) B(x,r S ) as µ(s) 0. Corollary 74 If f L 1 loc (IRn ) and x IR n is a Lebesgue point of f, then for any sequence of cubes Q i such that x Q i, diam Q i 0 we have lim f dµ = f(x). i Q i Corollary 75 Let F (x) = x a f(t) dt, where f L1 (a, b). Then F (x) = f(x) for a.e. x (a, b). Proof. We have F (x + h) F (x) h whenever x is a Lebesgue point of f. = x+h x f(t) dt f(x) as h 0, 57

Theorem 76 If A IR n is a measurable set, then for almost all x A and for almost all x IR n \ A B(x, r) A lim r 0 B(x, r) B(x, r) A lim r 0 B(x, r) = 1 = 0. Proof. Applying the Lebesgue differentiation theorem to f = χ A L 1 loc (IRn ) we have that for a.e. x A B(x, r) A = χ A dl n χ A (x) = 1 B(x, r) B(x,r) and for a.e. x IR n \ A B(x, r) A B(x, r) χ A (x) = 0. Definition. Let A IR n be a measurable set. We say that x IR n is a density point of A if A B(x, r) lim = 1. r 0 B(x, r) Thus the above theorem says that almost every point of a measurable set A IR n is a density point. 6 Brunn-Minkowski inequality In this section we will prove the Brunn-Minkowski inequality, isodiametric inequality and the isoperimetric inequality. Theorem 77 (Brunn-Minkowski inequality) If A, B IR n are compact sets, then where A + B = {a + b : a A, b B}. L n (A + B) 1/n L n (A) 1/n + L n (B) 1/n. Proof. Observe that if we translate the sets, then the set A + B will also translate, so the inequality will remain the same. In particular, we can locate the origin of the coordinate system at any point. We divide the proof into several steps. 58

Step 1. A and B are rectangular boxes with sides parallel to the coordinate axes. Denote the edges of A and B by a 1,...,a n and b 1,...,b n respectively. Then A + B is also a rectangular box with edges a 1 + b 1,...,a n + b n. Hence the Brunn-Minkowski inequality takes the form which is equivalent to n (a i + b i ) 1/n n a 1/n i + n b 1/n i 7 L p spaces Definition. Let (, µ) be a measure space. For 0 < p <, L p (µ) denotes the class of all complex valued measurable functions such that ( 1/p f p = f dµ) p < and we define L p (µ) = L p (µ)/, where f g if f = g a.e. For p = we define L to be the class of essentially bounded measurable functions, i.e. such that there is M > 0 with f(x) M µ-a.e. (44) It is easy to see that there is a smallest number M satisfying (44). We denote this number by f. Therefore f(x) f µ-a.e. and for any ε > 0, µ({x : f(x) > f ε}) > 0. Finally we set L (µ) = L(µ)/. xample. Let µ be a counting measure on IN = {1, 2, 3,...}, i.e. for A IN, µ(a) = #A, the number of elements in the set A if A is finite and µ(a) = is A is infinite. Clearly if f : IN C, then f dµ = f(n). IN The functions f : IN C can be identified with complex sequences and it is easy to see that L p (µ) = {x = (x n ) n=1 : x p = ( x n p) 1/p < }, and L (µ) = {x = (x n ) n=1 : x = sup x n < }. n IN In this particular situation we denote L p (µ) by l p for all 0 < p. 59 n=1 n=1

Theorem 78 (Jensen s inequality) Let (, µ) be a measure space such that µ() = 1. If f L 1 (µ) satisfies a < f(x) < b for a.e. x and ϕ is a convex function on (a, b), then ( ) ϕ f dµ (ϕ f) dµ. Remark. We do not exclude the cases a = and b =. Recall that ϕ is convex on (a, b) if ϕ((1 λ)x + λy) (1 λ)ϕ(x) + λϕ(y) for all x, y (a, b) and 0 λ 1. It is easy to see that this condition is equivalent to ϕ(t) ϕ(s) t s ϕ(u) ϕ(t) u t for all s, t, u such that a < s < t < u < b (45) Lemma 79 very convex function on (a, b) is continuous. 60

Proof. Fix a < s < t < b and take x, y such that a < s < x < y < t < b. Denote by S,, Y, T the corresponding points on the graph of ϕ. The point Y belongs to the angle with the vertex formed by the half-lines S and T. This easily implies that if y is decreasing to x, then Y, which proves the right hand side continuity of ϕ. The proof of the left hand side continuity is similar. Proof of Jensen s inequality. For t = f dµ we have a < t < b. Fix this value of t and define ϕ(t) ϕ(s) β = sup. a<s<t t s It follow from (45) that and hence On the other hand and hence β ϕ(u) ϕ(t) u t for all t < u < b ϕ(u) ϕ(t) + β(u t) for all t < u < b. (46) ϕ(t) ϕ(s) t s β The inequalities (46) and (47) imply that for all a < s < t ϕ(s) ϕ(t) + β(s t) for all a < s < t (47) ϕ(s) ϕ(t) + β(s t) for all a < s < b. In particular ϕ(f(x)) ϕ(t) β(f(x) t) 0 a.e. 61

Integrating with respect to µ yields ( ) ( ϕ(f(x)) ϕ f dµ β f(x) }{{} constant f dµ }{{} constant ) dµ 0. Since µ() = 1 we conclude ( ) ( ) (ϕ f) dµ ϕ f dµ β f dµ f dµ 0 } {{ } and hence ( ) (ϕ f) dµ ϕ f dµ. The numbers 1 < p, q < such that 1/p + 1/q = 1 or p = 1 and q = are called Hölder conjugate or just conjugate exponents. Observe that if 1 < p <, then q = p/(p 1). 0 Theorem 80 (Hölder s inequality) Let (, µ) be a measure space and 1 < p, q < be conjugate exponents. Then for any two complex valued measurable functions f and g we have fg dµ f p g q. Proof. We will need the following lemma. Lemma 81 (Young s inequality) If a, b > 0, then ab ap p + bq q. Proof. We can assume that a, b > 0, otherwise the inequality is obvious. Then a = e s/p, b = e t/q for some s, t IR and the inequality is equivalent to e s/p+t/q e s /p + e t /q. Now the lemma follows from the convexity of e x. We can assume that 0 < f p < and 0 < g q <, otherwise the Hölder inequality is obvious. The lemma yields f(x)g(x) f p g q 1 p ( f(x) f p ) p + 1 q ( ) q g(x). g q 62

Integrating this inequality yields ( 1 fg dµ f p g q p f p f p dµ + 1 ) p q g q g q dµ q }{{} =1/p+1/q=1 and the theorem follows. Theorem 82 (Minkowski s inequality) Let (, µ) be a measure space and 1 p. Then for any two complex valued measurable functions f and g we have f + g p f p + g p. Proof. If p = 1, the inequality is obvious. If p =, then f(x) + g(x) f(x) + g(x) f + g a.e. Hence f + g f + g. If 1 < p <, then we need to use the Hölder inequality. f + g p dµ f f + g p 1 dµ + g f + g p 1 dµ ( ) 1/p ( f p dµ ( ( f p + g p ) and hence ( 1/q ( f + g dµ) (p 1)q + f + g p dµ ) 1/q Since 1 1/q = 1/p the theorem follows. f + g p dµ) 1 1/q f p + g p. ) 1/p ( ) 1/q g p dµ f + g (p 1)q dµ xample. In the case of space l p, the Hölder and the Minkowski inequality read as follows ( ) 1/p ( ) 1/q x n y n x n p y n q, n=1 n=1 n=1 ( ) 1/p ( ) 1/p ( ) 1/p x n + y n p x n p + y n p. n=1 n=1 Definition. Let K denote IR or C. The normed space is a pair (, ), where is a linear space over K and : [0, ) is a function such that: 63 n=1

(a) x + y x + y for all x, y ; (b) αx + α x for all x and α K; (c) x = 0 iff x = 0. The function is called a norm. Since x y = (x z) + (z y) x z + z y we see that d(x, y) = x y defines a metric in. Normed spaces are always regarded as metric spaces with respect to this metric. Definition. We say that a normed space (, ) is a Banach space if it is complete with respect to the metric d(x, y) = x y. It follows from the Minkowski inequality that (L p (µ), p ) is a normed space for all 1 p. Theorem 83 L p (µ) is a Banach space for all 1 p. Proof. We will prove this theorem for 1 p < and leave the case p = as an exercise. Let {f n } be a Cauchy sequence in L p (µ). Let {f ni } be a subsequence such that f ni+1 f ni p < 2 i, i = 1, 2, 3,... Let g k = k f ni+1 f ni, g = f ni+1 f ni. It follows from the Minkowski inequality that g k p < 1, and then Fatou s lemma applied to the sequence g p k yields g p dµ = (lim inf k gp k ) dµ lim inf g p k dµ 1, k so g p 1. In particular g(x) < a.e. Thus the series f n1 (x) + converges absolutely for a.e. x. Since (f ni+1 (x) f ni (x)) f n1 (x) + k (f ni+1 (x) f ni (x)) = f nk+1 (x) 64

we conclude that the sequence f nk (x) converges for a.e. x. Denote its limit by f(x), i.e. lim f n k (x) = f(x) a.e. k It suffices to prove that f L p and f f n p 0 as n. For every ε > 0 there is N such that f n f m p < ε for all n, m N. Applying Fatou s lemma to f ni f m p with m N fixed and i = 1, 2, 3,... we have f f m p dµ lim inf f ni f m p dµ ε p. i Hence f f m L p and thus f L p. Moreover f f m p 0 as m. xercise. Prove the above theorem for p =. The following fact follows from the proof of the above theorem. Corollary 84 If f n f in L p (µ), then there is a subsequence f ni f a.e. which converges to Lemma 85 Let S be the class of complex, measurable, simple functions s on such that µ({x : s(x) 0}) <. If 1 p <, then S is dense in L p (µ). Proof. Clearly S L p (µ). If f L p (µ) and f 0, let s n be a sequence of simple functions such that 0 s n f, s n f pointwise. Since s n L p it easily follows that s n S. Now inequality 0 f s n p f p and the dominated convergence theorem implies that s n f in L p. In the general case we write f = (u + u ) + i(v + v ) and apply the above argument to each of the functions u +, u, v +, v separately. Theorem 86 Let be a locally compact metric space and µ a Radon measure on. Then the class of compactly supported continuous functions C c () is dense in L p (µ) for all 1 p <. Proof. According to the lemma it suffices to prove that the characteristic function of a set of finite measure can be approximated in L p by compactly supported continuous functions. Let be a measurable set of finite measure. Given ε > 0 let K be a compact set such that µ( \ K) < (ε/2) p. Let U be an open set such that K U, U is compact and µ(u \ K) < (ε/2) p. Finally let ϕ C c () be such that supp ϕ U, 0 ϕ 1, ϕ(x) = 1 for x K. We have ( 1/p ( 1/p ( ) 1/p χ ϕ p = χ ϕ dµ) p χ dµ) p + ϕ p dµ \K µ( \ K) 1/p + µ(u \ K) 1/p < ε. 65 \K \K

7.1 Convolution Let us start with an observation that since the Lebesgue measure in invariant under translations and under the mapping x x, for any f L 1 (IR n ) and any y IR n we have u(x) dx = IR n u(x + y) dx = IR n u( x) dx. IR n Also since for every measurable set and any t > 0 the set t = {tx : x } has measure L n (t) = t n L n () we easily conclude that the Lebesgue integral has the following scaling property u(x/t) dx = t n IR n u(x) dx. IR n Observe that in the case of the Riemann integral the above equalities are direct consequences of the change of variables formula. We will prove a corresponding change of variables formula for the Lebesgue integral later, but as for the proof of the above equalities we do not have to refer to the general change of variables formula as they follow directly from the properties of the Lebesgue measure mentioned above. Definition. For measurable functions f and g on IR n we define the convolution by (f g)(x) = f(x y)g(y) dy IR n and it is a question under what conditions the convolution is well defined. If f L 1 loc (IRn ) and g is bounded, measurable and vanishes outside a bounded set, then the function y f(x y)g(y) is integrable, so (f g)(x) is well defined and finite for every x IR n. If f, g L 1 (IR n ), then it can happen that for a given x the function y f(x y)g(y) is not integrable and hence (f g)(x) is not defined. However as a powerful application of the Fubini theorem we can prove the following surprising result. Theorem 87 If f, g L 1 (IR n ), then for a.e. x IR n the function y f(x y)g(y) is integrable and hence (f g)(x) exists. Moreover f g L 1 (IR n ) and f g 1 f 1 g 1. Proof. The function f(x y)g(y) as a function of a variable (x, y) IR 2n is measur- 66

able. 13 Hence Fubini s theorem yields ( ) f(x y)g(y) dx dy = f(x y) g(y) dx dy IR n IR n IR n IR ( ) n = f(x y) dx g(y) dy = f 1 g 1. IR n IR n Therefore f 1 g 1 = IR n ( ) f(x y)g(y) dy dx IR n IR n f(x y)g(y) dy dx = f g 1. IR n Theorem 88 If f, g, h L 1 (IR n ) and α, β C, then 1. f g = g f; 2. f (g h) = (f g) h; 3. f (αg + βh) = αf g + βf h. Remark. This theorem says that (L 1, ) is a commutative algebra. Proof. (1) We have (f g)(x) = f(x y)g(y) dy = IR n f( y)g(y+x) dy = IR n f(y)g(x y) dy = (g f)(x). IR n (2) We have ( ) f (g h)(x) = f(x y)(g h)(y) dy = f(x y) g(y z)h(z) dz dy IR n IR n IR ( ) ( n ) = h(z) f(x y)g(y z) dy dz = h(z) f( y)g(y z + x) dy dz IR n IR n IR n IR ( ) n = h(z) IR n f(y)g((x z) y) dy IR n dz = h(z)(f g)(x z) dz = (f g) h(x). IR n (3) It is easy and left to the reader. Theorem 89 If f L 1 loc (IRn ) and g is continuous with compact support, then 1. f g is continuous on IR n. 13 Why? 67

2. If f C 1 (IR n ), then f g C 1 (IR n ) and (f g)(x) = ( f g ) (x). x i x i 3. If g C 1 (IR n ), then f g C 1 (IR n ) and x i (f g)(x) = ( f g x i ) (x). 4. If f C r (IR n ) or g C r (IR n ), then f g C r (IR n ). Proof. (1) Since g is bounded and has compact support, (f g)(x) is well defined and finite for all x IR n. Fix x IR n. The function y f(y)g(x y) vanishes outside a sufficiently large ball B (because g as compact support). Let x n x. Then there is a ball B (perhaps larger than the one above), so that all the functions vanish outside B. Hence y f(y)g(x n y) f(y)g(x n y) g f(y) χ B (y) L 1 (IR n ) and the dominated convergence theorem yields (f g)(x n ) = f(y)g(x n y) dy IR n f(y)g(x y) dy = (f g)(x) IR n which proves continuity of f g. (2) The function f/ x i is bounded on bounded subsets of IR n (because it is continuous). Fix x IR n. For any r > 0 there is a constant M > 0 such that f x i M for all z B(x, r + 1). Let t k 0, t k < 1. We have f(x + t k e i y) f(x y) t k g(y) f x i (x y)g(y). The functions on the left hand side are bounded by M g for all y with y < r (by the mean value theorem). Take r so large that supp g B(0, r). Then (f g)(x + t k e i ) (f g)(x) = B(0,r) t k f(x + t k e i y) f(x y) t k g(y) dy B(0,r) f (x y)g(y) dy = ( f g ) (x). x i x i 68

We could pass to the limit under the sign of the integral, because the functions were uniformly bounded. (3) The proof is similar to that in the case (2); details are left to the reader. (4) This result follows from (2) and (3) by induction. Let ϕ C0 (B(0, 1)) be such a function that ϕ 0 and ϕ(x) dx = 1. A function IR n with the given properties can be constructed explicitly. Indeed, { ( ) 1 exp if x < 1, ϕ(x) = x 2 1 0 if x 1. belongs to C 0 (IR n ) and we define For ε > 0 we set ϕ(x) = ϕ(x) IR n ϕ(y) dy. ϕ ε (x) = ε n ϕ(x/ε). Then supp ϕ ε B(0, ε), ϕ ε 0 and IR n ϕ ε (x) dx = 1. For f L 1 loc (IRn ) we put It follows from Theorem 89 that f ε C. f ε = f ϕ ε. Theorem 90 Let f be a function defined on IR n. 1. If f is continuous, then f ε f uniformly on compact sets as ε 0. 2. If f C r, then D α f ε D α f uniformly on compact sets for all α r. Proof. (1) Uniform continuity of f on compact sets implies that for any compact set K sup sup f(x) f(x y) 0 as ε 0. y <ε x K Since ϕ IR n ε (y) dy = 1, we have f(x)ϕ IR n ε (y) dy = f(x) and hence for any compact set K sup f(x) f ε (x) = sup (f(x) f(x y))ϕ ε (y) dy x K x K IR n f(x) f(x y) ϕ ε (y) dy 0 as ε 0. sup x K B(0,ε) (2) It follows from Theorem 89 and the induction argument that D α (f ϕ ε ) = (D α f) ϕ ε for all α r and hence the theorem follows from the part (1). 69

Lemma 91 If f L p, 1 p <, then f ε L p (IR n ) and f ε p f p. Proof. If p = 1, then f ε 1 = f ϕ ε 1 f 1 ϕ ε 1 = f 1 by Theorem 87. Let then 1 < p <. Hölder s inequality yields p ( f ε (x) p = f(x y)ϕ ε (y) dy f(x y) ϕ ε (y) 1/p ϕ ε (y) 1/q dy IR n IR ( ) ( n ) p/q f(x y) p ϕ ε (y) dy ϕ ε (y) dy = f p ϕ ε (x). IR n IR n Hence f ε p p f p ϕ ε (x) dx = f p ϕ ε 1 f p 1 ϕ ε 1 = f p p. IR n The following result proves density of the class of smooth functions in L p (IR n ). ) p Theorem 92 If f L p (IR n ) and 1 p <, then f ε L p (IR n ) and f f ε p 0 as ε 0. Proof. We will need the following result. Lemma 93 If f L p (IR n ), 1 p <, then lim y 0 IR n f(x + y) f(x) p dx = 0. Proof. For y IR n let τ y : L p (IR n ) L p (IR n ) be the translation operator defined by (τ y f)(x) = f(x + y) for f L p (IR n ). The lemma says that τ y is continuous, i.e. τ y f f p 0 as y 0. Given ε > 0 let g be a compactly supported continuous function such that f g p < ε/3, see Theorem 86. Then τ y f f p τ y f τ y g p + τ y g g p + f g p = 2 f g p + τ y g g p < 2ε/3+ τ y g g p. Since τ y g g converges uniformly, there is δ > 0 such that τ y g g p < ε/3 for y < δ and hence τ y f f p < ε for y < δ, 70

which proves the lemma. Now we can complete the proof of the theorem. Assume first that 1 < p <. Hölder s inequality and Fubini s theorem yield p f f ε p = (f(x) f(x y))ϕ ε (y) dy dx IR n IR ( n ) p f(x) f(x y) ϕ ε (y) 1/p ϕ ε (y) 1/q dy dx IR n IR ( n IRn f(x) f(x y) p ϕ ε (y) dy ( ) ) p/q ϕ ε (y) dy dx IR n IR } n {{} 1 = f(x) f(x y) p ϕ ε (y) dy dx IR n B(0,ε) ( ) = f(x) f(x y) p dx ϕ ε (y) dy 0 as ε 0. B(0,ε) IR n by Lemma 93. If p = 1, then the above argument simplifies, because we do not have to use Hölder s inequality. Corollary 94 C 0 is dense in L p (IR n ) for all 1 p <. Proof. Since continuous functions with compact support are dense in L p it suffices to prove that every compactly supported continuous function can be approximated in L p by C0. This, however, immediately follows from Theorem 92, because if f vanishes outside a bounded set, then f ε has compact support. 8 Functions of bounded variation Definition. We say that a function f : [a, b] IR has bounded variation if its variation defined by b n 1 f = sup f(x i+1 ) f(x i ) < a i=0 is finite, where the supremum is taken over all n and all partitions a = x 0 < x 1 <... < x n = b. Clearly monotonic functions have bounded variation b f = f(b) f(a). a 71

If a function f : [a, b] IR has bounded variation, then the function x x a f is nondecreasing on [a, b]. Also the function x x a f f(x) is nondecreasing, because f(y) f(x) y x f for y > x. Therefore every function of bounded variation is a difference of two nondecreasing functions. f(x) = x f ( x f f(x) ). (48) a a We proved Theorem 95 (Jordan) A function f : [a, b] IR has bounded variation if and only if it is a difference of two nondecreasing functions. xercise. Prove that if f : [a, b] IR is a continuous function of bounded variation, then the function x x a f is continuous. The exercise and the formula (48) implies Theorem 96 A continuous function has bounded variation if and only if it is a difference of two continuous nondecreasing functions. Geometrically bounded variation is related to the length of a curve. Recall that if ϕ = (ϕ 1,..., ϕ k ) : [a, b] IR k is a continuous curve, then its length is defined by 72

n 1 l(ϕ) = sup ϕ(x i+1 ) ϕ(x i ) i=0 n 1 = sup (ϕ1 (x i+1 ) ϕ 1 (x i )) 2 +... + (ϕ k (x i+1 ) ϕ k (x i )) 2 where the supremum is taken over all n and all partitions a = x 0 < x 1 <... < x n = b. Definition. We say that a curve ϕ : [a, b] IR k is rectifiable if it has finite length l(ϕ) <. The following result easily follows from the definition of length and the definition of variation. Theorem 97 A continuous curve ϕ : [a, b] IR k is rectifiable if and only if the coordinate functions ϕ i : [a, b] IR are of bounded variation for i = 1, 2,..., k. The following result is an easy exercise. Theorem 98 If functions f, g : [a, b] IR have bounded variation, then the functions f ± g and fg have bounded variation. If in addition g c > 0, then the function f/g has bounded variation. xercise. Prove the above theorem. Proposition 99 Let f be a monotonic function on (a, b). Then the set of points of (a, b) at which f is discontinuous is at most countable. Proof. Suppose f is nondecreasing. If f is discontinuous at x (a, b), then lim f(t) < lim f(t) t x t x + and hence there is a rational number r(x) such that lim f(t) < r(x) < lim f(t). t x x t + It is easy to see that if f is discontinuous at x 1 and at x 2, x 1 x 2, then r(x 1 ) r(x 2 ). Therefore there is a one-to-one correspondence between the set of points where f is discontinuous and the following subset of Q {r(x) : f is discontinuous at x} which clearly is at most countable. 73

Corollary 100 If f : [a, b] IR has bounded variation, then the set of points at which f is discontinuous is at most countable. We will prove now a deep and important result. Theorem 101 Functions of bounded variation are differentiable a.e. Proof. It suffices to prove that nondecreasing functions are differentiable a.e. Lemma 102 If f : [a, b] IR is nondecreasing, then lim sup h 0 + f(x + h) f(x) h < a.e. (49) Proof. Let be the set of points where the upper limit (49) equals. Fix an arbitrary K > 0. For every x there is an arbitrarily small interval [x, x + h] such that i.e. f(x + h) f(x) h > K, h < K 1 (f(x + h) f(x)). (50) Such intervals cover the set in the Vitali sense. Accordingly, we can select disjoint intervals I 1, I 2,... such that \ k=1 I k = 0, so k=1 I k and hence I k < K 1 (f(b) f(a)) k=1 by (50) and the fact that f is nondecreasing. Since the above inequality holds for any K > 0 it follows that = 0. Definition. The Dini derivatives are defined as follows D + f(x) = lim sup h 0 + D f(x) = lim sup h 0 f(x + h) f(x) h f(x + h) f(x) h, D + f(x) = lim inf h 0 +, D f(x) = lim inf h 0 f(x + h) f(x) h f(x + h) f(x) h,. The lemma says that D + f < a.e. Similarly we prove that all other Dini derivatives D + f, D f, D f are finite a.e. We need to prove that D + f = D + f = D f = D f a.e. (51) 74

To this end it suffices to prove that D f D + f a.e. and D + f D f a.e. because then (51) will follow from the inequality D + f D + f D f D f D + f a.e. We will prove only that D f D + f a.e. as the prof of the inequality D + f D f a.e. is very similar. We need to prove that the set {D f < D + f} has measure zero. We have {D f < D + f} = {D f < u < v < D + f}. }{{} (u,v) 0<u<v u,v Q This is a countable union of sets (u, v) and thus it suffices to prove that each set (u, v) has measure zero. Let U be an arbitrary open set such that (u, v) U. For every x (u, v) there is an arbitrarily small h > 0 such that [x h, x] U and (f(x h) f(x))/( h) < u, i.e. 14 x f < hu. (52) x h Such intervals cover the set (u, v) in the Vitali sense, so we can select pairwise disjoint intervals I 1, I 2,... such that We have k=1 (u, v) \ I k I k = 0. k=1 f u I k u U. k=1 The first inequality follows from (52) and the second inequality from the fact that the intervals I 1, I 2,... are disjoint and contained in U. Denote 15 G = (u, v) k=1 int I k. Clearly (u, v)\g = 0. For every x G there is an arbitrarily short interval [x, x + h] k int I k such that (f(x + h) f(x))/h > v and hence x+h x f > hv. 14 We take [x h, x] because D f < u is a condition for the limit as h 0. 15 int I k denotes the interior of I k. 75

These intervals cover the set G in the Vitali sense, so we can select pairwise disjoint intervals J 1, J 2,... k I k such that G \ k J k = 0 and f > v J k = v J k. We have k=1 J k k=1 v (u, v) = v G v J k < k=1 k=1 f k=1 J k k=1 I k f u U. Since the measure of U can be arbitrarily close to the measure of (u, v) we obtain v (u, v) v (u, v) and hence (u, v) = 0, because v > u. The proof is complete. Theorem 103 If a function f : [a, b] IR has bounded variation, then f L 1 (a, b). If in addition f is nondecreasing, then b a f (x) dx f(b) f(a). xample. The Cantor Staircase Function. This is a continuous nondecreasing function f : [0, 1] [0, 1] such that f(0) = 0, f(1) = 1 and f is constant in each of the intervals in the complement of the Cantor set. Since the Cantor set has measure zero we conclude that f (0) = 0 a.e. Therefore the function f has bounded variation (as a nondecreasing function) and 0 = 1 0 f (x) dx < f(1) f(0) = 1. 76

Actually Sierpiński provided a more surprising example. increasing function with the derivative equal to zero a.e. He constructed a strictly Proof of the theorem. We can assume that f is nondecreasing. Let a ε > a and b ε < b be points of differentiability of the function F (x) = x f(t) dt such that a a a ε < ε, b b ε < ε and F (a ε ) = f(a ε ), F (b ε ) = f(b ε ). Since Fatou s lemma yields bε f (x) dx lim inf a h 0+ ε f(x + h) f(x) lim h 0+ h bε f(x + h) f(x) a ε h = f(b ε ) f(a ε ) f(b) f(a). Passing to the limit with ε 0 yields the result. = f (x) a.e. dx = lim inf h 0 ( bε+h b ε f aε+h a ε ) f 8.1 Absolutely continuous functions Definition. We say that a function f : [a, b] IR is absolutely continuous if for every ε > 0 there is δ > 0 such that if (x 1, x 1 + h 1 ),..., (x k, x k + h k ) are pairwise disjoint intervals in [a, b] of total length less than δ, i.e. k h i < δ, then k f(x i + h i ) f(x i ) < ε. Theorem 104 Absolutely continuous functions have bounded variation. We leave the proof as an exercise. Theorem 105 If f, g : [a, b] IR are absolutely continuous, then also the functions f ± g and fg are absolutely continuous. If, in addition, g c > 0 on [a, b], then f/g is absolutely continuous. We leave the proof as an exercise. The following result provides a complete characterization of absolutely continuous functions as the functions for which the fundamental theorem of calculus holds. Theorem 106 If f L 1 ([a, b]), then the function F (x) = x f(t) dt is absolutely a continuous. On the other hand if F is absolutely continuous, then its derivative is integrable F L 1 ([a, b]) and F (x) = F (a) + x a F (t) dt 77 for all x [a, b].

Proof. Absolute continuity of the function F (x) = x f(t) dt follows from the absolute a continuity of the integral, Theorem 52. If F is absolutely continuous, then it has bounded variation and hence F L 1 ([a, b]) by Theorem 103. This implies that the function x x is absolutely continuous. Hence the function a F (t) dt ϕ(x) = F (x) F (a) x a F (t) dt is absolutely continuous, ϕ(a) = 0, and 16 ϕ (x) = 0 for a.e. x. It suffices to prove that ϕ(x) = 0 for all x. This will immediately follow from the next lemma. Lemma 107 If ϕ : [a, b] IR is absolutely continuous and ϕ (x) = 0 a.e., then ϕ is constant. Proof. Fix β (a, b) and ε > 0. For almost every x (a, β) there is an arbitrary small interval [x, x + h] (a, β) such that ϕ(x + h) ϕ(x) h < ε. Such intervals form a Vitali covering of the set of points in (a, β) where the derivative of ϕ equals 0 (almost all points in (a, β)). Hence we can select pairwise disjoint intervals from this covering [x 1, x 1 + h 1 ], [x 2, x 2 + h 2 ],... (a, β) that cover almost all points of (a, β). For a given ε > 0 we choose δ > 0 as in the condition for the absolute continuity of ϕ. For N sufficiently large, the set (a, β) \ N [x i, x i + h i ] has measure less than δ. This set is a union of N + 1 open intervals. Denote these intervals by (ξ j, ξ j), j = 1, 2,..., N + 1. We have ϕ(β) ϕ(a) N ϕ(x i + h i ) ϕ(x i ) + N+1 j=1 ϕ(ξ j) ϕ(ξ j ) ε(β a) + ε. Since this inequality holds for any ε > 0 we conclude that ϕ(β) = ϕ(a), so the function ϕ is constant. This proves the lemma and also completes the proof of the theorem. Theorem 108 (Integration by parts) If the functions f, g : [a, b] IR are absolutely continuous, then 16 Because d dt b fg = fg b a b f g. a a x a F (t) dt = F (x) a.e. by the Lebesgue differentiation theorem. 78

Proof. Clearly fg = (fg) f g a.e. and absolute continuity of fg implies b fg = b (fg) b f g = fg b a b a a a a f g. Theorem 109 very function of bounded variation is a sum of an absolutely continuous function and a singular function, i.e. such a function that its derivative equals 0 a.e. Proof. The theorem follows immediately from the equality f(x) = ( f(x) x a ) f (t) dt + x a f (t) dt. xample. very Lipschitz function f : [a, b] IR is absolutely continuous and hence it is differentiable a.e. and satisfies the claim of the fundamental theorem of calculus. Definition. For a function f : [a, b] IR, the Banach indicatrix is defined by N(f, y) = #f 1 (y) for y IR, i.e. N(f, y) is the number of points in the preimage of y (it can be equal to 0 or to for some y). Theorem 110 (Banach) For a continuous function f : [a, b] IR its indicatrix is measurable and b f = N(f, y) dy. (53) a IR Remark. We do not assume that the function f has bounded variation, so if the function f has unbounded variation, then the theorem says that the integral of the function on the right hand side of (53) equals. Proof. Under construction. Theorem 111 If f : [a, b] IR is absolutely continuous, then b f = b a a f (t) dt. 79

Proof. Under construction. The above two theorems immediately imply the following result. Corollary 112 If f : [a, b] IR is absolutely continuous, then b a f (t) dt = IR N(f, y) dy. Theorem 113 (Rademacher) If f : Ω IR is Lipschitz continuous, where Ω IR n is open, then f is differentiable a.e., i.e. f f(x) = (x),..., f (x) x 1 x n exists a.e. and f(y) f(x) f(x) (y x) lim y x y x = 0 for a.e. x Ω. Proof. Let ν S n 1 and D ν f(x) = d dt f(x + tν) t=0 be the directional derivative whenever it exists. Since D ν f(x) exists precisely when the Borel-measurable functions lim sup t 0 f(x + tν) f(x) t and lim inf t 0 f(x + tν) f(x) t coincide, the set A ν on which D ν f fails to exist is Borel-measurable. The function t f(x+tν) is absolutely continuous and hence differentiable a.e. Thus the intersection of the set A ν with any line parallel to ν has one dimensional measure zero and hence Fubini s theorem implies that A ν has measure zero. Accordingly, for every ν S n 1 D ν f(x) exists for a.e. x Ω Take any ϕ C0 (Ω) and note that for each sufficiently small h > 0 f(x + hν) f(x) ϕ(x hν) ϕ(x) ϕ(x) dx = f(x) dx. h h Ω Indeed, invariance of the integral with respect to translations yields f(x + hν)ϕ(x) dx = f(x)ϕ(x hν) dx Ω Ω Ω 80

and h has to be so small that x+hν Ω for all x supp ϕ. The dominated convergence theorem yields D ν f(x)ϕ(x) dx = f(x)d ν ϕ(x) dx. Ω Ω This is true for any ν S n 1 and in particular we have f (x) ϕ(x) dx = f(x) ϕ (x) dx for i = 1, 2,..., n. x i x i Ω Now D ν f(x)ϕ(x) dx = Ω Ω n = = Ω Ω f(x)d ν ϕ(x) dx = Ω Ω f(x) ϕ x i (x) ν i dx = ϕ(x)( f(x) ν) dx. f(x) ( ϕ(x) ν) dx n Ω ϕ(x) f x i (x) ν i dx Lemma 114 If g L 1 loc (Ω) and Ω g(x)ϕ(x) dx = 0 for every ϕ C 0 (Ω), then g = 0 a.e. We leave the proof of this lemma as an exercise. The lemma implies that D ν f(x) = f(x) ν a.e. Now let ν 1, ν 2,... be a countable dense subset of S n 1 and let A k = {x Ω : f(x), D νk f(x) exists and D νk f(x) = f(x) ν k }. Let A = k=1 A k. Clearly Ω \ A = 0 and D νk f(x) = f(x) ν k for all x A and all k = 1, 2,... We will prove that f is differentiable at each point of the set A. To this end, for any x A, ν S n 1 and h > 0 define Q(x, ν, h) = f(x + hν) f(x) h f(x) ν and it suffices to prove that if x A, then for every ε > 0 there is δ > 0 such that Q(x, ν, h) < ε whenever 0 < h < δ and ν S n 1. Let the function f be L-Lipschitz, i.e. f(x) f(y) L x y for all x, y Ω. This implies that f/ x i L a.e. and thus f(x) nl a.e. Hence for any x A, ν, ν S n 1 and h > 0 Q(x, ν, h) Q(x, ν, h) ( n + 1)L ν ν. 81

Given ε > 0 let p be so large that for each ν S n 1 Since ν ν k < ε 2( n + 1)L for some k = 1, 2,..., p. lim Q(x, ν i, h) = 0 for all x A and all i = 1, 2,... h 0 + we see that for a given x A there is δ > 0 such that Now Q(x, ν i, h) < ε/2 whenever 0 < h < δ and i = 1, 2,..., p. Q(x, ν, h) Q(x, ν k, h) + Q(x, ν k, h) Q(x, ν, h) < ε/2 + ( n + 1)L ν k ν < ε whenever ν S n 1 and 0 < h < δ and the theorem follows. Theorem 115 (McShane) If f : A IR is an L-Lipschitz function defined on a subset A of a metric space, then there is an L-Lipschitz function f : IR such that f A = f. Proof. For x we define f(x) = inf (f(y) + Ld(x, y)) y A and it a a routine exercise to check that f is L-Lipschitz and satisfies f A = f. Theorem 116 (Stepanov) Let Ω IR n be open. Then a measurable function f : Ω IR is differentiable a.e. in Ω if and only if lim sup y x f(f) f(x) y x < for a.e. x Ω. (54) The implication is obvious. To prove the implication first we show that (54) implies that f is Lipschitz continuous on a big set A in the sense that the measure of Ω \ A is small. Then we extend f A to a Lipschitz function on IR n using McShane s theorem. Now f is differentiable a.e. by Rademacher theorem. In particular it is differentiable at a.e. point of A and it remains to prove that f is differentiable at every density point of A at which f is differentiable. We leave details as an exercise. 82

9 Signed measures Definition. Let M be a σ-algebra. A function µ : M [, ] is called signed measure if 1. µ( ) = 0; 2. µ attains at most one of the values ± ; 3. µ is countably additive. xamples. 1. µ = µ 1 µ 2, where µ 1 and µ 2 are (positive) measures and one of them is finite. 2. µ() = f dν, f L 1 (ν). Definition. A measurable set M is called positive if µ(a) > 0 for every measurable subset A. xample. If = 1 2, 1 2 =, µ( 1 ) = 7, µ( 2 ) = 3, then µ() = 4 > 0, but is not positive. Lemma 117 If µ is a signed measure, then every measurable set such that 0 < µ() < contains a positive set A of positive measure. Proof. If is positive, then we take A =. Otherwise there is B, µ(b) < 0. Let n 1 be the smallest positive integer such that there is B 1 with µ(b 1 ) 1 n 1. If A 1 = \ B 1 is positive, then we take A = A 1 (observe that µ(a 1 ) > 0 as otherwise µ() < 0). If A 1 is not positive, then let n 2 be the smallest positive integer such that there is B 2 \ B 1 with µ(b 2 ) 1 n 2. If A 2 = \(B 1 B 2 ) is positive, then we take A = A 2 (observe that µ(a 2 ) > 0). If A 2 is not positive, then we define A 3 as above and so on. If A m = \ m j=1 B j is positive for some m, then we take A = A m (µ(a m ) > 0). Otherwise we obtain an infinite sequence B j and we set A = \ B j. 83 j=1

We will prove that A is positive and µ(a) > 0. We have and hence IR µ(a) = µ() 0 < µ() = µ(a) + j=1 µ(b j ) j=1 µ(b j ) µ(a) j=1 1 n j=1 j 1 n j < µ(a). (55) Note that µ(a) < as otherwise µ() = µ(a) + µ( \ A) = + µ( \ A) =. Hence (55) yields µ(a) > 0 and j=1 1/n j <. In particular n j ad j. It remains to prove that A is positive. If C A = \ j=1 B j is measurable, the for every m, C \ m j=1 B j and the definition of n m+1 yields 1 µ(c) > n m+1 1 0 and hence µ(c) 0. This proves positivity of A. as m Theorem 118 (Hahn s decomposition theorem) If µ is a signed measure, then there exist two disjoint measurable sets + and such that = +, + =, µ is positive on + and µ is negative on (i.e. µ is positive on ). Therefore every signed measure is on the form µ = µ + µ where µ + and µ are positive measures defined by µ + () = µ( + ), µ () = µ( ). Moreover the measures µ + and µ are concentrated on disjoint sets + and. xercise. Prove that the decomposition = + is unique up to sets of µ-measure zero, i.e. if = + is another decomposition, then µ( + \ + ) = µ( + \ + ) = µ( \ ) = µ( \ ) = 0. xample. Let µ() = f dν, where f L1 (ν). Then µ is a signed measure and µ() = f + dν f dν. 84

Hence we can take + = {x : f(x) 0}, = {x : f(x) < 0}, but we also can take + = {x : f(x) > 0}, = {x : f(x) 0}. In general + +,, however, the sets differ by the set where f equals 0 and this set has measure zero µ({x : f(x) = 0}) = f dν = 0. {f=0} Proof of Hahn s theorem. Suppose µ does not attain value +. Let M = sup{µ(a) : A M, A is positive}. Then there is a sequence of positive sets A 1 A 2 A 3... ; µ(a i ) M and if A = A i, then A is positive with µ(a) = M <. It remains to prove that \ A is negative (then we take + = A, = \ A). For if not, there is \ A with 0 < µ() <. Hence Lemma 117 implies that there is a positive subset C of positive measure. Now A C is positive and which is an obvious contradiction. µ(a C) = µ(a) + µ(c) > M As an application we will prove the Radon-Nikodym-Lebesgue theorem and then we will classify all continuous linear functionals on L p (µ) for 1 p <. Definition. Let µ be a (positive) measure in a σ-algebra M. If ν is another (positive) measure in M such that M, µ() = 0 ν() = 0, then we say that ν is absolutely continuous with respect to µ and we write ν µ. xample. If f 0 is measurable, and a measure ν is defined by ν() = f dµ, 85

then ν µ. Definition. Let λ be a measure in M. We say that λ is concentrated on A M if λ() = λ(a ) for all M. quivalently λ is concentrated on A M if λ( \ A) = 0. Definition. Let λ 1, λ 2 be two measures in M. If there are sets such that A, B M, A B = λ 1 is concentrated on A, λ 2 is concentrated on B, then we say that the measures λ 1 and λ 2 are mutually singular and we write λ 1 λ 2. Lemma 119 Let λ 1, λ 2 be two measures in M. If there is a set M such that λ 1 () = 0 and λ 2 is concentrated on, then λ 1 λ 2. Proof. It suffices to take A = \ and B =. xamples. 1. For a IR n the Dirac mass is a Borel measure defined by δ a () = 1 if a and δ a () = 0 if a. The lemma yields L n δ a. 2. If M is a k-dimensional submanifold in IR n, k < n, and λ() = H k ( M) for IR n, then L n λ. 3. if C is the standard ternary Cantor set and λ() = H log 2/ log 3 ( C) for IR, then λ L 1. Theorem 120 (Radon-Nikodym-Lebesgue) Let µ and ν be two (positive), σ-finite measures on M. Then (a) There is a unique pair of measures λ a, λ s on M such that λ = λ a + λ s, λ a µ, λ s µ. (b) There is a M-measurable function h 0 on such that λ a () = h dµ for all M. If in addition λ is finite (i.e. λ() < ), then the measures λ a, λ s are finite and h L 1 (µ). 86

Remark. Part (a) is due to Lebesgue and part (b) due to Radon and Nikodym. In particular if λ µ, then λ() = h dµ (56) for some h 0 and all M which is a striking characterization of all absolutely continuous measures. The function h is called Radon-Nikodym derivative and is often denoted by h = dλ/dµ. With this notation (56) can be rewritten as λ() = dλ dµ for all M. dµ Proposition 121 If λ is a finite measure, absolutely continuous with respect to the Lebesgue measure, λ L n, then dλ λ(b(x, r)) = lim dl n r 0 + B(x, r) for a.e. x IR n. Proof. It follows from the Radon-Nikodym theorem that dλ/dl n L 1 (IR n ) and λ(b(x, r)) dλ = dl n dλ (x) as r 0 + B(x, r) dl n dl n B(x,r) for a.e. x IR n by the Lebesgue differentiation theorem. Proof of the Radon-Nikodym-Lebesgue theorem. First we will prove the second part of the theorem which is due to Radon and Nikodym. We can state it as follows. Theorem 122 (Radon-Nikodym) If λ and µ are σ-finite measures on a σ-algebra M and λ µ, then there is a nonnegative measurable function h : [0, ] such that λ() = h dµ for all M. If in addition λ is finite, then h L 1 (µ). Proof. We will prove the theorem under the additional assumption that µ() < and λ() <. The general case of σ-finite measures easily follows from this special case and the details are left to the reader. Let Φ = {ϕ : [0, ] : ϕ is measurable and ϕ dµ λ() for all M}. Clearly Φ, because ϕ 0 Φ. If ϕ 1, ϕ 2 Φ, then max{ϕ 1, ϕ 2 } Φ. Indeed, max{ϕ 1, ϕ 2 } dµ = ϕ 1 dµ + ϕ 2 dµ {ϕ 1 ϕ 2 } {ϕ 1 <ϕ 2 } λ( {ϕ 1 ϕ 2 }) + λ( {ϕ 1 < ϕ 2 }) = λ(). 87

Define M = sup ϕ dµ λ() <. ϕ Φ There is a sequence ϕ n Φ such that ϕ n dµ M as n. Let h n = max{ϕ 1, ϕ 2,..., ϕ n } Φ. Then h n is an increasing sequence of functions pointwise convergent to a function h. Since h n dµ λ() for all M it follows from the monotone convergence theorem that h dµ λ() for all M. Hence h Φ. Moreover h dµ = lim h n dµ = M. n We will show that h dµ = λ() for all M. Clearly η() = λ() defines a positive measure in M and we need to show that η 0. For if not there would exist a set A M such that η(a) > 0. This implies that λ(a) > 0 and hence µ(a) > 0 (because λ µ). Thus there is ε > 0 such that η(a) εµ(a) > 0. Define ξ = η εµ. Then ξ is a signed measure and we can assume that A is positive with respect to ξ by Lemma 117. Hence if M, ξ(a ) 0 and thus 0 ξ(a ) = η(a ) εµ(a ) = λ(a ) h dµ εµ(a ), but also A h dµ A h dµ + εµ(a ) λ(a ), \A h dµ λ( \ A) because h Φ. Adding up the two inequalities yields (h + εχ A ) dµ = h dµ + εµ(a ) λ(). This implies that h + εχ A Φ. Since (h + εχ A ) dµ = M + εµ(a) > M, we get a contradiction. The proof is complete. Now we can prove the first part of the theorem which is due to Lebesgue. 88

Theorem 123 (Lebesgue) If λ and µ are σ-finite measures, then there exist unique measures λ a and λ s such that λ = λ a + λ s, λ a µ, λ s µ. Proof. We leave the proof of the uniqueness as an exercise and so we are left with the proof of the existence of λ a and λ s. Let ν = µ + λ. Then µ ν and λ ν. By the Radon-Nikodym theorem µ() = f dν, λ() = g dν for some measurable functions f, g : [0, ] and all M. Define λ 0 () = λ( {f = 0}), λ 1 () = λ( {f > 0}). Since λ = λ 0 + λ 1 it suffices to prove that λ 0 µ and λ 1 µ. The measure λ 0 is concentrated on the set {f = 0}. Since µ({f = 0}) = 0, we conclude that λ 0 µ. To prove that λ 1 µ let µ() = 0. Then f = 0, ν-a.e. in the set, i.e. ν( {f > 0}) = 0 and thus λ 1 () = g dν = 0. {f>0} We will apply now the Radon-Nikodym theorem to classify all continuous linear functionals on L p (µ) for 1 p <. Let us start first with some general facts about continuous linear mappings between normed spaces. Theorem 124 Let L : Y be a linear mapping between normed spaces. Then the following conditinos are equivalent (a) L is continuous; (b) L is continuous at 0; (c) L is bounded, i.e. there is C > 0 such that Lx C x for all x. 89

Remark. Formally we should use different symbols to denote norms in spaces and Y. Since it will always be clear in which space we take the norm it is not dangerous to use the same symbol to denote apparently different norms in and Y. Proof. The implication (a) (b) is obvious. (b) (c). Suppose L is continuous at 0 but not bounded. Then there is a sequence x n such that Lx n n x n. Hence which is a contradiction 17. (c) (a). Let x n x. Then 0 L x n 1 n x n Lx Lx n = L(x x n ) C x x n 0 and hence Lx n Lx which proves continuity of L. In particular if is a normed space over IR (or C), then a linear functional Λ : IR (or C) is continuous if and only if it is bounded, i.e. there is C > 0 such that Λx C x for all x. (57) The number is called the norm of Λ. Λ = sup Λx x 1 Lemma 125 The norm Λ is the smallest number C for which the inequality (57) is satisfied. Proof. Suppose Λx C x for all x. If x 1, then Λx C x C and hence Λ = sup Λx C. x 1 Thus the number C is (57) cannot be smaller than Λ. It remains to prove that Λx Λ x for all x. 17 Convergence to 0 follows from the continuity of L at 0. 90

If x = 0, then the inequality is obvious. If x 0, then Λx = Λ x x x Λ x. The last inequality follows from the fact that x/ x = 1. xample. If 1/p + 1/q = 1, 1 < p, q < are Hölder conjugate, and g L q, then Λ g f = fg dµ defines a bounded linear functional on L p (µ) and Λ g g q. inequality yields Λ g f = fg dµ g q f p. Similarly if g L (µ), then Λ g f = fg dµ defines a bounded linear functional on L 1 (µ) and Λ g g. Indeed, Hölder s It turns out that under reasonable assumptions every functional on L p (µ), 1 p < is on the form Λ g for some g L q (or g L if p = 1). Theorem 126 Suppose 1 p < and µ is a σ-finite measure. Assume that Λ is a bounded linear functional on L p (µ). Then there is a unique g L q (µ), 1/p + 1/q = 1 such that Λf = fg dµ for all f L p (µ). Moreover Λ = g q. Proof. Uniqueness. Suppose Λf = fg 1 dµ = fg 2 dµ for some g 1, g 2 L q (µ) and all f L p (µ). We need to show that g 1 = g 2 a.e. For M with µ() < we have g 1 dµ = Λ(χ ) = g 2 dµ. Hence (g 1 g 2 ) dµ = 0 91

for every M with µ() <. This easily implies that g 1 g 2 = 0 a.e. and hence g 1 = g 2 µ-a.e. We will prove the theorem under the additional assumption that µ() <. The general case of σ-finite measure can be reduced to the case of finite measure and we leave details to the reader. Define It is obvious that λ is finitely additive Indeed, λ() = Λ(χ ). λ(a B) = λ(a) + λ(b) for A, B M, A B =. λ(a B) = Λ(χ A B ) = Λ(χ A + χ B ) = Λ(χ A ) + Λ(χ B ) = λ(a) + λ(b). We will use this fact to prove that actually λ is conuntably additive. Let = i, where i M and i j = for i j. Let A k = 1... k. We have χ χ Ak p = µ( \ A k ) 1/p 0 as k, because the sets \ A k form a decreasing sequence of sets with empty intersection and µ( \ A 1 ) µ() <. Continuity of the functional λ implies and hence finite additivity of λ yields Accordingly Λχ Ak Λχ k λ( i ) = λ(a k ) λ() as k. λ( i ) = λ() which is countable additivity of λ. Note that λ is not necessarily positive, so λ : M IR is a signed measure. By the Hahn decomposition theorem λ = λ + λ, where λ + and λ are positive finite measures concentrated on disjoint sets. Clearly µ() = 0 implies that λ() = 0 and also λ + () = λ () = 0. Hence λ +, λ µ. The Radon-Nikodym theorem yields the existence of 0 g ± L 1 (µ) such that λ ± () = g ± dµ. 92

Now λ() = We can write the last statement as Λ(χ ) = g dµ where g = g + g L 1 (µ). χ g dµ. Since simple functions are linear combinations of characteristic functions Λf = fg dµ (58) whenever f is a simple function. Now every f L (µ) is a uniform limit of a sequence of simple functions and hence (58) holds for all f L (µ). It remains to prove the following three statements (a) g L q (µ) (we know that g L 1 (µ)); (b) Λ = g q (we know that Λ g q by Hölder s inequality); (c) Λf = fg dµ for all f Lp (µ) (we know it for f L (µ)). We will complete the proof in the case 1 < p <. The case p = 1 is easier and left to the reader. Let { g(x)/ g(x) if g(x) 0, α(x) = 0 if g(x) = 0, and Define n = {x : g(x) n}. f = g q/p αχ n. Clearly f L (µ) (because f n q/p ). Moreover Hence g q dµ by (b) = n f p (a) = g q χ n (b) = fg. ( ) 1/p fg dµ f L by (a) = Λf Λ f p = Λ g q dµ. n This yields ( ) 1 1/p g q dµ Λ. n 93

The left hand side converges to g q as n and hence g q Λ. Since we already know the opposite inequality Λ g q we conclude Λ = g q. This yields the claims (a) and (b). Now (c) follows from the continuity of the functional. Indeed, there is a sequence 18 L f k f in L p and hence Λf fg dµ Λf Λf k + Λf k fg dµ = Λf Λf k + (f k f)g dµ which proves (c). Λf Λf k + f k f p g q 0 as k. Definition. Let be a locally compact metric space. The class of continuous functions vanishing at infinity C 0 () consists of all continuous functions u : IR such that for every ε > 0 there is a compact set K such that u(x) < ε for all x \K. xercise. Prove that u C 0 (IR n ) if and only if u is continuous on IR n and lim x u(x) = 0. xample. The set of positive integers IN is a locally compact metric space with respect to the metric x y. Functions on IN can be identified with infinite sequences. Indeed, a sequence (x i ) corresponds to a function f(i) = x i. very function on IN is continuous and compact sets in IN are exactly finite subsets. It easily follows that C 0 (IN) consists of sequences such that x i 0 as i. This space is denoted by c 0 = C 0 (IN). Thus c 0 = {(x i ) : x i IR, x i 0 as i }. It is easy to see that c 0 is a Banach space with respect to the norm x = sup,2,... x i, where x = (x i ). Therefore c 0 is a closed subspace in l. Recall that C c () consists of all continuous functions on that have compact support. Clearly C c () C 0 (). Theorem 127 Let be a locally compact metric space. space with the norm u = sup u(x). x Moreover C c () is a dense subset of C 0 (). Then C 0 () is a Banach We leave the proof as an exercise. Observe that if µ is a measure on, then C 0 () is a closed subspace of L (µ). Hence C 0 () is the closure of C c () in the metric of L (µ). If µ is a Radon measure on 18 Because simple functions are dense in L p. 94

a locally compact metric space, then the closure of C c () in the metric of L p (µ) equals L p (µ) for all 1 p < (Theorem 86). However the closure of C c () in the metric of L (µ) equals C 0 () L (µ). We managed to characterize bounded linear functional on L p (µ) for all 1 p < (Theorem 126). It is not so easy to characterize bounded linear functionals on L (µ). However Theorem 128 below provides a characterization of bounded linear functionals on C 0 (). Recall that if µ is a signed measure, then Hahn s decomposition theorem provides a unique representation µ = µ + µ, where µ + and µ are positive measures concentrated on disjoint sets. We define the measure µ by µ = µ + +µ and call the number µ () the total variation of µ. then If a signed measure µ is given by µ() = f dλ for some f L 1 (λ), µ () = f dλ. If µ is a signed measure and f L 1 ( µ ), then we define fdµ = f dµ + f dµ. Theorem 128 (Riesz Representation Theorem) Let be a locally compact metric space and let Λ be a bounded linear functional on C 0 (). Then there is a unique Borel signed measure µ of finite total variation such that Λf = f dµ for all f C 0 (). (59) Moreover Λ = µ (). In the proof we will need the following abstract and general result. Lemma 129 Let 0 be a dense linear subspace in a normed space. If Λ : 0 IR is a bounded linear functional (with respect to the norm in ), then there is a unique functional Λ : IR (called extension of Λ) such that Λx = Λx for all x 0 and Λ = Λ. Remarks. Here Λ = sup x 1 x Λx and Λ = sup Λx. x 0 x 1 95

Usually the extension is denoted by the same symbol Λ rather than Λ. xample. Let be a locally compact metric space and λ a Radon measure on. If Λ : C c () IR is a linear functional such that Λf M f L 1 (λ) for all f C c (), then there is unique bounded linear functional Λ : L 1 (λ) IR such that Λf = Λf for f C c () and Λf M f L 1 (λ) for all f L 1 (λ). Indeed, C c () is a linear and dense subspace of L 1 (λ) by Theorem 86. Proof of the lemma. Let 0 x n x. Then we define and we have to prove that: (a) the limit exists; Λx = lim n Λx n (60) (b) the limit does not depend on the particular choice of a sequence x n 0 that converges to x; (c) Λx = Λx for x 0 ; (d) Λ = Λ. If x n x, x n 0, then Λx n Λx m = Λ(x n x m ) Λ x n x m 0 as n, m, which shows that Λx n is a Cauchy sequence of real numbers and hence it is convergent. This proves (a). If x n x, y n x, x n, y n 0, then Λx n Λy n = Λ(x n y n ) Λ x n y n 0 as n which proves (b). If x 0, then taking x n = x we have x n x and hence Λx = lim n Λx = Λx which proves (c). Clearly Λ = sup x 1 x 0 Λx sup Λx = Λ, x 1 x but we actually have equality because if x, then there is a sequence x n 0, x n x and hence Λx = lim n Λx n Λ lim n x n = Λ x. 96

which proves that Λ Λ (see Lemma 125). The above two inequalities prove (d). Remark. In the proof we used the fact that every Cauchy sequence in IR is convergent, i.e. we used the fact that IR is Banach space. Therefore the same proof gives a more general result about extensions of bounded linear mappings L : 0 Y defined on a dense linear subspace 0 of a noremd space into a Banach space Y. The reader should find an appropriate statement. Proof of Theorem 128. First we prove uniqueness of the measure µ. To this end we need to prove that if µ is a signed Borel measure of finite total variation such that f dµ = 0 for all f C 0 (), then µ 0. Let µ = µ + µ be the Hahn decomposition with measures µ + and µ concentrated on disjoint sets + and respectively. If { 1 if x h(x) = +, 1 if x, then h 2 = 1 and dµ = hd µ. Hence µ () = d µ = h 2 d µ. (61) Let f n C c () be a sequence that converges to h in L 1 ( µ ) (Theorem 86). Note that 0 = f n dµ = f n h d µ. (62) Now (61) and (62) yield µ () = (h 2 f n h) d µ = Hence µ () = 0 and thus µ = 0. (h f n )h d µ h f n d µ 0 as n. Now let Λ be a bounded linear functional on C 0 (). We can assume without lost of generality that 19 Λ = 1. We would like to apply the Riesz representation theorem for positive functionals on C c (), but unfortunately the functional Λ is not necessarily positive. So we want to construct a positive functional Φ such that Λf Φ( f ) f for all f C c (). (63) Once the functional Φ is constructed we can finish the proof of the theorem as follows. By the Riesz representation theorem there is a (positive) Radon measure λ such that Φ(f) = f dλ for all f C c (). 19 Otherwise we multiply Λ by Λ 1. 97

Observe that λ() = sup{φ(f) : 0 f 1, f C c ()}. (64) Indeed, for 0 f 1, f C c () we have f dλ λ(). On the other hand for every compact set K there is 0 f 1, f C c () such that f = 1 on K and hence f dλ λ(k). Now (64) follows from the fact that Observe that (64) implies λ() = It follows from (63) that Λ(f) Φ( f ) = sup λ(k). K K compact λ() 1. (65) f dλ = f L 1 (λ). Thus Λ is a bounded linear functional defined on a linear subspace 20 C c () L 1 (λ). Hence the functional can be uniquely extended to a bounded linear functional Λ on L 1 (λ) such that Λf = Λf for f C c () and Λf f L 1 (λ) for all f L 1 (λ). According to Theorem 126 there is a unique function g L (λ), g 1 such that Λf = fg dλ for all f L 1 (λ). If f C 0 () and f n C c (), f n f in the norm of C 0 (), then Λf = lim f n g dλ = Λf n Λf n and hence Λf = Λf for all f C 0 (). Thus Λf = fg dλ for all f C 0 (). Accordingly, we have the representation (59) with dµ = gdλ. Since Λ = 1 and d µ = g dλ it follows that µ () = g dλ sup{ Λf : f C 0 (), f 1} = Λ = 1. On the other hand g 1 and λ() 1 so µ () = g dλ 1. 20 Bounded with respect to the norm of L 1 (λ). 98

Thus µ () = 1 = Λ and the proof is complete. Therefore we are left with the construction of the positive functional Φ satisfying (63). Denote by C + c () the class of nonnegative functions in C c () and for f C + c () define Φ(f) = sup{ Λh : h C c (), h f}. If f C c (), then f = f + f, f +, f C + c () and we define Φ(f) = Φ(f + ) Φ(f ). Clearly Φ(f) 0 for f C + c (), so Φ is positive and Φ satisfies (63). Therefore we only need to show that Φ is linear. Obviously Φ(cf) = cφ(f) and hence we are left with the proof that Φ(f + g) = Φ(f) + Φ(g) for all f, g C c (). (66) Observe that it suffices to verify (66) for f, g C + c (). Indeed, for f, g C c () we will have then 21 (f + g) + (f + g) = f + f + g + g and hence (f + g) + + f + g = (f + g) + f + + g +. Now, the assumed linearity of Φ on C c () + yields Φ((f + g) + ) + Φ(f ) + Φ(g ) = Φ((f + g) ) + Φ(f + ) + Φ(g + ) Φ((f + g) + ) Φ((f + g) ) }{{} Φ(f+g) = Φ(f + ) Φ(f ) }{{} Φ(f) + Φ(g + ) Φ(g ). }{{} Φ(g) Thus we are left with the proof of (66) for f, g C c () +. Fix f, g C + c (). Given ε > 0, there exist h 1, h 2 C c () such that h 1 f, h 2 g and Φ(f) Λh 1 + ε/2, Φ(g) Λh 2 + ε/2. Let c 1 = ±1, c 2 = ±1 be such that Λh 1 = c 1 Λh 1, Λh 2 = c 2 Λh 2. We have Φ(f) + Φ(g) Λh 1 + Λh 2 + ε = Λ(c 1 h 1 + c 2 h 2 ) + ε Φ( h 1 + h 2 ) + ε Φ(f + g) + ε and hence Φ(f) + Φ(g) Φ(f + g). 21 Because both sides are equal f + g. 99

Now we need to prove the opposite inequality. Let h C c () satisfy h f + g, let V = {x : f(x) + g(x) > 0}, and define h 1 (x) = { f(x)h(x) f(x)+g(x) if x V, 0 if x V, h 2 (x) = { g(x)h(x) f(x)+g(x) if x V, 0 if x V. It is easy to see that the functions h 1 and h 2 are continuous, h = h 1 + h 2, and h 1 f, h 2 g. We have Λh = Λh 1 + Λh 2 Λh 1 + Λh 2 Φ(f) + Φ(g) and after taking the supremum over all functions h we have Φ(f + g) Φ(f) + Φ(g) which completes the proof. 10 Maximal function Definition. For a function u L 1 loc (IRn ) we define the Hardy-Littlewood maximal function by Mu(x) = sup u(y) dy. r>0 B(x,r) Clearly Mu : IR n [0, ] is a measurable function. Theorem 130 (Mardy-Littlewood maximal theorem) If u L p (IR n ), 1 p, then Mu < a.e. Moreover (a) If u L 1 (IR n ), then for every t > 0 {x : Mu(x) > t} C(n) t (b) If u L p (IR n ), 1 < p, then Mu L p (IR n ) and Remarks. Let u = χ [0,1]. Then for x 1 Mu(x) 1 2x IR n u. (67) Mu p C(n, p) u p. (68) 2x 0 χ [0,1] (y) dy = 1 2x L1. Hence the inequality (68) does not hold for p = 1. If g L 1 (IR n ), then t {x : g(x) > t} = IR n 100