Two classic theorems from number theory: The Prime Number Theorem and Dirichlet's Theorem


Senior Exercise in Mathematics
Lee Kennard
5 November, 2006

Contents

0 Notes and Notation
1 Introduction
2 Primes in the odd integers
   2.1 The infinitude of primes
   2.2 The Prime Number Theorem
      2.2.1 History of the Prime Number Theorem
      2.2.2 An analytic theorem
      2.2.3 Newman's proof of the Prime Number Theorem
3 Primes in arithmetic progressions
   Dirichlet's Theorem
      Euler's proof for q = ...
      Dirichlet characters and L-functions (prime modulus)
      Dirichlet's proof (prime modulus)
      Dirichlet characters and L-functions (general modulus)
      Dirichlet's proof (general modulus)
      A probabilistic interpretation of Dirichlet's theorem
   The Generalized Prime Number Theorem
A Appendix
   A.1 Complex Integration and Differentiation
      A.1.1 Complex Integration
      A.1.2 Complex Differentiation
   A.2 The Leibniz Integral Rule
   A.3 Useful algebraic concepts and facts
      A.3.1 The multiplicative group $\mathbb{Z}_{p^\alpha}^*$ is cyclic for all odd primes $p$
      A.3.2 The multiplicative group $\mathbb{Z}_{2^\alpha}^*$ for $\alpha \ge 3$ is generated by 5 and $-1$
      A.3.3 The Legendre Symbol
   A.4 Carefully considered convergence issues
      A.4.1 Infinite products
      A.4.2 Combining absolutely convergent series
      A.4.3 The Dirichlet test for convergence
      A.4.4 The Taylor series for $\log(1 - z)$ converges for all $z$ such that ...

0 Notes and Notation

Proving the Prime Number Theorem (PNT) might have been sufficient for a senior exercise; it is a beautiful fact which is nontrivial to prove. I had thought about Dirichlet's theorem on primes in arithmetic progressions as well, but I thought that both would be too much, so my real plan was to do one or the other. Once I finished the proof of the PNT, however, I had some momentum, and I was having fun. Sure, I had learned about Abel's Lemma, the Dirichlet test for convergence, and the Leibniz integral rule, but all of these seemed to me disconnected tools, used together here only because they happened to be necessary for the proof. I longed for unity or, at least, something which I could add to make this exercise a whole.

Since the Dirichlet theorem interested me, here is what I came up with. I asked the questions "Do there exist infinitely many primes (in the set of natural numbers $\mathbb{N}$ or in arithmetic progressions in $\mathbb{N}$)?" and, if so, "How are these primes (in $\mathbb{N}$ or in these arithmetic progressions) distributed, that is, how dense are they in $\mathbb{N}$?" Euclid and, later, Euler answered the first half of the first question; Dirichlet answered the second half; the Prime Number Theorem (PNT) answered the first half of the second question; and the generalized PNT answered the second half. The answers to these four questions form the outline of this exercise.

Along the way, we will do some analytic number theory: we will define the Riemann $\zeta$-function and the Dirichlet $L$-functions; we will do some complex analysis: we will use the notion of compactness several times (thank you Professor Schumacher), continuously deform contours of integration (thank you Professor Holdener), and analytically continue functions; we will do some abstract algebra: we will use Lagrange's theorem multiple times$^{1}$ (thank you Professor Aydin), prove that $\mathbb{Z}_{p^\alpha}^*$ is cyclic for odd primes $p$, and define Dirichlet characters on $\mathbb{Z}_q^*$ for all positive integers $q$; and, at one point, we will even do some probability theory (thank you Professor Hartlaub).$^{2}$

The following notation will be used throughout the paper:

$\mathbb{N}$        the set of positive integers $1, 2, 3, \ldots$
$\mathbb{Z}$        the set of integers $\ldots, -1, 0, 1, \ldots$
$\mathbb{R}$        the set of real numbers
$\mathbb{C}$        the set of complex numbers
$\mathbb{Z}_q$      the ring $\mathbb{Z}/q\mathbb{Z}$
$\mathbb{Z}_q^*$    the multiplicative group of invertible elements of $\mathbb{Z}_q$
$\Re(s)$            the real part of $s \in \mathbb{C}$
$\Im(s)$            the imaginary part of $s \in \mathbb{C}$
$\varphi(n)$        the Euler totient function
$\zeta(s)$          the Riemann zeta function (defined below)

$^{1}$ In the words of John B. Fraleigh, author of A first course in abstract algebra: "Never underestimate the power of a theorem which counts something!"
$^{2}$ I also want to thank Professor Milnikel and Professor Brown; a large portion of what I understand is understood because I understand linear algebra.

1 Introduction

Prime numbers are the building blocks of the natural numbers, just as atoms are the building blocks of molecules. Though we cannot understand everything about the natural numbers via meditations on the primes, we can surely learn a lot!

A prime natural number is an irreducible number. That is, the only divisors of a prime are 1 and itself.$^{3}$ Also true is the fact that if a prime number $p$ divides a product $ab$ of two integers $a$ and $b$, then either $p$ divides $a$ or $p$ divides $b$. This fact can be used to show that the factorization of natural numbers into primes is unique. Since every natural number can be factored into primes (using the fact that the quotient obtained by dividing a number by a prime is strictly less than the number), we conclude that, for all natural numbers $n > 1$, there exists a unique set of primes $p_1 < p_2 < \cdots < p_r$ and positive integers $a_1, \ldots, a_r$ such that
\[ n = \prod_{k=1}^{r} p_k^{a_k}. \]
Here the nature of primes as building blocks is clear.

In this exercise, we shall take interest in the distribution of primes in the natural numbers. We ask: How many primes are there? About how many primes are there less than a given $x$? How many primes are there of the form $a + bn$ where $a$ and $b$ are integers? And finally: About how many primes of the form $a + bn$ are there less than a given $x$? These four questions form the skeleton of this exercise.

$^{3}$ We do not count 1 as a prime for a few reasons, the most important of which is to preserve unique factorization.
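To make the factorization $n = \prod_k p_k^{a_k}$ concrete, here is a minimal sketch in Python. It is not part of the original exercise; the function name `prime_factorization` is a hypothetical choice, and trial division is used only because it is the simplest method to read, not because it is efficient.

```python
def prime_factorization(n: int) -> dict[int, int]:
    """Return {p: a} such that n is the product of p**a over the keys p.

    Illustrative helper (not from the original text), using trial division.
    """
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        # Divide out p as many times as possible; the quotient strictly shrinks.
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever remains has no divisor up to its square root, so it is prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


if __name__ == "__main__":
    print(prime_factorization(360))  # {2: 3, 3: 2, 5: 1}
```

The inner loop mirrors the remark in the text: each division by a prime produces a strictly smaller quotient, so the process must terminate.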

2 Primes in the odd integers

Since we do not count 1 as a prime, and since we do not count 0 as a natural number, we might ask if 2 is prime. This is obvious, since 1 and 2 divide 2, and there are no other natural numbers less than 2 which could divide it. Therefore 2 is the smallest prime. In fact, it is the only even prime. For if $n$ is an even natural number besides two, then $n = 2m$ for some natural number $m > 1$, and this implies that (at least) three natural numbers (namely, 1, 2, and $n$) all divide $n$. Two is the smallest prime; two is the only even prime; and in many ways, two is the oddest of all primes. Because two is so special, we choose to ignore it throughout most of this paper. So we start by studying the distribution of primes in the odd integers.

2.1 The infinitude of primes

From the definition, we see also that 3, 5, 7, 11, and 13 are all prime and that (by skipping to triple-digit numbers) 101, 103, and 107 are all prime. If we work a little harder, we would find that 1009, 1013, 1019, and 1021 are prime. In fact, no matter what power of 10 we go up to, we will always be able to find another prime. The reason: There exist infinitely many primes.

One incredible proof of this was given by Euclid. I might say that no senior exercise on the distribution of primes would be complete if it lacked this proof. It makes the non-mathematician smile; it makes us squeal. The proof goes as follows: Suppose there are only finitely many primes. Denote them by $p_1, p_2, \ldots, p_n$. Define the natural number $N$ by
\[ N = \prod_{k=1}^{n} p_k + 1. \]
As we mentioned (but did not prove) above, we can write
\[ N = \prod_{k=1}^{n} p_k^{a_k}, \]
where the $a_k$ are nonnegative integers.$^{4}$ But then if $a_k > 0$ for some $k$, then $p_k$ divides both $N$ and $N - 1$, which means that $p_k = 1$, a contradiction. If $a_k = 0$ for all $k$, then $N = 1$, contradicting the fact that $N \ge p_1 + 1 > 1$. Therefore, there must exist infinitely many primes.

Like all great mathematical theorems, this one has multiple proofs. A second proof is in order for two reasons.
First, it will prove a stronger fact from which the infinitude of primes follows immediately. Second, it will introduce us to the Riemann zeta function, a fundamental function in analytic number theory. But before we get to this, we prove an important result which we will use throughout the paper:

Theorem 1. Let $\chi : \mathbb{N} \to \mathbb{C}$ be a completely multiplicative$^{5}$ bounded function, and (formally) define $L(s, \chi)$ for all $s \in \mathbb{C}$ by
\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}. \]

$^{4}$ Before we specified that each $a_k > 0$, but we allow some $a_k$ to be 0 here so we can more easily and clearly write $N$ as a product of the primes.
$^{5}$ That is, $\chi(mn) = \chi(m)\chi(n)$ for all $m, n \in \mathbb{N}$ (without restriction).

Then $L(s, \chi)$ converges absolutely for all $\Re(s) > 1$ and has the following product representation, which converges absolutely for all $\Re(s) > 1$:
\[ L(s, \chi) = \prod_{p} \left(1 - \frac{\chi(p)}{p^s}\right)^{-1}, \]
where the product is over all primes.

Proof. For a prime $p$, consider the series
\[ \sum_{k=0}^{\infty} \left(\frac{\chi(p)}{p^s}\right)^{k} = \sum_{k=0}^{\infty} \frac{\chi(p^k)}{p^{ks}} \]
for a variable $s \in \mathbb{C}$. Since $\chi$ is bounded, there exists a real number $M$ such that $|\chi(n)| \le M$ for all $n$. Therefore,
\[ \sum_{k=0}^{\infty} \left|\frac{\chi(p^k)}{p^{ks}}\right| \le \sum_{k=0}^{\infty} \frac{M}{p^{k\Re(s)}} = M \sum_{k=0}^{\infty} \left(\frac{1}{p^{\Re(s)}}\right)^{k} = \frac{M}{1 - p^{-\Re(s)}} < \infty \]
since $p^{-\Re(s)} < 1$, so this series absolutely converges for all primes $p$, and its sum is the geometric series value $(1 - \chi(p)/p^s)^{-1}$. If we enumerate the primes $p_1 = 2$, $p_2 = 3$, $p_3 = 5, \ldots$, then for all $N \in \mathbb{N}$,
\[ \prod_{j=1}^{N} \left(1 - \frac{\chi(p_j)}{p_j^s}\right)^{-1} = \prod_{j=1}^{N} \sum_{k_j=0}^{\infty} \frac{\chi(p_j^{k_j})}{p_j^{k_j s}}. \]
Since we can multiply convergent series in the natural way (see Appendix A.4.2), this product is equal to
\[ \sum_{k_1=0}^{\infty} \cdots \sum_{k_N=0}^{\infty} \frac{\chi(p_1^{k_1} p_2^{k_2} \cdots p_N^{k_N})}{(p_1^{k_1} p_2^{k_2} \cdots p_N^{k_N})^{s}} = \sum_{n \in S_N} \frac{\chi(n)}{n^s}, \]
where $S_N$ is the set of positive integers whose greatest prime divisor is less than or equal to $p_N$. (That is, $S_N$ is the set of $n \in \mathbb{N}$ such that $p \le p_N$ for all primes $p$ which divide $n$.)

The justification for this last step is what makes this proof beautiful. We first note that the multiplication of absolutely convergent series yields another absolutely convergent series. So rearranging the terms is justified. To show that the sums are equal, we show that each term in each sum shows up in the other, and that no term shows up twice in either sum. Starting on the left side of the equality, to each $(k_1, \ldots, k_N)$-term corresponds a unique $n \in S_N$, namely $n = p_1^{k_1} \cdots p_N^{k_N}$, such that the $(k_1, \ldots, k_N)$-term on the left equals the $n$-term on the right. On the other hand, given the $n$-th term on the right, since $n \in S_N$, $n$ has a factorization into primes less than or equal to $p_N$. Since this factorization is unique, there exists a unique $N$-tuple $(k_1, \ldots, k_N)$ such that the $n$-term on the right equals the $(k_1, \ldots, k_N)$-term on the left. Since the terms in each series show up exactly once in the other series, the series are equal.

Finally, since every $n \in \mathbb{N}$ can be factored into primes,
\[ \lim_{N \to \infty} \sum_{n \in S_N} \frac{\chi(n)}{n^s} = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = L(s, \chi), \]

and since
\[ \sum_{n=1}^{\infty} \left|\frac{\chi(n)}{n^s}\right| \le \sum_{n=1}^{\infty} \frac{M}{n^{\Re(s)}} < \infty, \]
the series $L(s, \chi)$ converges (absolutely) for all $\Re(s) > 1$, giving us
\[ L(s, \chi) = \lim_{N \to \infty} \sum_{n \in S_N} \frac{\chi(n)}{n^s} = \lim_{N \to \infty} \prod_{j=1}^{N} \left(1 - \frac{\chi(p_j)}{p_j^s}\right)^{-1} = \prod_{p} \left(1 - \frac{\chi(p)}{p^s}\right)^{-1}, \]
as claimed. Finally, to show that the product converges absolutely, it will be sufficient (see Appendix A.4.1) to show that the series
\[ \sum_{p} \left( \left(1 - \frac{\chi(p)}{p^s}\right)^{-1} - 1 \right) \]
converges absolutely. But this is straightforward. We have that $|\chi(n)| \le M$ for all $n \in \mathbb{N}$. For a technical reason, redefine $M$ if necessary such that $|\chi(n)| \le M$ and $|1 - \chi(p)/p^s|^{-1} \le M$ for all primes $p$. (This is always possible because the set of all $p^{\Re(s)}$ is a discrete subset of $\mathbb{R}$.) Then
\[ \sum_{p} \left| \left(1 - \frac{\chi(p)}{p^s}\right)^{-1} - 1 \right| = \sum_{p} \frac{|\chi(p)/p^s|}{|1 - \chi(p)/p^s|} \le \sum_{p} \frac{|\chi(p)|\, M}{p^{\Re(s)}} \le M \sum_{n=1}^{\infty} \frac{M}{n^{\Re(s)}}, \]
and this is finite because $\Re(s) > 1$, so the product converges absolutely for $\Re(s) > 1$.

With this fact, we make a definition and derive an amazing corollary.

Definition 2. Define the Riemann zeta function $\zeta(s)$ for all $s \in \mathbb{C}$ by
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]
(Note: we will often omit the bounds on the sum; when we do so, it should be assumed the sum over $n$ is over $n \in \mathbb{N}$.)

Clearly, if $\chi(n) = 1$ for all $n \in \mathbb{N}$, then $L(s, \chi) = \zeta(s)$ (in the notation of the previous theorem). Since the constant function $\chi(n) = 1$ is completely multiplicative and bounded, we obtain the following corollary:

Corollary 3 (Euler's product formula). The Riemann zeta function converges absolutely for all $\Re(s) > 1$, and it has the product representation
\[ \sum_{n} \frac{1}{n^s} = \zeta(s) = \prod_{p} \left(1 - \frac{1}{p^s}\right)^{-1}, \]
which also converges absolutely for $\Re(s) > 1$.

Meditations on the equality of this infinite sum and infinite product are good for the soul, I believe, if and only if one understands the equality. Euler sure did, and his soul was well. The equality in Corollary 3 has been called "the all-important formula" and "the golden key" by Havil$^{6}$ and Derbyshire,$^{7}$ respectively.

$^{6}$ Havil, J. The All-Important Formula. Princeton, NJ: Princeton University Press.
$^{7}$ Derbyshire, J. Prime Obsession: Bernhard Riemann and the Greatest Unsolved Problem in Mathematics. New York, NY: Penguin.
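The equality of sum and product in Corollary 3 can be checked numerically. The sketch below is an illustration only, not part of the original argument: the helper names (`primes_up_to`, `zeta_partial_sum`, `euler_partial_product`) are hypothetical, the cutoffs are arbitrary, and the comparison value $\zeta(2) = \pi^2/6$ is a standard fact that this exercise does not prove.

```python
import math


def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes: all primes p <= limit (assumes limit >= 2)."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Cross out the multiples of i, starting at i*i.
            is_prime[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n in range(2, limit + 1) if is_prime[n]]


def zeta_partial_sum(s: float, terms: int) -> float:
    """Sum of 1/n^s for n = 1, ..., terms (real s > 1)."""
    return sum(n ** -s for n in range(1, terms + 1))


def euler_partial_product(s: float, prime_bound: int) -> float:
    """Product of (1 - p^(-s))^(-1) over primes p <= prime_bound (real s > 1)."""
    product = 1.0
    for p in primes_up_to(prime_bound):
        product *= 1.0 / (1.0 - p ** -s)
    return product


if __name__ == "__main__":
    s = 2.0
    print("partial sum    :", zeta_partial_sum(s, 10**5))
    print("partial product:", euler_partial_product(s, 10**5))
    print("pi^2 / 6       :", math.pi**2 / 6)
```

Both truncations approach the same limit from below: the partial sum omits the integers beyond the cutoff, while the partial product omits every integer with a prime factor beyond the cutoff.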

Furthermore, this equality is an analytic one, yet it is equivalent to the Fundamental Theorem of Arithmetic. For we used the Fundamental Theorem in our proof of the equality, and, conversely, if the equality holds, then for every $n \in \mathbb{N}$, the $n$-term in the series must correspond to one and only one term in the infinite product, once the multiplication is performed.

Using Corollary 3, we can easily provide our second proof of the infinitude of the primes. Considering the series representation of $\zeta(s)$, we see that $\zeta(1)$ is the harmonic series, which diverges to infinity. Suppose there existed finitely many primes. Then by Corollary 3,
\[ \lim_{s \to 1^+} \zeta(s) = \lim_{s \to 1^+} \prod_{p} \left(1 - \frac{1}{p^s}\right)^{-1} = \prod_{p} \left(1 - \frac{1}{p}\right)^{-1} < \infty, \]
which contradicts $\zeta(s) \to \infty$ as $s \to 1^+$.

2.2 The Prime Number Theorem

With the infinitude of the primes proven, one might ask how dense the primes are in $\mathbb{N}$. For example, the sequences $2, 4, 6, 8, \ldots$ and $4, 16, 64, 256, \ldots$ each contain an infinite number of terms, but the first sequence, in a sense, contains many more terms. To make this notion rigorous, we could define $\psi_1(x) = \#\{2m : m \in \mathbb{N},\ 2m \le x\}$ and $\psi_2(x) = \#\{4^m : m \in \mathbb{N},\ 4^m \le x\}$. Then $\psi_1(x)/x$ is approximately $\frac{1}{2}$ while $\psi_2(x)/x$ is approximately $\frac{1}{\log 4} \cdot \frac{\log x}{x}$, which goes to 0 as $x \to \infty$ and is therefore less than $\frac{1}{2}$.

In a similar way, we give one (famous) answer to the density of primes question by defining the function
\[ \pi(x) = \#\{p \le x : p \text{ is prime}\}. \]
Then, for example, $\pi(6) = \#\{2, 3, 5\} = 3$ and $\pi(7) = \#\{2, 3, 5, 7\} = 4$. We will use this definition of $\pi(x)$ throughout the paper.

The Prime Number Theorem (or PNT) gives us an asymptotic formula for counting the number of primes less than $x$. The PNT states that
\[ \pi(x) \sim \frac{x}{\log x}, \]
where the logarithm is taken base $e$ (here and throughout the paper) and the notation $f(x) \sim g(x)$ means that
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 1. \]
That is, to count the number of primes less than, say, 1000, we could calculate $1000/\log 1000 \approx 145$. The actual value is $\pi(1000) = 168$. The ratio of the actual value to the approximate is about 1.16. If we do the same for a million, we have $\pi(10^6) = 78{,}498$ and $10^6/\log(10^6) \approx 72{,}382$. The ratio between these values is 1.084. As we will prove, this ratio converges to 1.

As an interesting aside, one might notice the presence of the transcendental number $e$ in the formula given by the PNT. Given how uncomfortable the Greeks were when they discovered the irrational numbers, imagine how distraught they would be upon discovering that $e$ shows up in a theorem describing the density in $\mathbb{N}$ of the multiplicative building blocks of $\mathbb{N}$. This is an elementary question, yet the answer requires not just irrational numbers, but transcendental numbers! Perhaps this is part of the reason why algebraic prodigy Niels Abel described the PNT as perhaps "the most remarkable theorem in mathematics."
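The comparison of $\pi(x)$ with $x/\log x$ quoted above is easy to reproduce. The following is a small sketch rather than anything from the original exercise; `prime_count` is a hypothetical helper name, and the sieve is just one convenient way to count primes for modest $x$.

```python
import math


def prime_count(x: int) -> int:
    """pi(x): the number of primes p <= x, counted with a sieve of Eratosthenes."""
    if x < 2:
        return 0
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(x) + 1):
        if is_prime[i]:
            # Cross out the multiples of i, starting at i*i.
            is_prime[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(is_prime)


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5, 10**6):
        actual = prime_count(x)
        estimate = x / math.log(x)  # natural logarithm, as in the PNT
        print(f"x = {x:>8}  pi(x) = {actual:>6}  x/log x = {estimate:>10.1f}  "
              f"ratio = {actual / estimate:.3f}")
```

The printed ratio decreases toward 1 only slowly, which is consistent with the values for $10^3$ and $10^6$ given in the text.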

2.2.1 History of the Prime Number Theorem

A timeline of the history of the PNT is as follows:

1793  Gauss begins thinking about the function $\pi(x)$.

1798  Gauss and Legendre conjecture something close to the PNT; Legendre specifically conjectures that $\pi(x) = \frac{x}{A \log x + B}$, where $A = 1$ and $B = -1.08366$.

1848  Chebyshev proved that if $\frac{\pi(x)}{x/\log x} \to N$ for some positive integer $N$, then $N = 1$.

1852  Chebyshev uses elementary properties of the factorial and logarithmic functions to show that there existed $x_0$ such that for all $x > x_0$,
      \[ B < \frac{\pi(x)}{x/\log x} < \frac{6}{5} B, \]
      where $B \approx 0.92$ and $\frac{6}{5} B \approx 1.1$.

1892  Sylvester improved on Chebyshev's bound, increasing the value of $B$ to about 0.956, but he closes his paper with a pessimistic comment about the generalizability of Chebyshev's and his (elementary) methods.

1896  Hadamard and de la Vallée Poussin applied Hadamard's theory of integral functions to the Riemann zeta function, as well as a simple trigonometric identity, to prove the PNT.

1921  Hardy asks whether he should believe that there exists an elementary proof (i.e., one which does not involve complex analysis) of what seems like an elementary fact about the natural numbers.

1932  Landau and Wiener provide simplified proofs of the PNT.

1948  Paul Erdős and Atle Selberg find a truly elementary proof, to the surprise of Hardy and many others. (The development of the proof and of the controversy over who got the credit is an interesting story, which is summarized in [G].)

1980  Donald Newman, following in the footsteps of Wiener and Ikehara's simplifications, simplifies contour integral methods used in older proofs, to avoid both estimates at $\infty$ and the use of Fourier transforms. D. Zagier's summarized proof spans only three pages.

Many great theorems have multiple proofs. As this brief history hopefully shows, the PNT is not lacking in proofs. The proof we give is essentially Newman's, but we follow Zagier's paper, which pulls results from Euler, Riemann, Chebyshev, Hadamard, de la Vallée Poussin, Mertens, and Newman.

2.2.2 An analytic theorem

Before we get to Newman's proof of the PNT, we first state and prove what we, following Newman, will call the Analytic Theorem. It is, for his purposes, a lemma, so we state and prove it here. The result is the most difficult to prove of the sequence of facts Newman proves on his way to proving the PNT. It might seem unmotivated, but rest assured: it is just what we need! But first, for our purposes, we prove the following lemma (the proof of which must have been so obvious to Zagier that he felt no need to mention it in his paper):

Lemma 4. Fix $R > 0$. Suppose $g : \mathbb{C} \to \mathbb{C}$ is holomorphic on $\Re(s) \ge 0$. Then there exists $\eta > 0$ such that $g$ is holomorphic at $s$ for all $s \in \mathbb{C}$ such that $|s| \le R$ and $\Re(s) \ge -\eta$.

Proof. Since $g$ is holomorphic for $\Re(s) \ge 0$, it is (in particular) holomorphic on $A_0 = \{s \in \mathbb{C} : \Re(s) = 0,\ |s| \le R\}$. By definition, for each $s \in A_0$, there exists $r_s > 0$ such that $g$ is holomorphic at each point in the disk $B_{r_s}(s) = \{w \in \mathbb{C} : |w - s| < r_s\}$. Because this disk is open, it makes sense to say that $g$ is holomorphic on the disk or, more loosely, at each point in the disk. Since $A_0$ can be written $A_0 = \{iy : -R \le y \le R\}$, $A_0$ is clearly closed and bounded, which means that it is compact. Note that $\bigcup_{s \in A_0} B_{r_s}(s) \supseteq A_0$, so by the definition of compactness, there exist $s_1, \ldots, s_n \in A_0$ such that
\[ \bigcup_{k=1}^{n} B_{r_k}(s_k) \supseteq A_0, \]
where we have set $r_k = r_{s_k}$ for each $k$ to simplify the notation. If needed, relabel the $n$ points $s_1, \ldots, s_n$ such that $y_1 < y_2 < \cdots < y_n$, where $y_k = \Im(s_k)$ for each $k$. Also, if $B_{r_k}(s_k) \subseteq B_{r_l}(s_l)$ for some $k \ne l$, then throw out $s_k$ from the list and relabel.

Next, define $w_0 = -Ri$, $w_n = Ri$, and
\[ w_j = \frac{(s_{j+1} - i r_{j+1}) + (s_j + i r_j)}{2} \]
for all $j = 1, \ldots, n-1$. As is clear from Figure 1, $w_0 \in B_{r_1}(s_1)$, $w_n \in B_{r_n}(s_n)$, and $w_j \in B_{r_j}(s_j) \cap B_{r_{j+1}}(s_{j+1})$ for each $j = 1, \ldots, n-1$. Therefore, since each $B_{r_k}(s_k)$ is open, which also implies that the intersection of any two is open, we can choose positive numbers $\eta_0, \ldots, \eta_n$ such that $B_{\eta_0}(w_0) \subseteq B_{r_1}(s_1)$, $B_{\eta_n}(w_n) \subseteq B_{r_n}(s_n)$, and $B_{\eta_j}(w_j) \subseteq B_{r_j}(s_j) \cap B_{r_{j+1}}(s_{j+1})$ for all $j = 1, \ldots, n-1$. Set $\eta = \frac{1}{2} \min\{\eta_0, \ldots, \eta_n\}$. Then for each $j = 1, \ldots, n$, both $w_{j-1} - \eta \in B_{r_j}(s_j)$ and $w_j - \eta \in B_{r_j}(s_j)$. (The $\frac{1}{2}$ in the definition of $\eta$ was so that we could claim that the $w_j - \eta$ are in the balls.) Since balls are convex, this implies that, for each $j = 1, \ldots, n$, every point on the segment connecting $w_{j-1} - \eta$ and $w_j - \eta$ is in $B_{r_j}(s_j)$ and is therefore in $\bigcup_{k=1}^{n} B_{r_k}(s_k)$. Since the union of these segments is the segment connecting $w_0 - \eta = -\eta - Ri$ and $w_n - \eta = -\eta + Ri$, we have that
\[ A_\eta \subseteq \bigcup_{k=1}^{n} B_{r_k}(s_k), \]
where $A_\eta = \{-\eta + iy : -R \le y \le R\}$.
Furthermore, given any $s = \sigma + iy \in \mathbb{C}$ such that $|s| \le R$ and $-\eta < \sigma < 0$, we have $iy \in A_0$ and $-\eta + iy \in A_\eta$, which implies that $iy,\ -\eta + iy \in \bigcup_{k=1}^{n} B_{r_k}(s_k)$. In fact, since $-\eta + iy$ is further from each $s_k$ than $iy$, there exists $k$ such that $iy,\ -\eta + iy \in B_{r_k}(s_k)$. Thus, by the convexity of $B_{r_k}(s_k)$, $-\eta' + iy \in B_{r_k}(s_k)$ for all $\eta' \in (0, \eta)$. Since $s = \sigma + iy$ was an arbitrary complex number such that $|s| \le R$ and $-\eta < \sigma < 0$, we have that
\[ A_{\eta'} \subseteq \bigcup_{k=1}^{n} B_{r_k}(s_k) \]
for all $0 < \eta' < \eta$ (where $A_{\eta'}$ is defined in the same way as $A_\eta$). So, in fact, the union of all such $A_{\eta'}$ is contained in $\bigcup_{k=1}^{n} B_{r_k}(s_k)$, which (remember) contained only points at which $g(s)$ is holomorphic. Therefore, $g(s)$ is holomorphic on
\[ \bigcup_{0 \le \eta' \le \eta} A_{\eta'} = \{\sigma + iy : -\eta \le \sigma \le 0,\ -R \le y \le R\}. \]
In particular, $g(s)$ is holomorphic for all $s \in \mathbb{C}$ such that $|s| \le R$ and $\Re(s) \ge -\eta$.

Figure 1: This figure demonstrates how we choose the points $w_j$; here, the balls centered at $s_1$, $s_2$, and $s_3$ cover $A_0$.

Now we move on to the theorem.

Theorem 5 (Analytic Theorem). Let $f(x)$ be a bounded and locally integrable function$^{8}$ and suppose that the function
\[ g(s) = \int_0^\infty f(x) e^{-sx}\, dx, \]
defined for $\Re(s) > 0$, extends holomorphically to $\Re(s) \ge 0$. Then $\int_0^\infty f(x)\, dx$ exists and is equal to $g(0)$.

Proof. For all $T > 0$, define
\[ g_T(s) = \int_0^T f(x) e^{-sx}\, dx. \]

$^{8}$ A locally integrable function is one whose integral is finite over any compact subset of the function's domain.

We wish to show that $\lim_{T \to \infty} g_T(0) = g(0)$, as this would prove the theorem. Our strategy will be to estimate the difference between $g_T(0)$ and $g(0)$ in terms of a contour integral in $\mathbb{C}$, and to prove that it goes to 0. To this end, let $\epsilon > 0$. Our first step is choosing real numbers $M$, $R = R(M, \epsilon)$, $\eta = \eta(R)$, $M' = M'(\eta, R)$, $\delta = \delta(\eta, M')$, and $T_0 = T_0(\delta, \epsilon, M', R)$. We will then let $T > T_0$, and show that this implies $|g_T(0) - g(0)| < \epsilon$ by means of an estimate of a contour integral.

To begin, we know that $f(x)$ is bounded. Choose $M > 0$ such that $|f(x)| \le M$ for all $x \ge 0$. Next, choose $R > 3M/\epsilon$. Given $R$, use Lemma 4 and choose $\eta > 0$ such that $g(s)$ is holomorphic at all $s \in \mathbb{C}$ such that $|s| \le R$ and $\sigma \ge -\eta$. Since $g(s)$ is holomorphic on the closed and bounded (that is, compact) subset containing all $s \in \mathbb{C}$ such that $|s| \le R$ and $\sigma \ge -\eta$, $g(s)$ is bounded on this set. Choose $M' > 0$ such that $|g(s)| \le M'$ for all $s$ in this region. Next, choose $\delta > 0$ such that $\delta < \eta$ and
\[ \sin^{-1}\left(\frac{\delta}{R}\right) < \frac{\pi \epsilon}{9 M'}. \]
(Note that such a $\delta$ exists, because $\sin^{-1}(\delta/R) \to 0$ continuously as $\delta \to 0$.) Finally, choose $T_0 > 0$ such that
\[ e^{-\delta T_0} < \frac{\pi \delta \epsilon}{18 R M'}. \]
(Again, $\delta > 0$, so $e^{-\delta T_0} \to 0$ continuously as $T_0 \to \infty$, so this is possible.)

Let $T > T_0$. We claim that $|g_T(0) - g(0)| < \epsilon$. To compute this estimate, we rewrite this difference as the value of a contour integral. Define the contour $C$ in $\mathbb{C}$ as the boundary of the region $\{s \in \mathbb{C} : |s| \le R,\ \Re(s) \ge -\delta\}$, positively oriented. Around this contour, we will integrate
\[ h(s) = (g_T(s) - g(s))\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{1}{s}. \]
Note the following:

• $g(s)$ is holomorphic on $C$ (by our choice of $\delta$, which is less than $\eta$), so $g(s)$ is bounded on $C$.

• By the Leibniz Integral Rule (see Appendix A.2), since 0 and $T$ do not depend on $s$,
  \[ \frac{d}{ds} g_T(s) = \int_0^T \frac{\partial}{\partial s}\left(f(x) e^{-sx}\right) dx = \int_0^T f(x)(-x) e^{-sx}\, dx. \]
  Since $f(x)$ is locally integrable, the integrand is locally integrable. Therefore, this integral, which is the derivative of $g_T(s)$, exists for all $s \in \mathbb{C}$. So $g_T(s)$ is entire and, as a result, bounded on any bounded subset of $\mathbb{C}$.

• $e^{sT}(1 + s^2/R^2)$ is entire and, as a result, bounded on any bounded subset of $\mathbb{C}$.

• $\frac{1}{s}$ has a simple pole at $s = 0$.

• $h(s)$ is holomorphic except for a simple pole at 0, since it is the product of $\frac{1}{s}$ and functions which are holomorphic inside $C$.

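From these considerations, since 0 is in the region of which $C$ is a boundary, Cauchy's Residue Theorem implies that
\[ \int_C h(s)\, ds = 2\pi i \operatorname{Res}(h, 0) = 2\pi i \lim_{s \to 0} s\, h(s) = 2\pi i\, (g_T(0) - g(0)), \]
where the last equality follows because $g(s)$ and $g_T(s)$ are holomorphic and therefore continuous at 0. From this, we reevaluate our goal: We must show that
\[ \frac{1}{2\pi} \left| \int_C h(s)\, ds \right| = |g_T(0) - g(0)| < \epsilon. \]
We split the estimate into three pieces. Define $C_\pm = C \cap \{s \in \mathbb{C} : \pm\Re(s) > 0\}$. Then
\[ \left| \int_C h(s)\, ds \right| \le \left| \int_{C_+} h(s)\, ds \right| + \left| \int_{C_-} g_T(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{ds}{s} \right| + \left| \int_{C_-} g(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{ds}{s} \right|. \qquad (1) \]

For the first integral, we note first that, since $\Re(s) > 0$ for $s \in C_+$,
\[ |g(s) - g_T(s)| = \left| \int_T^\infty f(x) e^{-sx}\, dx \right| \le \int_T^\infty |f(x)|\, e^{-\Re(s) x}\, dx \le M \int_T^\infty e^{-\Re(s) x}\, dx = \frac{M}{\Re(s)}\, e^{-\Re(s) T} \]
for all $s \in C_+$. Also, since $|s| = R$ for $s = \sigma + it \in C_+$,
\[ \left| e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{1}{s} \right| = e^{\Re(s) T}\, \frac{|\sigma^2 + t^2 + (\sigma + it)^2|}{R^3} = e^{\Re(s) T}\, \frac{|2\sigma|\, |\sigma + it|}{R^3} = \frac{2 e^{\Re(s) T}\, \Re(s)}{R^2}. \]
Therefore $|h(s)| \le 2M/R^2$ for every $s \in C_+$, and if we denote the arc length of $C_+$ by $l(C_+)$,
\[ \left| \int_{C_+} h(s)\, ds \right| \le l(C_+) \max_{s \in C_+} |h(s)| \le \left(\frac{2\pi R}{2}\right) \frac{2M}{R^2} = \frac{2\pi M}{R} < 2\pi \frac{\epsilon}{3} \]
(since $R > 3M/\epsilon$).

So far, so good. Now we estimate the second piece. Since $g_T(s)$ is entire, we can continuously deform$^{9}$ $C_-$ into $C_-' = \{s \in \mathbb{C} : |s| = R,\ \Re(s) < 0\}$ and replace the integral over $C_-$ with the integral over $C_-'$. Now for all $s \in C_-'$,
\[ |g_T(s)| = \left| \int_0^T f(x) e^{-sx}\, dx \right| \le M \int_0^T e^{-\Re(s) x}\, dx \le M \int_{-\infty}^T e^{-\Re(s) x}\, dx = \frac{M}{|\Re(s)|}\, e^{-\Re(s) T}, \]
and, as in the first integral,
\[ \left| e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{1}{s} \right| = \frac{2 e^{\Re(s) T}\, |\Re(s)|}{R^2}. \]
So
\[ \left| \int_{C_-'} g_T(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{ds}{s} \right| \le l(C_-') \max_{s \in C_-'} \left| g_T(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{1}{s} \right| \le \left(\frac{2\pi R}{2}\right) \frac{2M}{R^2} = \frac{2\pi M}{R} < 2\pi \frac{\epsilon}{3}. \]

Finally, we estimate the third integral in three parts. Let
\[ C_1 = \{s \in C_- : \Re(s) = -\delta\}, \quad C_2 = \{s \in C_- : \Re(s) > -\delta,\ \Im(s) > 0\}, \quad C_3 = \{s \in C_- : \Re(s) > -\delta,\ \Im(s) < 0\}. \]
These three contours clearly partition $C_-$, so
\[ \left| \int_{C_-} g(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{ds}{s} \right| \le \sum_{j=1}^{3} \left| \int_{C_j} g(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{ds}{s} \right|. \]
For the integral over $C_1$, we have $l(C_1) \le 2R$, $|g(s)| \le M'$,
\[ |e^{sT}| = e^{-\delta T} \le e^{-\delta T_0} < \frac{\pi \delta \epsilon}{18 R M'} \]
(by our choice of $T_0$), and
\[ \left| \left(1 + \frac{s^2}{R^2}\right) \frac{1}{s} \right| \le (1 + 1)\, \frac{1}{\delta} = \frac{2}{\delta}. \]
So
\[ \left| \int_{C_1} g(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{ds}{s} \right| \le l(C_1) \max_{s \in C_1} \left| g(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{1}{s} \right| \le (2R)\, M'\, e^{-\delta T}\, \frac{2}{\delta} < \frac{4 R M'}{\delta} \left(\frac{\pi \delta \epsilon}{18 R M'}\right) = 2\pi \frac{\epsilon}{9}. \]

$^{9}$ See Appendix A.1.1.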

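For the integral over $C_2$, we have
\[ l(C_2) = R \sin^{-1}\left(\frac{\delta}{R}\right) < \frac{\pi \epsilon R}{9 M'} \]
(by our choice of $\delta$), $|g(s)| \le M'$, $|e^{sT}| = e^{\Re(s) T} \le 1$, and
\[ \left| \left(1 + \frac{s^2}{R^2}\right) \frac{1}{s} \right| \le (1 + 1)\, \frac{1}{R} = \frac{2}{R}. \]
So
\[ \left| \int_{C_2} g(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{ds}{s} \right| \le l(C_2) \max_{s \in C_2} \left| g(s)\, e^{sT} \left(1 + \frac{s^2}{R^2}\right) \frac{1}{s} \right| \le \left(R \sin^{-1}\left(\frac{\delta}{R}\right)\right) M' \left(\frac{2}{R}\right) < \left(\frac{\pi \epsilon R}{9 M'}\right) \frac{2 M'}{R} = 2\pi \frac{\epsilon}{9}. \]
A similar (no, an identical) calculation yields the same bound for the integral over $C_3$. Therefore, the third integral in Equation 1 is bounded by
\[ 2\pi \frac{\epsilon}{9} + 2\pi \frac{\epsilon}{9} + 2\pi \frac{\epsilon}{9} = 2\pi \frac{\epsilon}{3}, \]
which means that the integral over $C$ is bounded by
\[ 2\pi \frac{\epsilon}{3} + 2\pi \frac{\epsilon}{3} + 2\pi \frac{\epsilon}{3} = 2\pi \epsilon. \]
So
\[ 2\pi\, |g_T(0) - g(0)| = \left| \int_C h(s)\, ds \right| < 2\pi \epsilon, \]
which, after dividing both sides by $2\pi$, gives us our result.

2.2.3 Newman's proof of the Prime Number Theorem

The proof follows Zagier's sequence of steps, up to some reordering of the lemmas (including the analytic theorem). Zagier was kind enough to leave plenty of details for the reader to work out. For example, his proofs of Lemmas 10, 12, and 13 are two sentences long, and his proof of Lemma 11 is a mere three. I have greatly expanded on his proofs with the intention of making them understandable to the undergraduate who has had some analysis. My hope is that one could read this and understand it in one (possibly long) sitting, as opposed to the multiple (infinitely long) sittings which I have required.

We begin with a definition:

Definition 6. For all $x \in \mathbb{R}$ and $s \in \mathbb{C}$, define
\[ \Phi(s) = \sum_{p} \frac{\log p}{p^s} \quad \text{and} \quad \vartheta(x) = \sum_{p \le x} \log p, \]
where the sums over $p$ run over the prime natural numbers.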
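To get a feel for $\vartheta(x)$ from Definition 6, the sketch below tabulates $\vartheta(x)/x$ for a few values of $x$. It is an illustration under stated assumptions and not part of the original proof: the function name `theta` and the sample points are hypothetical choices, and a plain sieve stands in for any more refined method.

```python
import math


def theta(x: int) -> float:
    """Chebyshev's function: the sum of log p over all primes p <= x."""
    if x < 2:
        return 0.0
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(x) + 1):
        if is_prime[i]:
            # Cross out the multiples of i, starting at i*i.
            is_prime[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    # fsum keeps the floating-point accumulation error small.
    return math.fsum(math.log(p) for p in range(2, x + 1) if is_prime[p])


if __name__ == "__main__":
    for x in (10, 10**2, 10**3, 10**4, 10**5):
        print(f"x = {x:>6}  theta(x) = {theta(x):>12.3f}  theta(x)/x = {theta(x) / x:.4f}")
```

For instance, `theta(10)` is $\log 2 + \log 3 + \log 5 + \log 7 = \log 210$. The ratio $\vartheta(x)/x$ stays bounded in this range, which is the behavior that Lemma 12 ($\vartheta(x) = O(x)$) establishes in general.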

With these definitions in hand, we prove the Prime Number Theorem (following [Z]). The proof follows easily from a sequence of steps, which we prove here as lemmas. The first two lemmas build up to the third:

Lemma 7. For $\Re(s) > 1$, $\zeta(s)$ and $\Phi(s)$ are absolutely convergent.

Proof. Corollary 3 proves that $\zeta(s)$ is absolutely convergent, so we must show that $\Phi(s)$ is too. Suppose $\sigma = \Re(s) > 1$. Then consider that
\[ \sum_{p} \left| \frac{\log p}{p^s} \right| \le \sum_{n=2}^{\infty} \frac{\log n}{n^\sigma} \le \frac{\log 2}{2^\sigma} + \int_2^\infty \frac{\log n}{n^\sigma}\, dn. \]
A computation shows that the integral is at most $\int_1^\infty (\log n)\, n^{-\sigma}\, dn = (\sigma - 1)^{-2}$ when $\sigma > 1$, from which we conclude that $\Phi(s)$ absolutely converges for all $\sigma > 1$.

Lemma 8. For all $\delta > 1$, $\zeta(s)$ and $\Phi(s)$ are uniformly convergent on $\sigma = \Re(s) \ge \delta$. (When this is the case, we say these series are locally uniformly convergent.)

Proof. We use the Weierstrass M-test. Fix $\delta > 1$. For all $n \in \mathbb{N}$, set $M_n = n^{-\delta}$ and $M_n' = n^{-\delta} \log n$. Then because $\delta > 1$, both $\sum_{n=1}^\infty M_n$ and $\sum_{n=1}^\infty M_n'$ converge. Also, $|n^{-s}| = n^{-\sigma} \le M_n$ and $|n^{-s} \log n| = n^{-\sigma} \log n \le M_n'$ for all $\sigma \ge \delta$. So by the Weierstrass M-test, $\zeta(s)$ and $\Phi(s)$ converge uniformly for $\sigma \ge \delta$.

From these two lemmas we can prove the following lemma:

Lemma 9. For $\Re(s) > 1$, $\zeta(s)$ and $\Phi(s)$ are holomorphic.

Proof. Let $\sigma = \Re(s) > 1$. By Lemma 8, $\zeta(s)$ and $\Phi(s)$ converge uniformly at $s$. If we can show that the termwise derivatives of these series also converge uniformly at $s$, then we can conclude that $\zeta'(s)$ and $\Phi'(s)$ exist and are equal to their respective termwise derivatives (see Appendix A.1.2). To do this, we compute the series of termwise derivatives, obtaining
\[ \sum_{n=1}^{\infty} \frac{d}{ds}\, \frac{1}{n^s} = -\sum_{n=1}^{\infty} \frac{\log n}{n^s} \quad \text{and} \quad \sum_{p} \frac{d}{ds}\, \frac{\log p}{p^s} = -\sum_{p} \frac{\log^2 p}{p^s}. \]
These are clearly uniformly convergent by the Weierstrass M-test; the argument is similar to that which proved Lemma 8.

In the next two lemmas, we modify $\zeta(s)$ and $\Phi(s)$ slightly so as to enlarge the region over which they are holomorphic. More specifically, we subtract $\frac{1}{s-1}$ from both functions and prove that the modified functions are holomorphic over some larger region.

Lemma 10. For $\sigma = \Re(s) > 0$, $\zeta(s) - 1/(s-1)$ is holomorphic. (That is, $\zeta(s) - \frac{1}{s-1}$ extends holomorphically to $\Re(s) > 0$.)

Proof. For $\sigma > 1$, we have
\[ \frac{1}{s-1} = \int_1^\infty \frac{dx}{x^s} = \sum_{n=1}^{\infty} \int_n^{n+1} \frac{dx}{x^s}. \]

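Furthermore, this series absolutely converges for $\sigma > 1$ since
\[ \sum_{n=1}^{\infty} \left| \int_n^{n+1} \frac{dx}{x^s} \right| \le \sum_{n=1}^{\infty} \int_n^{n+1} \frac{dx}{x^\sigma} \le \sum_{n=1}^{\infty} \frac{1}{n^\sigma}\, (n + 1 - n) = \sum_{n=1}^{\infty} \frac{1}{n^\sigma} < \infty. \]
Therefore, since the series for $\zeta(s)$ also converges absolutely for $\sigma > 1$ by Corollary 3, we can subtract the series termwise, obtaining
\[ \zeta(s) - \frac{1}{s-1} = \sum_{n=1}^{\infty} \int_n^{n+1} \frac{dx}{n^s} - \sum_{n=1}^{\infty} \int_n^{n+1} \frac{dx}{x^s} = \sum_{n=1}^{\infty} \int_n^{n+1} \left( \frac{1}{n^s} - \frac{1}{x^s} \right) dx, \qquad (2) \]
which is an absolutely convergent, holomorphic function for $\sigma > 1$ (because $\zeta(s)$ and $\frac{1}{s-1}$ are for $\sigma > 1$).

Now we show that this series converges absolutely for all $\sigma > 0$. We evaluate the following in two different ways. On one hand,
\[ s \int_n^{n+1} \int_n^x \frac{du}{u^{s+1}}\, dx = \int_n^{n+1} \left( \frac{1}{n^s} - \frac{1}{x^s} \right) dx. \]
On the other hand,
\[ \left| s \int_n^{n+1} \int_n^x \frac{du}{u^{s+1}}\, dx \right| \le \max_{n \le u \le n+1} \left| \frac{s}{u^{s+1}} \right| = \frac{|s|}{n^{\sigma+1}}, \]
so
\[ \sum_{n=1}^{\infty} \left| \int_n^{n+1} \left( \frac{1}{n^s} - \frac{1}{x^s} \right) dx \right| \le \sum_{n=1}^{\infty} \frac{|s|}{n^{\sigma+1}}, \]
which converges because $\sigma + 1 > 1$. Hence the series in Equation 2 converges absolutely for $\sigma > 0$.

Now consider the termwise derivative of the series in Equation 2:
\[ \frac{d}{ds} \left( \zeta(s) - \frac{1}{s-1} \right) = \sum_{n=1}^{\infty} \frac{d}{ds} \int_n^{n+1} \left( \frac{1}{n^s} - \frac{1}{x^s} \right) dx = \sum_{n=1}^{\infty} \int_n^{n+1} \left( -\frac{\log n}{n^s} + \frac{\log x}{x^s} \right) dx. \]
(Note the use of the Leibniz integral rule again!) If this series converges uniformly for $\sigma > 0$, we will conclude that $\zeta(s) - \frac{1}{s-1}$ is holomorphic for $\sigma > 0$. To prove this, we use the same trick as before:
\[ \int_n^{n+1} \left( -\frac{\log n}{n^s} + \frac{\log x}{x^s} \right) dx = \int_n^{n+1} \int_n^x \frac{1 - s \log u}{u^{s+1}}\, du\, dx, \]
which implies
\[ \sum_{n=1}^{\infty} \left| \int_n^{n+1} \left( -\frac{\log n}{n^s} + \frac{\log x}{x^s} \right) dx \right| \le \sum_{n=1}^{\infty} \max_{n \le u \le n+1} \left| \frac{1 - s \log u}{u^{s+1}} \right| \le \sum_{n=1}^{\infty} \frac{1 + |s| \log(n+1)}{n^{\sigma+1}}. \]
Since $\sigma + 1 > 1$, the series for $\frac{d}{ds}\left(\zeta(s) - \frac{1}{s-1}\right)$ converges absolutely (and uniformly on bounded subsets of $\sigma \ge \sigma_0$ for any $\sigma_0 > 0$) and therefore exists.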
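The series in Equation 2 can be summed numerically for real $s > 0$, including values of $s \le 1$ where the series for $\zeta(s)$ itself diverges. The sketch below is an illustration only: the function name `shifted_zeta_partial_sum` and the cutoff are hypothetical, and the two comparison values (Euler's constant $\gamma \approx 0.5772156649$ at $s = 1$, and $\pi^2/6 - 1$ at $s = 2$) are standard facts that this exercise does not prove.

```python
import math


def shifted_zeta_partial_sum(s: float, terms: int) -> float:
    """Partial sum of Equation 2 for real s > 0.

    The n-th term is the integral over [n, n+1] of (n^(-s) - x^(-s)) dx,
    evaluated in closed form.
    """
    total = 0.0
    for n in range(1, terms + 1):
        if s == 1.0:
            # The integral of 1/x over [n, n+1] is log((n+1)/n) = log(1 + 1/n).
            integral = math.log1p(1.0 / n)
        else:
            # The integral of x^(-s) over [n, n+1].
            integral = (n ** (1.0 - s) - (n + 1) ** (1.0 - s)) / (s - 1.0)
        total += n ** -s - integral
    return total


if __name__ == "__main__":
    print("s = 1:", shifted_zeta_partial_sum(1.0, 10**5), "(Euler's constant is about 0.5772156649)")
    print("s = 2:", shifted_zeta_partial_sum(2.0, 10**5), "vs pi^2/6 - 1 =", math.pi**2 / 6 - 1)
```

At $s = 1$ each of $\zeta(s)$ and $\frac{1}{s-1}$ is undefined on its own, yet the combined series still converges, which is the point of Lemma 10.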

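Lemma 11. For $\sigma = \Re(s) \ge 1$, $\zeta(s) \ne 0$ and $\Phi(s) - 1/(s-1)$ is holomorphic.

Proof. First, for $\sigma > 1$, the product representing $\zeta(s)$ converges absolutely by Corollary 3, hence it converges to a nonzero finite number (see Appendix A.4.1). Later in the proof, we show that $\zeta(s) \ne 0$ for $\sigma = 1$ on our way to showing the second half of the result.

By Lemma 9, $\Phi(s)$ is holomorphic for $\Re(s) > 1$. So since the same is true for $\frac{1}{s-1}$, the result clearly follows for $\Re(s) > 1$. It is left to show that $\Phi(s) - \frac{1}{s-1}$ is holomorphic for $\Re(s) = 1$. As a first step, note that $\Phi(s)$ and the series
\[ \sum_{p} \frac{\log p}{p^s (p^s - 1)} \]
converge absolutely for $\sigma > 1$. Hence
\[ \Phi(s) + \sum_{p} \frac{\log p}{p^s (p^s - 1)} = \sum_{p} \left( \frac{\log p}{p^s} + \frac{\log p}{p^s (p^s - 1)} \right) = \sum_{p} \frac{\log p}{p^s - 1}. \qquad (3) \]
Now consider that
\[ -\frac{\zeta'(s)}{\zeta(s)} = -\frac{1}{\zeta(s)} \frac{d}{ds} \prod_{p} \left(1 - p^{-s}\right)^{-1} = -\frac{1}{\zeta(s)} \sum_{p} \left( \frac{d}{ds} \left(1 - p^{-s}\right)^{-1} \right) \prod_{q \ne p} \left(1 - q^{-s}\right)^{-1} = \sum_{p} \frac{p^{-s} \log p}{(1 - p^{-s})^2}\, (1 - p^{-s}) = \sum_{p} \frac{\log p}{p^s - 1}. \qquad (4) \]
Equating Equations 3 and 4, solving for $\Phi(s)$, and subtracting $\frac{1}{s-1}$, we obtain
\[ \Phi(s) - \frac{1}{s-1} = -\left( \frac{\zeta'(s)}{\zeta(s)} + \frac{1}{s-1} \right) - \sum_{p} \frac{\log p}{p^s (p^s - 1)}. \qquad (5) \]
Considering that, for $\sigma > \frac{1}{2}$,
\[ \left| \frac{\log p}{p^s (p^s - 1)} \right| \le \frac{\log p}{p^\sigma (p^\sigma - 1)} \le \frac{4 \log p}{p^{2\sigma}}, \]
we could prove that the sum over $p$ on the right-hand side of Equation 5 is uniformly convergent on $\sigma \ge \sigma_0$ for every $\sigma_0 > \frac{1}{2}$ by using the Weierstrass M-test with $M_p = 4 \log p / p^{2\sigma_0}$. We would then prove the same about the sequence of derivatives of the partial sums and conclude that it is holomorphic for $\sigma > \frac{1}{2}$. Also, as we now show, the other term on the right-hand side of Equation 5 is holomorphic on $\sigma > 0$, except where it is undefined. This would prove that the left-hand side of Equation 5, $\Phi(s) - \frac{1}{s-1}$, is holomorphic (except where it is undefined) for $\sigma > \frac{1}{2}$.

By Lemma 10, the function
\[ \zeta(s) - \frac{1}{s-1} = \frac{(s-1)\zeta(s) - 1}{s-1} \]
is holomorphic for $\sigma > 0$. Hence the numerator of this function is holomorphic for $\sigma > 0$, which implies that $(s-1)\zeta(s)$ is holomorphic for $\sigma > 0$. Since complex-differentiability implies that derivatives of all orders exist (see Appendix A.1.2), we conclude that $\frac{d}{ds}\left((s-1)\zeta(s)\right)$ is also holomorphic. Hence
\[ \frac{\zeta'(s)}{\zeta(s)} + \frac{1}{s-1} = \frac{\frac{d}{ds}\left((s-1)\zeta(s)\right)}{(s-1)\zeta(s)} \]
is holomorphic for $\sigma > 0$ except where it is undefined, which could only be where $(s-1)\zeta(s) = 0$.

We first check that $(s-1)\zeta(s) \ne 0$ at $s = 1$. Since $\zeta(s) - \frac{1}{s-1}$ is holomorphic on $\sigma > 0$ by Lemma 10, it is holomorphic at $s = 1$. In particular, the (finite) value at 1 exists and the limit of $\zeta(s) - \frac{1}{s-1}$ is that value. Then we have
\[ \lim_{s \to 1} \left( \zeta(s) - \frac{1}{s-1} \right) = \lim_{s \to 1} \left( \frac{(s-1)\zeta(s) - 1}{s-1} \right). \qquad (6) \]
Because $s - 1$ converges to 0 as $s \to 1$, it must be the case that $(s-1)\zeta(s) - 1$ converges to 0 as well. So $(s-1)\zeta(s)$ is not 0 at $s = 1$.

Now we check that $(s-1)\zeta(s) \ne 0$ for $\sigma \ge 1$ and $t = \Im(s) \ne 0$. That is, we want to show that $\zeta(1 + it) \ne 0$ for all real $t \ne 0$. (This will also complete the proof of the first part of this theorem.) Suppose on the contrary that $\zeta(s)$ has a zero of order $\mu$ at $s = 1 + it$. Because $\zeta(s) - \frac{1}{s-1}$ is holomorphic at $s = 1 + it$ by Lemma 10, and since $t \ne 0$, $\zeta(s)$ is holomorphic at $s = 1 + it$. Therefore $\mu \ge 0$. Write $\zeta(s) = (s - 1 - it)^\mu g(s)$. Showing that $\mu = 0$ would show that $\zeta(s)$ has no zeros on the line $\sigma = 1$, so we proceed to this now.

We need to get our hands on $\mu$. To do this, note that $\zeta'(s) = \mu (s - 1 - it)^{\mu - 1} g(s) + (s - 1 - it)^\mu g'(s)$. Moreover, since
\[ \overline{\zeta(s)} = \sum_{n=1}^{\infty} \overline{n^{-s}} = \zeta(\bar{s}), \]
$\zeta(s)$ also has a zero of order $\mu$ at $1 - it$, with $\zeta(s) = (s - 1 + it)^\mu\, \overline{g(\bar{s})}$. Writing $g_+(s) = g(s)$ and $g_-(s) = \overline{g(\bar{s})}$, this implies, for all $\epsilon > 0$,
\[ \frac{\zeta'(1 \pm it + \epsilon)}{\zeta(1 \pm it + \epsilon)} = \frac{\mu}{(1 \pm it + \epsilon) - (1 \pm it)} + \frac{g_\pm'(1 \pm it + \epsilon)}{g_\pm(1 \pm it + \epsilon)} = \frac{\mu}{\epsilon} + \frac{g_\pm'(1 \pm it + \epsilon)}{g_\pm(1 \pm it + \epsilon)}. \]
Multiplying both sides by $\epsilon$ and taking $\epsilon \to 0^+$ yields
\[ \lim_{\epsilon \to 0^+} \epsilon\, \frac{\zeta'(1 \pm it + \epsilon)}{\zeta(1 \pm it + \epsilon)} = \mu + \lim_{\epsilon \to 0^+} \epsilon\, \frac{g_\pm'(1 \pm it + \epsilon)}{g_\pm(1 \pm it + \epsilon)}. \qquad (7) \]
But $\zeta(s)$ is holomorphic in a neighborhood of $1 \pm it$, so $g_\pm$ and $g_\pm'$ are holomorphic there; in particular $g_\pm'$ is bounded near $1 \pm it$. Also $g(1 + it) \ne 0$ because otherwise $1 + it$ would have order strictly greater than $\mu$, and it follows that $|g_-(1 - it)| = |g(1 + it)| \ne 0$. Thus the limit involving $g_\pm$ and $g_\pm'$ in Equation 7 is 0, and the right-hand side is equal to $\mu$. Great!

Now to interpret the left-hand side, we use Equation 5. Set $s = 1 \pm it + \epsilon$ in this equation, multiply both sides by a positive $\epsilon$ and take $\epsilon \to 0^+$. Then the $\frac{\epsilon}{s-1}$ terms vanish, as does the term involving the sum over $p$ because it is bounded. This leaves us with
\[ \lim_{\epsilon \to 0^+} \epsilon\, \Phi(1 \pm it + \epsilon) = -\lim_{\epsilon \to 0^+} \epsilon\, \frac{\zeta'(1 \pm it + \epsilon)}{\zeta(1 \pm it + \epsilon)} = -\mu. \qquad (8) \]
Now it will be convenient to consider the point $1 + 2it$ as a zero of order $\nu$. As we argued for $\mu$, we have that $\nu \ge 0$. (The actual value of $\nu$ has no bearing on our proof; it is merely a convenience to introduce it.) Now by analogous reasoning, we could prove an analogue of Equation 8 for $\nu$:
\[ \lim_{\epsilon \to 0^+} \epsilon\, \Phi(1 \pm 2it + \epsilon) = -\nu. \qquad (9) \]
Finally, if we substitute $s = 1 + \epsilon$ into Equation 5, multiply by $\epsilon > 0$, and take $\epsilon \to 0^+$, we obtain
\[ \lim_{\epsilon \to 0^+} \epsilon\, \Phi(1 + \epsilon) = 1. \qquad (10) \]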

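To complete the proof (which, recall, required us to show that $\mu = 0$) we come up with an inequality in terms of $\mu$ and $\nu$, from which we will conclude that $\mu = 0$. To do this, note that

$$\begin{aligned}
0 &\le \sum_p \frac{\log p}{p^{1+\epsilon}} \left( 2\Re\!\left(p^{it/2}\right) \right)^4 = \sum_p \frac{\log p}{p^{1+\epsilon}} \left( p^{it/2} + p^{-it/2} \right)^4 \\
&= \sum_p \frac{\log p}{p^{1+\epsilon+2it}} + 4\sum_p \frac{\log p}{p^{1+\epsilon+it}} + 6\sum_p \frac{\log p}{p^{1+\epsilon}} + 4\sum_p \frac{\log p}{p^{1+\epsilon-it}} + \sum_p \frac{\log p}{p^{1+\epsilon-2it}} \\
&= \Phi(1+2it+\epsilon) + 4\Phi(1+it+\epsilon) + 6\Phi(1+\epsilon) + 4\Phi(1-it+\epsilon) + \Phi(1-2it+\epsilon).
\end{aligned} \tag{11}$$

Now multiply both sides of Equation 11 by $\epsilon > 0$, take $\epsilon \to 0^+$, and use Equations 8, 9, and 10 to arrive at

$$0 \le -\nu - 4\mu + 6 - 4\mu - \nu = 6 - 8\mu - 2\nu.$$

Since $\nu \ge 0$, this inequality implies $8\mu \le 6$, from which we (finally!) conclude that $\mu = 0$, because $\mu$ is a nonnegative integer.

The next step uses some new notation, which we introduce now before continuing. Given functions defined on $\mathbb{R}$, we say that $f(x) = O(g(x))$ if there exist $C \in \mathbb{R}$ and $x_0 \in \mathbb{R}$ such that $|f(x)| \le C|g(x)|$ for all $x > x_0$. For example, the function $f(x) = \log x + \log^2 x + x$ is $O(x)$ because, in the limit as $x \to \infty$, the $x$ term dominates, from which we could prove that $f(x) = O(x)$ using the definition. Now we go back to the proof of the Prime Number Theorem.

Lemma 12. $\vartheta(x) = O(x)$.

Proof. First let $n \in \mathbb{N}$. Then consider that

$$e^{\vartheta(2n) - \vartheta(n)} = \exp\Big( \sum_{n < p \le 2n} \log p \Big) = \prod_{n < p \le 2n} p,$$

where, here and below, $k$ ranges over the integers (in the specified range) and $p$ over the primes. Now let

$$a = \prod_{n < k \le 2n} k, \qquad b = \prod_{k \le n} k, \qquad c = \prod_{n < p \le 2n} p.$$

Then $\exp(\vartheta(2n) - \vartheta(n)) = c$, and $\frac{a}{b} = \binom{2n}{n}$, where $\binom{2n}{n}$ is the binomial coefficient. Now note that $b \mid a$ since $\binom{2n}{n}$ is always an integer (a cool fact in itself!). Also note that $c \mid a$ since $c$ is a product over a subset of the set of terms over which $a$ is a product. Furthermore, note that if $p$ is a prime dividing $c$, then $p > n$, which implies $p \nmid k$ for all terms $k$ in the product $b$. Hence $b$ and $c$ are relatively prime, from which we conclude that $bc$ divides $a$. In particular, $bc \le a$, so

$$e^{\vartheta(2n) - \vartheta(n)} = c \le \frac{a}{b} = \binom{2n}{n} \le \sum_{k=0}^{2n} \binom{2n}{k} = (1+1)^{2n} = 2^{2n}.$$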
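The chain of divisibility and size claims above can be checked exactly on small cases. The following is a minimal sketch, not part of the original argument; the helper names `primes_in` and `check_bound` are made up for this illustration, and trial division is used only because the ranges are tiny.

```python
import math


def primes_in(lo, hi):
    """Primes p with lo < p <= hi, found by trial division (small hi only)."""
    return [
        p
        for p in range(max(lo + 1, 2), hi + 1)
        if all(p % d for d in range(2, math.isqrt(p) + 1))
    ]


def check_bound(n):
    """Check that c divides binom(2n, n) and that c <= binom(2n, n) <= 2^(2n).

    Here c is the product of the primes in (n, 2n], as in the proof.
    """
    c = math.prod(primes_in(n, 2 * n))
    binom = math.comb(2 * n, n)
    return binom % c == 0 and c <= binom <= 4 ** n


def theta_gap(n):
    """theta(2n) - theta(n), the sum of log p over primes in (n, 2n]."""
    return sum(math.log(p) for p in primes_in(n, 2 * n))


if __name__ == "__main__":
    for n in (1, 5, 10, 100):
        print(n, check_bound(n), theta_gap(n), 2 * n * math.log(2))
```

Integer arithmetic is used for the divisibility test so that no rounding is involved; only `theta_gap` uses floating point.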

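Taking logarithms, we have $\vartheta(2n) - \vartheta(n) \le 2n\log 2$. Next, we use this last inequality to prove that $\vartheta(x) - \vartheta(\frac{x}{2}) \le 2x\log 2$ for all $x \in (0, \infty)$. If $x \in (0, 2]$, the inequality holds trivially because $\vartheta(x) = \vartheta(\frac{x}{2}) = 0$ unless $x = 2$, in which case the inequality reduces to $\log 2 \le 4\log 2$. So let $x \in (2, \infty)$, and choose $m \in \mathbb{N}$ such that $2m < x \le 2m+2$. Since the even number $2m+2 > 2$ is not prime,

$$\vartheta(x) = \sum_{p \le x} \log p \le \sum_{p \le 2m+1} \log p \le \vartheta(2m) + \log(2m+1)$$

(where the last inequality is an equality if and only if $2m+1$ is prime). Also,

$$\vartheta\!\left(\frac{x}{2}\right) = \sum_{p \le x/2} \log p \ge \sum_{p \le m} \log p = \vartheta(m).$$

Hence

$$\vartheta(x) - \vartheta\!\left(\frac{x}{2}\right) \le \vartheta(2m) - \vartheta(m) + \log(2m+1) \le 2m\log 2 + \log(2m+1) \le x\log 2 + \log(x+1) \le 2x\log 2,$$

where the last inequality follows since $\log(x+1) \le x\log 2$ for all $x > x_0 = 2$.

Finally, we use this estimate to prove the lemma! Choose $C = 4\log 2$ and $x_0 = 2$, and let $x > x_0$. Next, choose $r \in \mathbb{N}$ such that $2^r \le x < 2^{r+1}$. Then $\vartheta(x/2^r) = 0$, and hence

$$\vartheta(x) = \sum_{k=1}^{r} \left( \vartheta\!\left(\frac{x}{2^{k-1}}\right) - \vartheta\!\left(\frac{x}{2^{k}}\right) \right) \le \sum_{k=1}^{r} 2\log 2 \cdot \frac{x}{2^{k-1}} = 4x\log 2 \left( 1 - \frac{1}{2^r} \right) \le 4x\log 2 = Cx,$$

which proves the lemma.

Lemma 13. The improper integral

$$\int_1^\infty \frac{\vartheta(x) - x}{x^2}\,dx$$

converges.

Proof. Let $I$ be the value of the integral, be it finite or infinite. We are going to apply the analytic theorem to prove that this integral converges, so, in order to do this, we must check the hypotheses of Theorem 5. We change variables $x = e^t$ in the integral to get

$$I = \int_1^\infty \left( \frac{\vartheta(x)}{x} - 1 \right) \frac{dx}{x} = \int_0^\infty \left( \vartheta(e^t)e^{-t} - 1 \right) dt = \int_0^\infty f(t)\,dt,$$

where we have defined $f(t) = \vartheta(e^t)e^{-t} - 1$. Note that, by Lemma 12, for some constant $C$,

$$|f(t)| \le \vartheta(e^t)e^{-t} + 1 \le (Ce^t)e^{-t} + 1 = C + 1$$

for all $t$ past some $t_0$. Since $f(t)$ has only jump discontinuities, at the points $t = \log p$ for primes $p$, it is clearly bounded on $(0, t_0]$ and hence bounded on $(0, \infty)$. Furthermore, for any compact $K \subset (0, \infty)$, $f(t)$ has only finitely many discontinuities, so the integral of $f(t)$ over $K$ is finite. Hence $f$ is locally integrable.

Now define $g(z) = \int_0^\infty f(t)e^{-zt}\,dt$ for all $\Re(z) > 0$. (As we will show, $g(z)$ extends to a holomorphic function on $\Re(z) \ge 0$, so the convergence of the integral is not a problem.) As we will also show,

$$\int_0^\infty \vartheta(e^t)e^{-(z+1)t}\,dt = \frac{\Phi(z+1)}{z+1}, \tag{12}$$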

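which implies

$$g(z) = \int_0^\infty \vartheta(e^t)e^{-(z+1)t}\,dt - \int_0^\infty e^{-zt}\,dt = \frac{\Phi(z+1)}{z+1} - \frac{1}{z}$$

or, equivalently,

$$g(z) = \frac{1}{z+1} \left( \Phi(z+1) - \frac{1}{z} - 1 \right).$$

If this holds for $\Re(z) > 0$, then $\Re(z+1) > 1$, and Lemma 11 would then imply that $\Phi(z+1) - \frac{1}{z}$ is holomorphic for $\Re(z) \ge 0$. The assumptions of the analytic theorem would therefore be satisfied, from which we could conclude that $I$, the integral in question which equals $\int_0^\infty f(t)\,dt$, exists and is equal to $g(0)$ (a bounded quantity since $g(z)$ is holomorphic at $z = 0$), thus proving the lemma.

All we need to do is prove Equation 12. To do this, we let $J$ be the value of the integral in Equation 12, set $s = z+1$, and make the variable substitution $x = e^t$ to get

$$sJ = \int_1^\infty \frac{s\,\vartheta(x)}{x^{s+1}}\,dx = \sum_{k=0}^\infty \int_{p_k}^{p_{k+1}} \frac{s\,\vartheta(x)}{x^{s+1}}\,dx,$$

where $p_1, p_2, \ldots$ is an enumeration of the primes and $p_0 = 1$ for convenience. The reason for splitting up the integral in this way is that, for $x \in (p_k, p_{k+1})$,

$$\vartheta(x) = \sum_{p \le x} \log p = \sum_{p \le p_k} \log p = \vartheta(p_k).$$

This allows us to pull the $\vartheta(x)$ out of the integral in order to evaluate it:

$$sJ = \sum_{k=0}^\infty \vartheta(p_k) \int_{p_k}^{p_{k+1}} \frac{s}{x^{s+1}}\,dx = \sum_{k=0}^\infty \vartheta(p_k) \left( \frac{1}{p_k^s} - \frac{1}{p_{k+1}^s} \right).$$

This almost looks like a telescoping series; how we would love to see one of those right now! In fact, if we note that $\vartheta(p_k) = \vartheta(p_{k+1}) - \log p_{k+1}$ and recall that the series $\sum_p \vartheta(p)p^{-s}$ and $\Phi(s)$ converge absolutely, we arrive at exactly this:

$$sJ = \sum_{k=0}^\infty \left( \frac{\vartheta(p_k)}{p_k^s} - \frac{\vartheta(p_{k+1})}{p_{k+1}^s} \right) + \sum_{k=0}^\infty \frac{\log p_{k+1}}{p_{k+1}^s} = \sum_{k=0}^\infty \left( \frac{\vartheta(p_k)}{p_k^s} - \frac{\vartheta(p_{k+1})}{p_{k+1}^s} \right) + \Phi(s).$$

Since the value of the telescoping series is $\vartheta(1)/1^s = 0$ for all $\Re(s) > 1$, we have $J = \Phi(z+1)/(z+1)$ for $\Re(z) > 0$, from which Equation 12 follows immediately.

Lemma 14. $\vartheta(x) \sim x$.

Proof. We first show the following: For all $\epsilon > 0$, there exists $x_0 > 0$ such that $x > x_0$ implies $\vartheta(x)/x - 1 < \epsilon$. A parallel proof (which we omit) will show the corresponding inequality $\vartheta(x)/x - 1 > -\epsilon$, which will prove (straight from the definition of convergence) that $\vartheta(x)/x \to 1$ as $x \to \infty$. We proceed by contradiction. Suppose there exists an $\epsilon > 0$ such that for all $x_0 > 0$, there exists $x > x_0$ for which $\vartheta(x)/x - 1 \ge \epsilon$ or, equivalently, $\vartheta(x) \ge \lambda x$ where $\lambda = 1 + \epsilon > 1$. Fix such an $\epsilon$ and let $x_0 = 1$. Choose $x > x_0$ such that $\vartheta(x) \ge \lambda x$. Then, since $\vartheta(x)$ is an increasing function,

$$\int_x^{\lambda x} \frac{\vartheta(t) - t}{t^2}\,dt \ge \int_x^{\lambda x} \frac{\vartheta(x) - t}{t^2}\,dt \ge \int_x^{\lambda x} \frac{\lambda x - t}{t^2}\,dt.$$

This integral's value is $C(\lambda) = \lambda - 1 - \log\lambda$, which is a strictly positive quantity since $\lambda > 1$.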

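Now we repeat this process. Set $x_1 = x$, $x_0 = \lambda x_1$, and find an $x > x_0$ such that $\vartheta(x) \ge \lambda x$. Set $x_2 = x$ and continue inductively to define a sequence $x_1, x_2, \ldots$ where each $x_{n+1} > \lambda x_n$ and

$$\int_{x_{n+1}}^{\lambda x_{n+1}} \frac{\vartheta(t) - t}{t^2}\,dt \ge C(\lambda)$$

for each $n$. The intervals $[x_n, \lambda x_n]$ are disjoint and run off to infinity, yet the integral of $\frac{\vartheta(t)-t}{t^2}$ over each of them is at least $C(\lambda) > 0$, so

$$\sum_{n=1}^\infty \int_{x_n}^{\lambda x_n} \frac{\vartheta(t) - t}{t^2}\,dt \ge \sum_{n=1}^\infty C(\lambda),$$

which clearly diverges. If the improper integral converged, its contributions over intervals far out would have to tend to $0$. This contradicts Lemma 13, which concludes the proof.

With all of the hard work done, we can prove the Prime Number Theorem:

Theorem 15 (The Prime Number Theorem). Let $\pi(x)$ denote the number of primes less than or equal to $x$. Then $\pi(x) \sim x/\log x$.

Proof. Let $\epsilon > 0$. Without loss of generality, assume that $\epsilon < 1$. To help with the estimates, we choose $\eta \in (0, \epsilon)$ such that $\eta < \frac{\epsilon}{4+\epsilon}$. (This inequality implies $\frac{1+\eta}{1-\eta} < 1 + \frac{\epsilon}{2}$.) Choose $x_1 > 0$ such that $x > x_1$ implies $(1-\eta)x < \vartheta(x) < (1+\eta)x$. (This is possible by Lemma 14.) Choose $x_2 > 0$ such that $x^{-\eta}\log x < \frac{\epsilon}{2}$ for all $x > x_2$. (This is possible because $x^{-\eta}\log x \to 0$ as $x \to \infty$, as can be seen by applying l'Hôpital's rule.) Set $x_0 = \max\{x_1, x_2\}$, and let $x > x_0$. Then from

$$(1-\eta)x < \vartheta(x) = \sum_{p \le x} \log p \le \sum_{p \le x} \log x = \pi(x)\log x,$$

we deduce $\pi(x)\log x/x > 1 - \eta > 1 - \epsilon$.

Now we obtain the other estimate. Since $\log x$ is an increasing function, $p > x^{1-\eta}$ implies $\log p > (1-\eta)\log x$. Hence

$$(1+\eta)x > \vartheta(x) \ge \sum_{x^{1-\eta} < p \le x} \log p \ge \sum_{x^{1-\eta} < p \le x} (1-\eta)\log x = (1-\eta)\log x \left( \pi(x) - \pi(x^{1-\eta}) \right) \ge (1-\eta)\log x \left( \pi(x) - x^{1-\eta} \right).$$

Dividing both sides by $(1-\eta)x$ yields

$$\frac{1+\eta}{1-\eta} > \frac{\pi(x)\log x}{x} - \frac{\log x}{x^\eta},$$

from which we conclude

$$\frac{\pi(x)\log x}{x} - 1 < \frac{1+\eta}{1-\eta} - 1 + \frac{\log x}{x^\eta} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$

Thus $\pi(x)\log x/x$ is within $\epsilon$ of $1$ for all $x > x_0$, as required!

Sit. Meditate. Enjoy the beauty which lies before us. [10]

[10] What comes next is sure to shock those who do not know and pleasantly remind those who do. Either way, the task before us is a great one, so if a break is desired, now is the time.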
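The two asymptotic statements, $\vartheta(x) \sim x$ (Lemma 14) and $\pi(x) \sim x/\log x$ (Theorem 15), can be illustrated numerically. The sketch below is only an illustration and proves nothing; the function names `primes_up_to` and `pnt_ratios` are hypothetical, and the sieve of Eratosthenes is a convenience that does not appear in the text.

```python
import math


def primes_up_to(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # Cross out the multiples of i starting from i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def pnt_ratios(x):
    """Return (pi(x) * log(x) / x, theta(x) / x) for an integer x >= 2."""
    ps = primes_up_to(x)
    pi_x = len(ps)
    theta_x = sum(math.log(p) for p in ps)
    return pi_x * math.log(x) / x, theta_x / x


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5, 10**6):
        r_pi, r_theta = pnt_ratios(x)
        print(f"x={x:>8}  pi(x)log(x)/x={r_pi:.4f}  theta(x)/x={r_theta:.4f}")
```

Both ratios drift toward 1 as $x$ grows, the first one quite slowly, which is consistent with the error term $\log x / x^\eta$ that appears in the proof.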

3 Primes in arithmetic progressions

We move on now to a topic which motivated the birth of analytic number theory: primes in arithmetic progressions. So far, we have proven that there exist infinitely many primes, and we have given a formula for the density of these primes in the set of natural numbers. Ignoring the prime 2, we could have equivalently considered these questions about the odd numbers, that is, the arithmetic progression $\{1 + 2n\}_{n=0}^\infty$, and come to the same conclusions. In this section, we generalize to arbitrary arithmetic progressions $\{a + qn\}_{n=0}^\infty$ where $a, q \in \mathbb{N}$, and we ask the same two questions: Are there infinitely many primes, and what is their density in $\mathbb{N}$?

3.1 Dirichlet's Theorem

Dirichlet's theorem, or, more properly, Dirichlet's theorem on primes in arithmetic progressions, states that there exist infinitely many primes in any arithmetic progression $a, a+q, a+2q, a+3q, \ldots$ so long as $\gcd(a, q) = 1$. (If $a$ and $q$ have a common factor, then $a + qn$ is composite for all $n \in \mathbb{N}$.) We will prove this result in a sequence of steps. Our proof will follow Davenport's, which primarily follows Dirichlet's original. We give Euler's proof for $q = 2$. We already have seen the result for $q = 2$, both from Euclid's argument and by considering the Euler product formula for $\zeta(s)$, but we give a third proof which, for one thing, proves a stronger fact, and, for another thing, ultimately inspired Dirichlet, who generalized the argument to compose a proof of his theorem. (Note: This is the way Dirichlet did it, and it is how Davenport presents it.)

3.1.1 Euler's proof for q = 2

In this section, we provide a third proof of the fact that there exist infinitely many primes. Recall that

$$\zeta(s) = \sum_n n^{-s} = \prod_p \left( 1 - p^{-s} \right)^{-1}$$

for all $s \in (1, \infty)$. Taking logarithms and expanding the logarithms into power series (which will be allowed since $|p^{-s}| < 1$ implies that $p^{-s}$ is within the radius of convergence of $\log(1-z)$), we obtain [11]

$$\log\zeta(s) = \log\prod_p \left( 1 - p^{-s} \right)^{-1} = -\sum_p \log\left( 1 - p^{-s} \right) = \sum_p \sum_{m=1}^\infty \frac{1}{m}\left( p^{-s} \right)^m = \sum_p \sum_{m=1}^\infty \frac{1}{m p^{sm}} = \sum_p p^{-s} + \delta(s) \tag{13}$$

where

$$\delta(s) = \sum_p \sum_{m=2}^\infty \frac{1}{m p^{sm}} < \sum_p \sum_{m=2}^\infty \frac{1}{p^{sm}} = \sum_p \frac{p^{-2s}}{1 - p^{-s}} = \sum_p \frac{1}{p^s(p^s-1)} < \sum_p \frac{2}{p^2} < \zeta(2).$$

[11] Technically, we are about to turn the logarithm of an infinite product into an infinite sum, which may cause problems that do not arise with finite sums and products. We excuse ourselves in Appendix A.4.1 by proving that this works as it does in the finite case.

(Note that we have sneakily rearranged the order of summation, but this is okay since the series converges absolutely, as can be seen since $\frac{1}{m p^{sm}} \le p^{-sm}$ and $\sum_p \sum_m p^{-sm} < \infty$.) Since $\log\zeta(s) \to \infty$ as $s \to 1^+$, and since $\delta(s)$ is bounded in the same limit, we conclude from Equation 13 that $\sum_p \frac{1}{p}$ diverges. This fact clearly implies the infinitude of the primes, and it is interesting in its own right: the primes are not too sparsely scattered in $\mathbb{N}$.

In order to prove Dirichlet's theorem, we will show the corresponding (generalized) fact: $\sum_{p \equiv_q a} \frac{1}{p}$ diverges, where the notation $p \equiv_q a$ means that $p$ is congruent to $a$ modulo $q$, and the sum over $p \equiv_q a$ is over all primes congruent to $a$ modulo $q$. This is quite a task, however, so we build our bridge to the theorem in steps. First, we let $q > 2$ be prime, define Dirichlet characters and Dirichlet L-functions for $q$, and prove the theorem for $q$. We then let $q$ be composite, define general Dirichlet characters and Dirichlet L-functions, and prove the theorem in general. Each step for the second case is a generalization of the corresponding step in the first case. Let's begin!

3.1.2 Dirichlet characters and L-functions (prime modulus)

In this section, we construct Dirichlet characters of a prime modulus and prove a key property about them. Dirichlet characters for the modulus $q$ are arithmetic functions $\chi : \mathbb{N} \to \mathbb{C}$, which are $q$-periodic (that is, periodic with period $q$) and completely multiplicative (that is, $\chi(mn) = \chi(m)\chi(n)$ for all $m, n \in \mathbb{N}$). We also require that there exists a linear combination whose value is $1$ if $n \equiv_q a$ and $0$ otherwise. (Note: Dirichlet characters satisfy the hypotheses of Theorem 1.)

To construct the Dirichlet characters for the prime modulus $q$, consider the multiplicative group $\mathbb{Z}_q^* = \mathbb{Z}_q \setminus \{0\}$. As is well known, this group is cyclic (see Appendix A.3.1), so there exists an element $g \in \mathbb{Z}_q^*$ such that $g$ generates $\mathbb{Z}_q^*$. For all $n \in \mathbb{Z}_q^*$, define $\nu(n)$ to be the index of $n$ relative to $g$; that is, define $\nu(n)$ such that $g^{\nu(n)} \equiv_q n$. Then, for each of the $q-1$ distinct $(q-1)$-th roots of unity $\omega$, define the Dirichlet character on $\mathbb{Z}_q^*$ corresponding to $\omega$ by $\omega^{\nu(n)}$. Finally, periodically extend the definition of $\omega^{\nu(n)}$ to define the character for all $n \in \mathbb{N}$ which are not divisible by $q$, and for those $n$ for which $q \mid n$, define $\omega^{\nu(n)} = 0$.

The character $\omega^{\nu(n)}$ is clearly well-defined, for $n = m$ implies $n \equiv_q m$, which implies $g^{\nu(n)} \equiv n \equiv m \equiv g^{\nu(m)}$ modulo $q$, which implies $\nu(n) = \nu(m) + k(q-1)$ for some integer $k$. Therefore, since $\omega^{q-1} = 1$, $\omega^{\nu(n)} = \omega^{\nu(m)}$. Also, the function is periodic by construction. To show complete multiplicativity, note that $k = mn$, for $m \not\equiv 0 \not\equiv n$, implies $k \equiv mn$ modulo $q$, which implies $g^{\nu(k)} \equiv k \equiv mn \equiv g^{\nu(m)}g^{\nu(n)} = g^{\nu(m)+\nu(n)}$. Since $g^{q-1} \equiv 1$ modulo $q$, $\nu(k) = \nu(m) + \nu(n) + l(q-1)$ for some $l \in \mathbb{Z}$, which implies

$$\omega^{\nu(k)} = \omega^{\nu(m)}\omega^{\nu(n)}\left( \omega^{q-1} \right)^l = \omega^{\nu(m)}\omega^{\nu(n)}.$$

Finally, if $m$ or $n$ is $0$ modulo $q$, then so is $k$, which implies that $\omega^{\nu(k)} = 0 = \omega^{\nu(m)}\omega^{\nu(n)}$.

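Hence $\omega^{\nu(n)}$ is completely multiplicative. Finally, we must show that some linear combination of the $\omega^{\nu(n)}$ gives us $1$ or $0$, according to whether $n \equiv_q a$ or $n \not\equiv_q a$, respectively. But first, how many $\omega^{\nu(n)}$ are there? When we defined the functions, we specified that $\omega^{q-1} = 1$, so each $\omega$ is one of the $q-1$ distinct $(q-1)$-th roots of unity. On the other hand, if $\omega_1^{\nu(n)} = \omega_2^{\nu(n)}$ for all $n$, then, in particular, they are equal for $n = g$, for which $\nu(n) = 1$, which implies $\omega_1 = \omega_2$. So there are $q-1$ functions, each of them distinct.

Before we construct such a linear combination, we give an example of a set of characters.

Example 16. Let $q = 5$. We choose $2$ as our generator. (This choice is arbitrary, but this will not present a problem to us later on. We typically choose some generator at the beginning and it is assumed that we keep the same generator.) Then $\nu(1) = 0$, $\nu(2) = 1$, $\nu(3) = 3$, and $\nu(4) = 2$. The $(q-1)$-th roots of unity are $1$, $i$, $-1$, and $-i$, and the corresponding characters are $1$, $i^{\nu(n)}$, $(-1)^{\nu(n)}$, and $(-i)^{\nu(n)}$. A table of these values is given here:

    n               1     2     3     4
    1^ν(n)          1     1     1     1
    i^ν(n)          1     i    -i    -1
    (-1)^ν(n)       1    -1    -1     1
    (-i)^ν(n)       1    -i     i    -1

Moving on, we prove the key property about Dirichlet characters.

Theorem 17. For a prime $q > 2$,

$$\frac{1}{q-1}\sum_\omega \omega^{-\nu(a)}\omega^{\nu(n)} = \begin{cases} 1, & n \equiv_q a; \\ 0, & \text{otherwise} \end{cases} \tag{14}$$

where the sum is over all $(q-1)$-th roots of unity.

Proof. First, note that if $(q-1) \mid k$, for some $k \in \mathbb{Z}$, then $\omega^k = 1$ for all $(q-1)$-th roots of unity $\omega$, and hence $\sum_\omega \omega^k = q-1$. On the other hand, suppose $(q-1) \nmid k$, and fix a primitive $(q-1)$-th root of unity $\omega_0$, so that every $(q-1)$-th root of unity is $\omega_0^n$ for exactly one $n \in \{0, 1, \ldots, q-2\}$. Then $\omega_0^k \neq 1$, and powers of $\omega_0^k$ generate all of the $l$-th roots of unity, where $l = \frac{q-1}{g}$ and $g = \gcd(k, q-1)$. From this, we have that

$$\{\omega_0^{kn}\}_{n=0}^{l-1} = \{\omega_0^{kn}\}_{n=l}^{2l-1} = \cdots = \{\omega_0^{kn}\}_{n=(g-1)l}^{gl-1}.$$

So

$$\sum_\omega \omega^k = \sum_{n=0}^{q-2} \omega_0^{kn} = \sum_{j=1}^{g}\sum_{n=0}^{l-1} \omega_0^{kn} = \sum_{j=1}^{g} \frac{1 - \omega_0^{kl}}{1 - \omega_0^k} = 0$$

since $\omega_0^{kl} = \left( \omega_0^{q-1} \right)^{k/g} = 1$. Therefore, if $q-1$ does not divide $k$, then $\sum_\omega \omega^k = 0$. We have proven

$$\sum_\omega \omega^k = \begin{cases} q-1, & q-1 \text{ divides } k; \\ 0, & q-1 \text{ does not divide } k. \end{cases}$$

If we divide both sides by $q-1$, set $k = -\nu(a) + \nu(n)$, and note that $(q-1) \mid k$ if and only if $k \equiv_{q-1} 0$, which holds if and only if $\nu(a) \equiv_{q-1} \nu(n)$, which in turn holds if and only if $n \equiv_q a$, we see that our result follows from the above equality.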
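The construction of the characters and the orthogonality relation in Equation 14 can be tried out directly. The following is a minimal sketch under the text's conventions (a prime modulus $q$ and a chosen generator $g$); the function names `index_table`, `character`, and `indicator` are hypothetical, and roots of unity are represented as floating-point complex numbers, so equalities hold only up to rounding.

```python
import cmath


def index_table(q, g):
    """Map each n in 1..q-1 to the index nu(n) with g**nu(n) = n (mod q)."""
    table = {}
    power = 1
    for nu in range(q - 1):
        table[power] = nu
        power = (power * g) % q
    if len(table) != q - 1:
        raise ValueError("g does not generate the units modulo q")
    return table


def character(q, g, j):
    """Return chi with chi(n) = omega**nu(n), omega = exp(2*pi*i*j/(q-1)).

    chi(n) is 0 when q divides n, as in the text.
    """
    nu = index_table(q, g)
    omega = cmath.exp(2j * cmath.pi * j / (q - 1))

    def chi(n):
        r = n % q
        return 0 if r == 0 else omega ** nu[r]

    return chi


def indicator(q, g, a, n):
    """Left-hand side of Equation 14; a must not be divisible by q."""
    nu = index_table(q, g)
    total = 0
    for j in range(q - 1):
        omega = cmath.exp(2j * cmath.pi * j / (q - 1))
        total += omega ** (-nu[a % q]) * character(q, g, j)(n)
    return total / (q - 1)


if __name__ == "__main__":
    # Example 16: q = 5 with generator 2.
    print(index_table(5, 2))
    for n in range(1, 11):
        print(n, round(abs(indicator(5, 2, 3, n)), 6))
```

With $q = 5$, $g = 2$, and $a = 3$, the printed magnitude is close to 1 exactly when $n \equiv_5 3$ and close to 0 otherwise, matching Theorem 17.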

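Hence the $q-1$ arithmetic functions $\omega^{\nu(n)}$ form a set of Dirichlet characters for the prime modulus $q$. With these characters in hand, Dirichlet defined what are now called Dirichlet L-functions. Following Dirichlet, we now proceed with the definition of these functions.

Definition 18. For each Dirichlet character $\omega^{\nu(n)}$, define the Dirichlet L-function by

$$L_\omega(s) = \sum_{n=1}^\infty \omega^{\nu(n)} n^{-s}$$

for all $s \in \mathbb{C}$.

By Theorem 1, $L_\omega(s)$ absolutely converges for $\Re(s) > 1$ and has the following product representation

$$L_\omega(s) = \prod_p \left( 1 - \omega^{\nu(p)} p^{-s} \right)^{-1},$$

which also converges absolutely for $\Re(s) > 1$. Now recall that $\omega^{\nu(n)} = 0$ for all $n$ which are divisible by $q$, so, in fact,

$$L_\omega(s) = \sum_{n \not\equiv_q 0} \omega^{\nu(n)} n^{-s}. \tag{15}$$

Also, since $\omega^{\nu(q)} = 0$, the $p = q$ term in the product representation is $1$, so we have

$$L_\omega(s) = \prod_{p \neq q} \left( 1 - \omega^{\nu(p)} p^{-s} \right)^{-1}. \tag{16}$$

Now note that since

$$\left| \omega^{\nu(p)} p^{-s} \right| = p^{-\Re(s)} < p^{-1} \le 1/2,$$

no term in the product is $0$. So since the product for $L_\omega(s)$ converges absolutely by Theorem 1, we have that $L_\omega(s) \neq 0$ for all $s \in \mathbb{C}$ with $\Re(s) > 1$. (This is a general fact about infinite products. See Appendix A.4.1 for an explanation of why this is true.)

Similar to our calculation of $\log\zeta(s)$ in Section 3.1.1, since $L_\omega(s) \neq 0$, we have

$$\log L_\omega(s) = \sum_{p \neq q}\sum_{m=1}^\infty \frac{1}{m}\left( \omega^{\nu(p)} p^{-s} \right)^m = \sum_{p \neq q}\sum_{m=1}^\infty \frac{\omega^{\nu(p^m)}}{m p^{ms}}$$

by the multiplicativity of $\omega^{\nu(n)}$. Now we multiply both sides by $\omega^{-\nu(a)}$, sum over all $(q-1)$-th roots of unity $\omega$, rearrange infinite sums at will (because they are all absolutely convergent!), and obtain

$$\sum_\omega \omega^{-\nu(a)}\log L_\omega(s) = \sum_\omega \omega^{-\nu(a)} \sum_{p \neq q}\sum_{m=1}^\infty \frac{\omega^{\nu(p^m)}}{m p^{ms}} = \sum_{p \neq q}\sum_{m=1}^\infty \frac{1}{m p^{ms}} \sum_\omega \omega^{-\nu(a)}\omega^{\nu(p^m)}.$$

Because of Equation 14, this implies that

$$\sum_\omega \omega^{-\nu(a)}\log L_\omega(s) = (q-1) \sum_{p \neq q} \sum_{\substack{m \ge 1 \\ p^m \equiv_q a}} \frac{1}{m p^{ms}}. \tag{17}$$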

On the right-hand side of Equation 17, we have

$$(q-1)\sum_{p \equiv_q a} p^{-s} + (q-1)\sum_{p \neq q} \sum_{\substack{m \ge 2 \\ p^m \equiv_q a}} \frac{1}{m p^{ms}}.$$

Since the double sum over $p$ and $m$ has all positive terms, it is bounded below by $0$ and above by $\delta(s) < \zeta(2)$, as we saw in Section 3.1.1. Therefore, we have proven the following lemma to Dirichlet's theorem:

Lemma 19. If $\sum_\omega \omega^{-\nu(a)}\log L_\omega(s)$ diverges to infinity as $s \to 1^+$, then the series $\sum_{p \equiv_q a} p^{-s}$ diverges to infinity.

The conclusion of this lemma, if true, would prove Dirichlet's theorem. Since we want to prove the conclusion, let us consider this lemma's hypothesis. One of the $(q-1)$-th roots of unity is $1$ itself, so the sum on the left-hand side of Equation 17 is

$$\sum_\omega \omega^{-\nu(a)}\log L_\omega(s) = \log L_1(s) + \sum_{\omega \neq 1} \omega^{-\nu(a)}\log L_\omega(s). \tag{18}$$

If we consider the $\omega = 1$ term, we see from Equation 16 and Corollary 3 that

$$L_1(s) = \prod_{p \neq q} \left( 1 - p^{-s} \right)^{-1} = \left( 1 - q^{-s} \right)\zeta(s), \tag{19}$$

so since $\zeta(s) \to \infty$ as $s \to 1^+$ yet $1 - q^{-s}$ remains bounded away from $0$, $L_1(s) \to \infty$ as $s \to 1^+$. This is a good start! If we can show that each of the remaining terms is bounded as $s \to 1^+$, we will apply Lemma 19 to conclude Dirichlet's theorem.

Before embarking on this venture, however, we can simplify our task a bit. First, note that the $\omega^{-\nu(a)}$ have magnitude $1$, so they remain bounded as $s \to 1^+$. This means that what we have to show is that $\log L_\omega(s)$ is bounded as $s \to 1^+$. Since the logarithm function is continuous (for $\Re(s) > 0$ we have chosen the principal branch of the logarithm without mentioning it), we have that

$$\lim_{s \to 1^+} \log L_\omega(s) = \log \lim_{s \to 1^+} L_\omega(s).$$

Hence our task reduces to proving that $L_\omega(s)$ remains finite but nonzero as $s \to 1^+$. But we can simplify our task once more. We can show that $L_\omega(s)$ is continuous and therefore finite at $s = 1$, so $\lim_{s \to 1^+} L_\omega(s) = L_\omega(1)$. Once we do so, we will have the following precursor to Dirichlet's theorem:

Lemma 20. If $L_\omega(1) \neq 0$ for all $\omega \neq 1$, then Dirichlet's theorem holds.

Proof. We first prove that $L_\omega(s)$ is continuous using the Dirichlet test for convergence and the series representation of $L_\omega(s)$ given in Equation 15. To do this, first note that $n^{-s}$ decreases as $n$ increases and converges to $0$. Second, note that for all $N \in \mathbb{N}$, $\left| \sum_{n=1}^N \omega^{\nu(n)} \right|$ is bounded above by $q-2$. The reason for this is that as $n$ runs through $q$ consecutive integers, $\nu(n)$ runs through each possible index $0, 1, \ldots, q-2$, so

$$\sum_{n=1}^{q} \omega^{\nu(n)} = \sum_{\nu=0}^{q-2} \omega^\nu = 0$$

since $\omega \neq 1$ and $\omega^{q-1} = 1$ (for a primitive $\omega$, each $(q-1)$-th root of unity shows up exactly once in the sum). Similarly, $\sum_{n=1}^N \omega^{\nu(n)}$ is periodically $0$, and is, in fact, periodic with period $q$. Since periodic sequences are bounded, the second claim above follows. Therefore, the Dirichlet test for convergence implies that $L_\omega(s)$ converges for all $\Re(s) > 0$. Furthermore, [12] this convergence is uniform for all $\Re(s) \ge \delta > 0$. Since the partial sums of $L_\omega(s)$ are continuous, their uniform limit is continuous, so $L_\omega(s)$ is continuous on $\Re(s) \ge \delta$. Supposing we chose $\delta < 1$, which we assume anyway because our choice of $\delta > 0$ was arbitrary, we have that $L_\omega(s)$ is continuous at $s = 1$.

Given now that $L_\omega(s)$ is continuous at $1$ for all $\omega \neq 1$, we have that $L_\omega(s) \to L_\omega(1)$ as $s \to 1^+$ for all $\omega \neq 1$. Hence, if our assumption holds, then $L_\omega(1) \neq 0$, which implies that $\lim_{s \to 1^+} L_\omega(s) \neq 0$, which in turn implies that $\log L_\omega(s)$ remains bounded as $s \to 1^+$, for all $\omega \neq 1$. Hence, as we mentioned, since each $\omega^{-\nu(a)}$ is bounded, the $\omega \neq 1$ terms on the left-hand side of Equation 17 remain bounded as $s \to 1^+$. Hence the left-hand side diverges to infinity since the $\omega = 1$ term does, which means that Lemma 19 applies and Dirichlet's theorem holds.

It is hard to say where Dirichlet's proof begins. He created the huge body of work we have been discussing in this section. To say that the proof has already started would be fair. However, I feel that what we have shown so far are merely consequences of the definitions of Dirichlet characters and the L-functions. These are of interest to people independent of the fact that they are used to prove Dirichlet's Theorem. I think it is best to think about Dirichlet's proof as the proof of the fact that $L_\omega(1) \neq 0$ for all $\omega \neq 1$. Besides, the proof is already long; if we included everything so far in the proof it would appear more unwieldy than is necessary. Enough of this discussion; let us proceed with the proof!

3.1.3 Dirichlet's proof (prime modulus)

In this section, we show that $L_\omega(1) \neq 0$ for all $\omega \neq 1$. We consider two cases: $\omega$ is complex (that is, not real) and $\omega = -1$. (Note: if $\omega$ is not complex, then it must be $-1$ since we are assuming $\omega \neq 1$.) Suppose first that $\omega$ is complex. Since Equation 17 holds for any value of $a$, it, in particular, holds for $a = 1$. Substituting $a = 1$ into Equation 17 gives us

$$\sum_\omega \log L_\omega(s) = (q-1)\sum_{p \neq q} \sum_{\substack{m \ge 1 \\ p^m \equiv_q 1}} \frac{1}{m p^{ms}} \ge 0,$$

where the inequality follows from $\frac{1}{m p^{ms}} > 0$. (The reason why the inequality can not be made strict, which would make this case a triviality, is that we do not know for sure that the sum is not empty!) Exponentiating both sides yields

$$1 \le e^{\sum_\omega \log L_\omega(s)} = \prod_\omega L_\omega(s). \tag{20}$$

Suppose that for some complex $\omega$, $L_\omega(1) = 0$. Then

$$L_{\bar\omega}(1) = \sum_{n=1}^\infty \frac{\bar\omega^{\nu(n)}}{n} = \sum_{n=1}^\infty \frac{\overline{\omega^{\nu(n)}}}{n} = \overline{\sum_{n=1}^\infty \frac{\omega^{\nu(n)}}{n}} = \overline{L_\omega(1)} = 0.$$

[12] We claim this without proof, not because it is difficult, but because it seems repetitive. We have already seen how this sort of proof would go for the functions $\zeta(s)$ and $\Phi(s)$ in the proof of Lemma 8.

Now by Equation 19, the fact that $(s-1)\zeta(s) \to 1$ as $s \to 1^+$ implies that $(s-1)L_1(s) = (1 - q^{-s})(s-1)\zeta(s) \to \frac{q-1}{q}$ as $s \to 1^+$. Thus $L_1(s)$ has a simple pole at $s = 1$. Consider the limit of the product of the $\omega$-term, the $\bar\omega$-term, and the $1$-term in the product:

$$\lim_{s \to 1^+} L_\omega(s)L_{\bar\omega}(s)L_1(s) = \lim_{s \to 1^+} \frac{L_\omega(s)L_{\bar\omega}(s)}{s-1}\,\big( L_1(s)(s-1) \big).$$

The limit of $L_1(s)(s-1)$ is bounded as we mentioned. The limit of the other part, using l'Hôpital's rule, is equal to

$$\lim_{s \to 1^+} \left( L_\omega'(s)L_{\bar\omega}(s) + L_\omega(s)L_{\bar\omega}'(s) \right).$$

The derivative of the L-functions is computed term-wise and shown to be uniformly convergent by Dirichlet's test (as we did above for $L_\omega(s)$), so we omit the proof here. Therefore, $L_\omega'(s)$ and $L_{\bar\omega}'(s)$, as uniform limits of continuous functions, are continuous at $s = 1$. This continuity at $s = 1$ implies that $L_\omega'(1)$ and $L_{\bar\omega}'(1)$ are finite, so since $L_\omega(1) = L_{\bar\omega}(1) = 0$, the limit is equal to $0$. Since the other terms in the product in Equation 20 are bounded (because each $L_\omega(s)$ is bounded!), the product tends to $0$, contradicting the inequality in Equation 20. So $L_\omega(1) \neq 0$ for each complex $\omega$.

For large $q$, we have just covered the vast majority of the cases for Dirichlet's proof. Unfortunately, the amount of work required to prove these cases is not proportional to the amount of work required to prove the single case in which $\omega = -1$. We proceed to this now.

Suppose $\omega = -1$. Since $\overline{-1} = -1$, the above proof fails, so we must prove $L_{-1}(1) \neq 0$ directly. What is to our advantage is that we know exactly what $\omega$ is; the Dirichlet character for $-1$ reduces to $\omega^{\nu(n)} = (-1)^{\nu(n)}$. Note that $q \mid n$ if and only if $\omega^{\nu(n)} = 0$. Also, from this equality, we have that $\omega^{\nu(n)} = 1$ if and only if $\nu(n) = 2k$ for some $k \in \mathbb{Z}$, which is true if and only if $n \equiv g^{\nu(n)} \equiv g^{2k} \equiv \left( g^k \right)^2$ modulo $q$, which in turn is true if and only if $n$ is a square modulo $q$. Negating these statements, we also have that $\omega^{\nu(n)} = -1$ if and only if $n$ is not a square modulo $q$. Hence

$$\omega^{\nu(n)} = \left( \frac{n}{q} \right) = \begin{cases} 0, & \text{if } q \mid n; \\ 1, & \text{if } q \nmid n \text{ and } n \text{ is a square modulo } q; \\ -1, & \text{if } q \nmid n \text{ and } n \text{ is not a square modulo } q, \end{cases}$$

where $\left( \frac{n}{q} \right)$ is the Legendre symbol. [13] In what follows, we define the Gaussian sum $G(n)$, partially evaluate it to get $\left( \frac{n}{q} \right)$ in terms of $G(1)$, quote the value of $G(1)$, and substitute the value of $G(1)$ into the expression for $L_{-1}(1)$. At that point, we will show that $L_{-1}(1) \neq 0$.

Before continuing, we specify $e^{2\pi i n/q}$ (for $n = 0, 1, \ldots, q-1$) as an arbitrary $q$-th root of unity, because we will need to address specific roots. We will denote this particular root by $e_q(n)$. First, we give the definition of the Gaussian sums.

Definition 21. Define the Gaussian sum $G(n)$ for all $n \in \mathbb{N}$ by

$$G(n) = \sum_{m=1}^{q-1} \left( \frac{m}{q} \right) e_q(mn).$$

[13] The definition and properties of the Legendre symbol which we will use are discussed in Appendix A.3.3.
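Definition 21 is easy to evaluate numerically for a small prime. The sketch below is an illustration only; the function names `legendre`, `e_q`, and `gauss_sum` are hypothetical, the Legendre symbol is computed by brute force straight from its definition as "is a square modulo $q$", and all comparisons are up to floating-point rounding. It also checks the relation $G(n) = \left(\frac{n}{q}\right)G(1)$ and the value of $G(1)$ that the text goes on to quote from Davenport.

```python
import cmath


def legendre(m, q):
    """Legendre symbol (m/q) for an odd prime q, by brute-force search."""
    r = m % q
    if r == 0:
        return 0
    return 1 if any((x * x) % q == r for x in range(1, q)) else -1


def e_q(n, q):
    """The q-th root of unity exp(2*pi*i*n/q)."""
    return cmath.exp(2j * cmath.pi * n / q)


def gauss_sum(n, q):
    """G(n) = sum over m = 1..q-1 of (m/q) * e_q(m*n)."""
    return sum(legendre(m, q) * e_q(m * n, q) for m in range(1, q))


if __name__ == "__main__":
    for q in (5, 7, 13):
        G = gauss_sum(1, q)
        print(q, G, G * G)
```

For each prime tried, $G(1)^2$ comes out close to $q$ when $q \equiv_4 1$ and close to $-q$ when $q \equiv_4 3$.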

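Next, we show how to rewrite the Legendre symbols in $L_{-1}(s)$ in terms of the Gaussian sums. Our first step is to prove the following lemma:

Lemma 22. If we let $G$ denote the numeric value of $G(1)$, then

$$L_{-1}(1) = \frac{1}{G}\sum_{m=1}^{q-1} \left( \frac{m}{q} \right) \sum_{n=1}^\infty \frac{1}{n}\,e_q(mn). \tag{21}$$

Proof. For $n \not\equiv_q 0$, $n$ is a unit in $\mathbb{Z}_q$, which implies

$$\{m\}_{m=1}^{q-1} = \{mn\}_{m=1}^{q-1},$$

where these sets are considered as subsets of $\mathbb{Z}_q$. In this case, then, since the Legendre symbol is multiplicative,

$$\left( \frac{n}{q} \right) G(n) = \sum_{m=1}^{q-1} \left( \frac{nm}{q} \right) e_q(mn) = \sum_{m=1}^{q-1} \left( \frac{m}{q} \right) e_q(m) = G(1).$$

Since $\left( \frac{n}{q} \right)$ is its own inverse, this implies that

$$G(n) = \left( \frac{n}{q} \right) G, \tag{22}$$

where we have set $G = G(1)$. Since, for $n \equiv_q 0$,

$$G(n) = \sum_{m=1}^{q-1} \left( \frac{m}{q} \right) = \sum_{m=1}^{q-1} \omega^{\nu(m)} = 0 = \left( \frac{n}{q} \right) G(1),$$

Equation 22 holds for all $n \in \mathbb{N}$. As it turns out, $G$ is $\sqrt{q}$ if $q \equiv_4 1$ and $i\sqrt{q}$ if $q \equiv_4 3$. Dirichlet finished what Gauss started by showing this. We omit the (clever) calculation, which can be found in Davenport, Chapter 2. For now, we use the fact that $G \neq 0$. Then

$$\left( \frac{n}{q} \right) = \frac{1}{G}\sum_{m=1}^{q-1} \left( \frac{m}{q} \right) e_q(mn),$$

which implies

$$L_{-1}(1) = \sum_{n=1}^\infty \frac{1}{n}\left( \frac{1}{G}\sum_{m=1}^{q-1} \left( \frac{m}{q} \right) e_q(mn) \right).$$

Swapping the order of summation proves Equation 21.

The reader who knows his/her Taylor series well might notice a logarithm hidden in Equation 21. Our next step is to note this and use it to rewrite our expression for $L_{-1}(1)$.

Lemma 23. Following the notation of the previous lemma, we have

$$L_{-1}(1) = -\frac{1}{G}\sum_{m=1}^{q-1} \left( \frac{m}{q} \right) \left( \log\left( 2\sin\frac{\pi m}{q} \right) + i\left( \frac{\pi m}{q} - \frac{\pi}{2} \right) \right). \tag{23}$$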

Proof. Consider Equation 21. To evaluate the inner sum, consider the Taylor series for the function

$$-\log(1-z) = \sum_{n=1}^\infty \frac{1}{n}z^n,$$

which converges for all $|z| \le 1$ except $z = 1$. (See Appendix A.4.4.) For $|z| \le 1$, we have $\Re(z) \le 1$. Since $z \neq 1$, we have $\Re(z) < 1$, $\Re(1-z) > 0$, and hence $-\pi/2 < \arg(1-z) < \pi/2$. If we use the principal branch of the logarithm, we will avoid all disgusting discontinuity issues completely.

Write $z = e^{i\theta}$. Thinking of $1$, $z$, and $1-z$ as vectors in the complex plane, we can draw the triangle whose sides are $1$, $z$, and $1-z$. Since $|z| = |1| = 1$, the triangle is isosceles, and the two equal angles $\alpha$ sum to $\pi - \theta$. Extending the line coincident with $1-z$, we see that $\arg(1-z) = -\alpha$. Therefore, $2\arg(1-z) = -2\alpha = -(\pi - \theta)$, which gives us $\arg(1-z) = \frac{1}{2}(\theta - \pi)$. Also, we can calculate

$$|1-z| = |(1-\cos\theta) - i\sin\theta| = \sqrt{2 - 2\cos\theta} = 2\sqrt{\frac{1-\cos\theta}{2}} = 2\sin\frac{\theta}{2}.$$

With the magnitude and the argument of $1-z$, we can calculate

$$\sum_{n=1}^\infty \frac{1}{n}e^{in\theta} = -\log\left( 1 - e^{i\theta} \right) = -\log|1-z| - i\arg(1-z),$$

which gives us

$$\sum_{n=1}^\infty \frac{1}{n}e^{in\theta} = -\log\left( 2\sin\frac{\theta}{2} \right) - \frac{1}{2}(\theta - \pi)i. \tag{24}$$

Since $e_q(mn) = e_q(m)^n$, and since $e_q(m)$ has magnitude $1$ yet is not equal to $1$ (since $m \not\equiv_q 0$), we can set $\theta = 2\pi m/q$ in Equation 24 and substitute the result into Equation 21 to obtain Equation 23.

Whew! If I had arrived at this equation myself, I would have been sure that I had made a mistake. For $L_{-1}(1)$ is a real number! In no way is it obvious that the expression on the right-hand side of Equation 23 is real. Despite all doubt, however, we have proven this equation, so it is correct. Returning to our goal (namely, to prove that $L_{-1}(1) \neq 0$) we use Equation 23 to complete Dirichlet's proof in the case that $q \equiv_4 3$.

Lemma 24. For $q \equiv_4 3$, $L_{-1}(1) \neq 0$.

Proof. In this case, we have $G = i\sqrt{q}$, so ignoring the (zero) imaginary part of Equation 23, we obtain

$$L_{-1}(1) = -\frac{1}{\sqrt{q}}\sum_{m=1}^{q-1} \left( \frac{m}{q} \right)\left( \frac{\pi m}{q} - \frac{\pi}{2} \right) = -\frac{\pi}{q^{3/2}}\sum_{m=1}^{q-1} m\left( \frac{m}{q} \right) + \frac{\pi}{2\sqrt{q}}\sum_{m=1}^{q-1} \left( \frac{m}{q} \right) = -\frac{\pi}{q^{3/2}}\sum_{m=1}^{q-1} m\left( \frac{m}{q} \right).$$

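The factor $-\pi q^{-3/2}$ is certainly not $0$, and neither is the sum, for if we set $q = 4k+3$ for some $k \in \mathbb{Z}$, then

$$\sum_{m=1}^{q-1} m\left( \frac{m}{q} \right) \equiv_2 \sum_{m=1}^{q-1} m = \frac{q(q-1)}{2} = (4k+3)(2k+1) \equiv_2 1.$$

To consider an integer modulo 2 is to blur one's vision of the integer as much as possible without losing sight of it. But in this case, that this sum is congruent to $1$ modulo $2$ means that it is non-zero, which proves $L_{-1}(1) \neq 0$, as we wished!

Now what about the case where $q \equiv_4 1$? We prove that $L_{-1}(1) \neq 0$, as required, but the proof is in a few steps. First, we make a definition.

Definition 25. Let $R$ and $N$ range over all $m = 1, \ldots, q-1$ such that $\left( \frac{m}{q} \right) = 1$ and $\left( \frac{m}{q} \right) = -1$, respectively. (Note: $R$ stands for residues, and $N$ stands for nonresidues.) Define the complex number $Q$ by

$$Q = \frac{\prod_N \sin\frac{\pi N}{q}}{\prod_R \sin\frac{\pi R}{q}}. \tag{25}$$

Second, we translate our requirement that $L_{-1}(1) \neq 0$ into the requirement that $Q \neq 1$.

Lemma 26. If $Q \neq 1$, then $L_{-1}(1) \neq 0$.

Proof. Since $q \equiv_4 1$, $G = \sqrt{q}$, so since $L_{-1}(1)$ is a real number, Equation 23 becomes

$$L_{-1}(1) = -\frac{1}{\sqrt{q}}\sum_{m=1}^{q-1} \left( \frac{m}{q} \right) \log\left( 2\sin\frac{\pi m}{q} \right).$$

Now let $R$ and $N$ range over all $m \in \{1, 2, \ldots, q-1\}$ such that $\left( \frac{m}{q} \right) = 1$ and $\left( \frac{m}{q} \right) = -1$, respectively. Then

$$\exp\left( \sqrt{q}\,L_{-1}(1) \right) = \exp\left( -\sum_R \log\left( 2\sin\frac{\pi R}{q} \right) + \sum_N \log\left( 2\sin\frac{\pi N}{q} \right) \right) = \exp\log\left( \frac{\prod_N 2\sin\frac{\pi N}{q}}{\prod_R 2\sin\frac{\pi R}{q}} \right) = \frac{\prod_N \sin\frac{\pi N}{q}}{\prod_R \sin\frac{\pi R}{q}},$$

where, to obtain the last equality, we used the fact that the number of quadratic residues is equal to the number of nonresidues (see Appendix A.3.3). But this is equal to $Q$, so if $Q \neq 1$, then $\sqrt{q}\,L_{-1}(1) \neq \log 1 = 0$, which proves $L_{-1}(1) \neq 0$.

Third, we show that $Q \neq 1$, which by Lemma 26 proves that $L_{-1}(1) \neq 0$ for the case where $q \equiv_4 1$.

Lemma 27. In the notation of the previous lemma, $Q \neq 1$.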
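Before the proof, the identity $\exp\left(\sqrt{q}\,L_{-1}(1)\right) = Q$ from the proof of Lemma 26 can be sanity-checked numerically for small primes $q \equiv_4 1$. The following is a rough sketch under stated assumptions: the names `legendre_table`, `q_ratio`, and `l_partial` are hypothetical, and $L_{-1}(1)$ is approximated by truncating its defining series after a whole number of periods, so agreement is only approximate.

```python
import math


def legendre_table(q):
    """List t with t[r] = (r/q) for r = 0..q-1, computed from the squares mod q."""
    squares = {(m * m) % q for m in range(1, q)}
    return [0] + [1 if r in squares else -1 for r in range(1, q)]


def q_ratio(q):
    """Q from Definition 25: product of sin(pi N/q) over product of sin(pi R/q)."""
    table = legendre_table(q)
    num = den = 1.0
    for m in range(1, q):
        s = math.sin(math.pi * m / q)
        if table[m] == 1:
            den *= s
        else:
            num *= s
    return num / den


def l_partial(q, periods):
    """Partial sum of sum_{n>=1} (n/q)/n over `periods` full periods of length q."""
    table = legendre_table(q)
    return sum(table[n % q] / n for n in range(1, periods * q + 1))


if __name__ == "__main__":
    for q in (5, 13, 17):
        approx = math.exp(math.sqrt(q) * l_partial(q, 20000))
        print(q, q_ratio(q), approx)
```

Stopping at a multiple of $q$ keeps the truncation error small, because the character sums to zero over each full period. In the cases tried, $Q$ is visibly different from 1, in line with Lemma 27.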

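Proof. Note that $R$ is a quadratic residue modulo $q$ if and only if $q - R$ is as well, because $q \equiv_4 1$ implies that [14]

$$\left( \frac{q-R}{q} \right) = \left( \frac{-R}{q} \right) = \left( \frac{-1}{q} \right)\left( \frac{R}{q} \right) = (-1)^{(q-1)/2}\left( \frac{R}{q} \right) = \left( \frac{R}{q} \right). \tag{26}$$

Hence $\sum_R R = \sum_R (q-R)$, which implies that

$$2\sum_R R = \sum_R R + \sum_R (q-R) = q\sum_R 1 = q\,\frac{q-1}{2}.$$

Since the corresponding statement of each of the above statements is true when $R$ is replaced by $N$, a similar conclusion holds as well. Therefore, $\sum_R R = \sum_N N = \frac{q(q-1)}{4}$. This implies

$$\begin{aligned}
e_q\!\left( \frac{q(q-1)}{8} \right)(-2i)^{(q-1)/2}\prod_R \sin\frac{\pi R}{q} &= e_q\!\left( \frac{1}{2}\sum_R R \right)\prod_R \left( -2i\sin\frac{\pi R}{q} \right) \\
&= \prod_R e_q(R/2)\left( -2i\sin\frac{\pi R}{q} \right) \\
&= \prod_R e_q(R/2)\big( e_q(-R/2) - e_q(R/2) \big) = \prod_R \big( 1 - e_q(R) \big),
\end{aligned} \tag{27}$$

and, similarly,

$$e_q\!\left( \frac{q(q-1)}{8} \right)(-2i)^{(q-1)/2}\prod_N \sin\frac{\pi N}{q} = \prod_N \big( 1 - e_q(N) \big). \tag{28}$$

Dividing Equation 28 by Equation 27, we arrive at

$$Q = \frac{\prod_N \big( 1 - e_q(N) \big)}{\prod_R \big( 1 - e_q(R) \big)}. \tag{29}$$

We now appeal to the result Gauss proved: Given an indeterminate $x$, there exist polynomials $Y(x), Z(x) \in \mathbb{Z}[x]$ such that

$$\prod_R \big( x - e_q(R) \big) = \frac{1}{2}\left( Y(x) - \sqrt{q}\,Z(x) \right); \tag{30}$$

$$\prod_N \big( x - e_q(N) \big) = \frac{1}{2}\left( Y(x) + \sqrt{q}\,Z(x) \right). \tag{31}$$

Evaluating these expressions at $x = 1$ and substituting the results into Equation 29 yields

$$Q = \frac{Y + \sqrt{q}\,Z}{Y - \sqrt{q}\,Z},$$

where $Y = Y(1)$ and $Z = Z(1)$. Since $Y$ and $Z$ are real numbers (in fact, integers!), having $Z \neq 0$ would imply that $Q \neq 1$, as we need.

[14] See Appendix A.3.3 to read the rules for computing Legendre symbols.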

To prove $Z \neq 0$, multiply Equations 30 and 31 to obtain

$$\frac{1}{4}\left( Y^2(x) - qZ^2(x) \right) = \left( \prod_R \big( x - e_q(R) \big) \right)\left( \prod_N \big( x - e_q(N) \big) \right),$$

which is simply a product of $x - e_q(m)$ over all of the $q$-th roots of unity $e_q(m)$ which are not equal to $1$. Thus it is further true that

$$\frac{1}{4}\left( Y^2(x) - qZ^2(x) \right) = \prod_{m=1}^{q-1} \big( x - e_q(m) \big) = \frac{x^q - 1}{x - 1} = \sum_{m=0}^{q-1} x^m.$$

The middle step follows because each $e_q(m)$, together with $1$, is a $q$-th root of unity, and the last step by recognizing this as a partial geometric sum. Evaluating this expression involving $Y(x)$ and $Z(x)$ at $x = 1$, we arrive at $Y^2 - qZ^2 = 4q$. Now $q$ is prime, which means $4q$ is not a perfect square, which in turn means $Z \neq 0$, as required!

These last two lemmas prove that $L_{-1}(1) \neq 0$ in the case where $q \equiv_4 1$, and Lemma 24 proves that $L_{-1}(1) \neq 0$ in the case where $q \equiv_4 3$. Combining this with our proof of the fact that $L_\omega(1) \neq 0$ for all complex $\omega$, we conclude that $L_\omega(1) \neq 0$ for all $\omega \neq 1$. Hence, the hypotheses of Lemma 20 are satisfied, from which we deduce Dirichlet's theorem in the case where $q$ is a prime.

3.1.4 Dirichlet characters and L-functions (general modulus)

We now move on to the proof of Dirichlet's theorem for a general $q$. Our first step, as in the proof of the special case, will be to define the Dirichlet characters for the modulus $q$. We will define the characters for powers of primes in two steps, then we will define the characters for a general modulus. Recall that a Dirichlet character, which we will now denote $\chi(n)$, is a complex-valued function defined on the positive integers such that

1. $\chi(n+q) = \chi(n)$ for all $n \in \mathbb{N}$,
2. $\chi(mn) = \chi(m)\chi(n)$ for all $m, n \in \mathbb{N}$, and
3. there exists a linear combination of them whose value is $1$ if $n \equiv_q a$ and $0$ otherwise.

(Note: requirement 3 is usually omitted, but it will be the key property of our character construction that will allow us to prove Dirichlet's theorem.)

To begin the construction, consider the multiplicative group of units in $\mathbb{Z}_q$, which we denote by $\mathbb{Z}_q^* = \{n \in \mathbb{Z}_q : (n, q) = 1\}$. We now consider two cases.

Suppose $q = p^\alpha$ where $p$ is an odd prime, and $\alpha \in \mathbb{N}$. The construction in this case is parallel to that where $q$ is prime. As we show in Appendix A.3.1, $\mathbb{Z}_q^*$ is cyclic. Let $g$ be a generator of $\mathbb{Z}_q^*$, and define $\nu(n)$ to be the index of $n$ relative to $g$, that is, define $\nu(n)$ such that $g^{\nu(n)} = n$. Let $\omega$ be any $\phi(q)$-th root of unity, where $\phi(q)$ is the order of the group $\mathbb{Z}_q^*$ and is called the Euler totient function. Then the Dirichlet character corresponding to $\omega$ is $\chi(n) = \omega^{\nu(n)}$ for all $n$ such that $(n, q) = 1$ and $\chi(n) = 0$ otherwise. (In the case where $q$ is prime, we have $\chi(n) = \omega^{\nu(n)}$ when

$q \nmid n$ and $\chi(n) = 0$ otherwise; this is exactly how we defined $\chi(n)$ in that case.) To complete the definition, we extend $\chi(n)$ periodically so that it is defined for all positive integers. That $\chi(n)$ is well-defined, $q$-periodic, and completely multiplicative follows by arguments identical to those given for the case where $q$ was prime; one simply uses $\varphi(q)$ in place of $q - 1$. That leaves us to show that the $\chi(n)$ satisfy property 3, but we will do this for every modulus at the end. We now move on to the case where $q$ is a power of 2.

Suppose $q = 2^\alpha$ where $\alpha \in \mathbb{N}$. For $\alpha = 1, 2$, the construction is similar. For $\alpha = 1$, $q = 2$, which means there is $\varphi(2) = 1$ character, namely,
\[ \chi(n) = \begin{cases} 1, & \text{if } (n, 2) = 1; \\ 0, & \text{otherwise.} \end{cases} \]
For $\alpha = 2$, $q = 4$, and there are $\varphi(q) = 2$ characters, the first of which is 1 when $(n, 4) = 1$ and 0 otherwise, and the other of which is 1 when $n \equiv_4 1$, $-1$ when $n \equiv_4 3$, and 0 otherwise. (You can easily see this! In the notation above, the generator $g$ is 3, which means $\nu(1) = 2$ and $\nu(3) = 1$. The square roots of unity are 1 and $-1$, from which you can obtain the two characters.)

For $\alpha \geq 3$, the situation is not quite as nice, because $\mathbb{Z}_{2^\alpha}^*$ is not cyclic! What is true, however, is that the subgroup generated by 5 and $-1$ is the entire group! (See Appendix A.3.2 for a proof of this fact.) That is,
\[ \mathbb{Z}_{2^\alpha}^* = \{(-1)^a 5^b : a = 0, 1;\ b = 0, 1, \ldots, 2^{\alpha-2} - 1\}. \]
Because there are $2 \cdot 2^{\alpha-2} = 2^{\alpha-1} = \varphi(2^\alpha)$ elements in the set generated by powers of $-1$ and 5, we have that, for all $n \in \mathbb{Z}_{2^\alpha}^*$, there exists a unique pair $(\nu_0(n), \nu_0'(n))$ such that $n = (-1)^{\nu_0} 5^{\nu_0'}$. Using $(\nu_0(n), \nu_0'(n))$ as our substitute for an index $\nu(n)$ relative to a generating element of $\mathbb{Z}_{2^\alpha}^*$, we define the Dirichlet characters by
\[ \chi(n) = \begin{cases} \omega^{\nu_0(n)} (\omega')^{\nu_0'(n)}, & \text{if } (n, 2^\alpha) = 1; \\ 0, & \text{otherwise,} \end{cases} \]
where $\omega$ is a square root of unity and $\omega'$ is a $2^{\alpha-2}$-th root of unity. Here is an example set of characters for the modulus 8.

Example 28. Let $q = 2^3$. Then $\mathbb{Z}_q^* = \{1, 3, 5, 7\}$. Since every element in this group squares to 1, it is clearly not cyclic.
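However, we note that $5^0 \equiv 1$, $5^1 \equiv 5$, $(-1)5^0 \equiv 7$, and $(-1)5^1 \equiv 3$ modulo 8, so $(\nu_0(n), \nu_0'(n))$ for $n = 1, 3, 5, 7$ is $(0, 0)$, $(1, 1)$, $(0, 1)$, $(1, 0)$, respectively. Now the four characters will correspond to the four combinations of a square root $\omega$ of unity and a $2^{3-2}$-th root (or square root) $\omega'$ of unity. That is,
\[ \chi_0(n) = (1)^{\nu_0(n)} (1)^{\nu_0'(n)} = 1, \]
\[ \chi_1(n) = (1)^{\nu_0(n)} (-1)^{\nu_0'(n)} = (-1)^{\nu_0'(n)}, \]
\[ \chi_2(n) = (-1)^{\nu_0(n)} (1)^{\nu_0'(n)} = (-1)^{\nu_0(n)}, \]
\[ \chi_3(n) = (-1)^{\nu_0(n)} (-1)^{\nu_0'(n)} = (-1)^{\nu_0(n) + \nu_0'(n)}. \]
The values of these characters at the odd residues are given here (each character is 0 at even $n$):
\[ \begin{array}{c|rrrr} n & 1 & 3 & 5 & 7 \\ \hline \chi_0(n) & 1 & 1 & 1 & 1 \\ \chi_1(n) & 1 & -1 & -1 & 1 \\ \chi_2(n) & 1 & -1 & 1 & -1 \\ \chi_3(n) & 1 & 1 & -1 & -1 \end{array} \]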
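The construction in Example 28 can be checked mechanically. The following is a minimal sketch, assuming only the definitions above; the function names (`index_pair`, `character`) are illustrative and not part of the original text. It recovers the index pair of each unit modulo 8 by brute force and evaluates the four characters.

```python
# Characters modulo 8 built from the index pair (nu0, nu0p), where
# n is congruent to (-1)**nu0 * 5**nu0p modulo 8.
# All names here are hypothetical helpers chosen for illustration.

def index_pair(n, modulus=8):
    """Return (nu0, nu0p) with (-1)**nu0 * 5**nu0p congruent to n mod 8."""
    for nu0 in (0, 1):
        for nu0p in (0, 1):
            if ((-1) ** nu0 * 5 ** nu0p) % modulus == n % modulus:
                return nu0, nu0p
    raise ValueError("n is not a unit modulo 8")

def character(omega, omega_p):
    """Character attached to the square roots of unity (omega, omega_p)."""
    def chi(n):
        if n % 2 == 0:          # n shares a factor with 8
            return 0
        nu0, nu0p = index_pair(n)
        return omega ** nu0 * omega_p ** nu0p
    return chi

chi0 = character(1, 1)
chi1 = character(1, -1)
chi2 = character(-1, 1)
chi3 = character(-1, -1)

units = [1, 3, 5, 7]
table = {
    name: [chi(n) for n in units]
    for name, chi in [("chi0", chi0), ("chi1", chi1), ("chi2", chi2), ("chi3", chi3)]
}
```

Printing `table` gives one row per character over $n = 1, 3, 5, 7$, which can be compared against the values listed in the example.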

Interestingly, you might notice that $\chi_0 \chi_k = \chi_k$ for all $k$, that $\chi_3 = \chi_1 \chi_2$, that $\chi_k^2 = \chi_0$ for all $k$, and so on. Hence the set of characters forms an abelian group of order 4 in which every element other than the identity has order 2. Since there is only one such group up to isomorphism, the set of characters is isomorphic to the group $\mathbb{Z}_{2^3}^*$. Another interesting note: This group of characters has order 4 just as the group of characters defined in Example 28, but they are not isomorphic groups. The group in Example 28, for example, is cyclic whereas the group in this example is not.

Proving that each $\chi(n)$ is well-defined, completely multiplicative, and $q$-periodic is an easy extension of the argument for the case where $q$ is prime, so we omit it. We have defined characters now for all prime powers. Given that we want our general characters to be multiplicative, this will be sufficient to define them in general. We do this now.

Suppose $q$ is composite. In fact, since the following will hold for prime $q$ as well, we could take the definition that follows as the definition of Dirichlet characters for any modulus $q$. Let $q = 2^\alpha \prod_{k=1}^{r} p_k^{\alpha_k}$. Let $\chi(n; p^\beta)$ denote one of the $\varphi(p^\beta)$ Dirichlet characters modulo $p^\beta$, as we have defined above. For each combination of these characters defined for a prime power modulus, define a Dirichlet character for all $n \in \mathbb{N}$ by
\[ \chi(n) = \chi(n; 2^\alpha) \prod_{k=1}^{r} \chi(n; p_k^{\alpha_k}). \]
Note that $\chi(n) = 0$ whenever $(n, q) > 1$ because, in this case, $(n, p^\beta) > 1$ for some prime factor $p$ of $q$, which implies that $\chi(n; p^\beta) = 0$.

Does this collection of $\chi(n)$ give us what we want? Let us see. First, $\chi$ is well-defined because each $\chi(n; p^\beta)$ is well-defined. Second, for all $p^\beta$ which divide $q$, the fact that $\chi(n; p^\beta)$ is $p^\beta$-periodic implies that it is also $q$-periodic; hence, since each term in the product is $q$-periodic, $\chi(n)$ is $q$-periodic. Third, each $\chi(n; p^\beta)$ in the product is completely multiplicative, so the product $\chi(n)$ is completely multiplicative:
\[ \chi(mn) = \chi(mn; 2^\alpha) \prod_{k=1}^{r} \chi(mn; p_k^{\alpha_k}) = \chi(m; 2^\alpha)\chi(n; 2^\alpha) \prod_{k=1}^{r} \left( \chi(m; p_k^{\alpha_k}) \chi(n; p_k^{\alpha_k}) \right) = \chi(m)\chi(n). \]
Thus the characters $\chi(n)$ are $q$-periodic, completely multiplicative, complex-valued functions defined on $\mathbb{N}$. All that is left to show is that some linear combination of these $\chi(n)$ outputs 1 when $n \equiv_q a$ and outputs 0 otherwise. We prove this in a sequence of steps.

Lemma 29. There are $\varphi(q)$ characters modulo $q$.

Proof. In Section 3.1.2, we proved that there are $q - 1 = \varphi(q)$ characters of a prime modulus $q$. In an identical fashion, we could prove that there are $\varphi(q)$ characters of prime power modulus $q$. Now in the general case, we let $q = 2^\alpha p_1^{\alpha_1} \cdots p_r^{\alpha_r}$, and let
\[ \chi(n) = \chi(n; 2^\alpha) \chi(n; p_1^{\alpha_1}) \cdots \chi(n; p_r^{\alpha_r}), \]

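where $\chi(n; 2^\alpha)$ is one of $\varphi(2^\alpha)$ characters of modulus $2^\alpha$, $\chi(n; p_1^{\alpha_1})$ is one of $\varphi(p_1^{\alpha_1})$ characters of modulus $p_1^{\alpha_1}$, and so on. This gives us
\[ \varphi(2^\alpha) \varphi(p_1^{\alpha_1}) \cdots \varphi(p_r^{\alpha_r}) = \varphi(2^\alpha p_1^{\alpha_1} \cdots p_r^{\alpha_r}) = \varphi(q) \]
choices.

We now construct a linear combination which will almost be what we need.

Lemma 30. Let $(a, q) = 1$ and consider the linear combination
\[ l(n) = \sum_{\chi} \overline{\chi(a)} \chi(n), \]
which is a sum over every Dirichlet character of modulus $q$. Then $l(n) = \varphi(q)$ if $n \equiv_q a$ and $l(n) = 0$ otherwise.

Proof. Suppose first that $n \equiv_q a$. Then by the $q$-periodicity of $\chi(n)$, $\chi(n) = \chi(a)$, which implies
\[ l(n) = \sum_{\chi} \overline{\chi(a)} \chi(a) = \sum_{\chi} |\chi(a)|^2. \]
Since $\chi(a)$ is some power of a root of unity, $|\chi(a)| = 1$. Hence, since there are $\varphi(q)$ Dirichlet characters, $l(n) = \varphi(q)$.

Now suppose that $n \not\equiv_q a$. Let $q = p_0^{\alpha_0} p_1^{\alpha_1} \cdots p_r^{\alpha_r}$, where $p_0 = 2$. Then for some $k = 0, \ldots, r$, $p_k^{\alpha_k} \nmid (n - a)$. Now the sum over all characters $\chi$ of modulus $q$ can be rewritten as $r + 1$ sums over all of the characters $\chi_k$, say, of modulus $p_k^{\alpha_k}$:
\[ l(n) = \sum_{\chi_0} \sum_{\chi_1} \cdots \sum_{\chi_r} \overline{\chi(a)} \chi(n). \]
Now we evaluate the sum over $\chi_k$, because we know that $p_k^{\alpha_k} \nmid (n - a)$. Considering just this part, we have
\[ \sum_{\chi_k} \overline{\chi(a)} \chi(n) = \Bigl( \prod_{j \neq k} \overline{\chi_j(a)} \chi_j(n) \Bigr) \sum_{\chi_k} \overline{\chi_k(a)} \chi_k(n). \]
Now for $k > 0$, let $\omega$ be one of the $\varphi(p_k^{\alpha_k})$-th roots of unity. If $p_k \mid n$, then $\chi_k(n) = 0$ and the inner sum vanishes trivially, so suppose $p_k \nmid n$. Since $(a, q) = 1$, $a \in \mathbb{Z}_{p_k^{\alpha_k}}^*$, and hence
\[ \overline{\chi_k(a)} \chi_k(n) = \overline{\omega^{\nu(a)}} \omega^{\nu(n)} = \omega^{-\nu(a) + \nu(n)} = \omega^{\nu(a^{-1}) + \nu(n)} = \omega^{\nu(a^{-1} n)}. \]
Therefore, since the sum over the characters $\chi_k$ of modulus $p_k^{\alpha_k}$ is a sum over the $\varphi(p_k^{\alpha_k})$-th roots of unity, we have
\[ \sum_{\chi_k} \overline{\chi_k(a)} \chi_k(n) = \sum_{\chi_k} \omega^{\nu(a^{-1} n)} = \sum_{\omega} \omega^{\nu(a^{-1} n)}. \]
Now $a \not\equiv n$ modulo $p_k^{\alpha_k}$, so $a^{-1} n \not\equiv 1$, which implies $\nu(a^{-1} n) \not\equiv 0$ modulo $\varphi(p_k^{\alpha_k})$. Writing $\zeta$ for a primitive $\varphi(p_k^{\alpha_k})$-th root of unity, $\zeta^{\nu(a^{-1} n)}$ is therefore a root of unity besides 1, and the sum on the right-hand side, being a geometric sum of the powers of $\zeta^{\nu(a^{-1} n)}$ over a full period, is 0. (The case $k = 0$ is handled in the same way, with the pair $(\nu_0, \nu_0')$ in place of $\nu$.) Summing 0 over the other characters $\chi_j$ for $j \neq k$ gives us that $l(n) = 0$.

Thus, by dividing $l(n)$ by $\varphi(q)$, we obtain the required linear combination:
\[ \frac{1}{\varphi(q)} \sum_{\chi} \overline{\chi(a)} \chi(n) = \begin{cases} 1, & \text{if } n \equiv_q a; \\ 0, & \text{otherwise.} \end{cases} \]
At this point, we are ready to move on with our proof of Dirichlet's theorem. However, we have a closely related fact about sums of characters which we will use later. So we prove it here.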
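Lemma 30 is easy to test numerically in the simplest setting. The sketch below assumes a prime modulus $q = 7$ with generator $g = 3$ (both chosen only for illustration) and builds the $q - 1$ characters from the index $\nu(n)$ exactly as in the cyclic construction; the helper names are hypothetical.

```python
import cmath

def discrete_log_table(q, g):
    """Map each unit n mod q to its index nu(n), where g**nu(n) = n (mod q)."""
    table, power = {}, 1
    for exponent in range(q - 1):
        table[power] = exponent
        power = (power * g) % q
    return table

def characters(q, g):
    """All q - 1 characters modulo the prime q, built from the generator g."""
    nu = discrete_log_table(q, g)
    phi = q - 1

    def make(m):
        # omega runs over the phi-th roots of unity as m = 0, ..., phi - 1
        omega = cmath.exp(2j * cmath.pi * m / phi)

        def chi(n):
            r = n % q
            return 0 if r == 0 else omega ** nu[r]

        return chi

    return [make(m) for m in range(phi)]

def lemma30_sum(n, a, chars):
    """The linear combination l(n): sum of conj(chi(a)) * chi(n) over all characters."""
    return sum(chi(a).conjugate() * chi(n) for chi in chars)

chars = characters(7, 3)   # assumption: 3 generates the units modulo 7
```

Up to floating-point rounding, `lemma30_sum(n, 3, chars)` should be $\varphi(7) = 6$ when $n \equiv_7 3$ and 0 otherwise.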

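Theorem 31. Let $\chi$ be a character of modulus $q$ other than the principal one (the character equal to 1 at every $n$ relatively prime to $q$). For any $q$ consecutive natural numbers indexed by $n$,
\[ \sum_{n} \chi(n) = 0. \]

Proof. First, by the $q$-periodicity of $\chi(n)$, it clearly suffices to show that $\sum_{n=0}^{q-1} \chi(n) = 0$. Let $q = 2^\alpha p_1^{\alpha_1} \cdots p_r^{\alpha_r}$. Consider the definition of the Dirichlet character:
\[ \chi(n) = \chi(n; 2^\alpha) \chi(n; p_1^{\alpha_1}) \cdots \chi(n; p_r^{\alpha_r}). \]
Each such $\chi$ corresponds to a unique $(r + 2)$-tuple $(\omega_0, \omega_0', \omega_1, \ldots, \omega_r)$ where $\omega_0^2 = 1$, $(\omega_0')^{2^{\alpha-2}} = 1$, and $\omega_k^{\varphi(p_k^{\alpha_k})} = 1$ for all $k = 1, \ldots, r$. For each of these roots of unity, there will exist a unique power of a given primitive root of unity that will equal the original root. Hence this $(r + 2)$-tuple corresponds to a unique $(r + 2)$-tuple $(m_0, m_0', m_1, \ldots, m_r) \in A$ where (see footnote 5)
\[ A = \mathbb{Z}_2 \times \mathbb{Z}_{2^{\alpha-2}} \times \mathbb{Z}_{\varphi(p_1^{\alpha_1})} \times \cdots \times \mathbb{Z}_{\varphi(p_r^{\alpha_r})}, \]
\[ \left( e^{2\pi i/2} \right)^{m_0} = \omega_0, \qquad \left( e^{2\pi i/2^{\alpha-2}} \right)^{m_0'} = \omega_0', \qquad \text{and} \qquad \left( e^{2\pi i/\varphi(p_k^{\alpha_k})} \right)^{m_k} = \omega_k \text{ for all } k = 1, \ldots, r. \]
Hence, using this complex-exponential representation,
\[ \chi(n; 2^\alpha) = \omega_0^{\nu_0(n)} (\omega_0')^{\nu_0'(n)} = \exp\left( 2\pi i \left( \frac{m_0 \nu_0(n)}{2} + \frac{m_0' \nu_0'(n)}{2^{\alpha-2}} \right) \right) \]
and
\[ \chi(n; p_k^{\alpha_k}) = \omega_k^{\nu_k(n)} = \exp\left( 2\pi i\, \frac{m_k \nu_k(n)}{\varphi(p_k^{\alpha_k})} \right) \]
for all $k = 1, \ldots, r$. So
\[ \chi(n) = \exp\left( 2\pi i \left( \frac{m_0 \nu_0(n)}{2} + \frac{m_0' \nu_0'(n)}{2^{\alpha-2}} + \sum_{k=1}^{r} \frac{m_k \nu_k(n)}{\varphi(p_k^{\alpha_k})} \right) \right). \]
Now what happens when we sum over $n \in \mathbb{Z}_q$ (where again, $\mathbb{Z}_q$ here denotes nothing more than the set $\{0, 1, \ldots, q - 1\}$)?

Recall that $\nu_k(n)$, for each $k = 1, \ldots, r$, is the unique element modulo $\varphi(p_k^{\alpha_k})$ such that $g_k^{\nu_k(n)} \equiv n$ modulo $p_k^{\alpha_k}$, where $g_k$ is a generator of $\mathbb{Z}_{p_k^{\alpha_k}}^*$ which we fixed (way back when we defined $\chi(n; p_k^{\alpha_k})$). Also, $\nu_0(n)$ and $\nu_0'(n)$ are the unique elements modulo 2 and $2^{\alpha-2}$ such that $(-1)^{\nu_0(n)} 5^{\nu_0'(n)} \equiv n$ modulo $2^\alpha$. So by the definition of the index functions $\nu_0(n)$, $\nu_0'(n)$, and so on, $(\nu_0(n), \nu_0'(n))$ is a bijection from $\mathbb{Z}_{2^\alpha}^*$ onto $\mathbb{Z}_2 \times \mathbb{Z}_{2^{\alpha-2}}$, and $\nu_k(n)$ is a bijection from $\mathbb{Z}_{p_k^{\alpha_k}}^*$ onto $\mathbb{Z}_{\varphi(p_k^{\alpha_k})}$ for all $k = 1, \ldots, r$. Hence the function $\nu(n)$, defined by
\[ \nu(n) = (\nu_0(n), \nu_0'(n), \nu_1(n), \ldots, \nu_r(n)), \]
is a bijection from $\mathbb{Z}_q^*$, viewed as the direct product of the groups $\mathbb{Z}_{2^\alpha}^*, \mathbb{Z}_{p_1^{\alpha_1}}^*, \ldots, \mathbb{Z}_{p_r^{\alpha_r}}^*$, onto
\[ A = \mathbb{Z}_2 \times \mathbb{Z}_{2^{\alpha-2}} \times \mathbb{Z}_{\varphi(p_1^{\alpha_1})} \times \cdots \times \mathbb{Z}_{\varphi(p_r^{\alpha_r})}. \]

Footnote 5: Here we are abusing the notation slightly. The sets of the form $\mathbb{Z}_n$ in the Cartesian product for $A$ are meant here to specify the set $\{0, 1, \ldots, n - 1\}$ and nothing more.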

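(The decomposition of $\mathbb{Z}_q^*$ above is a direct product of groups, whereas the product defining $A$ is only a Cartesian product of sets.) Therefore, the sum over $n$ involving $\nu(n)$ can be rewritten as a sum over the values $\nu = (\nu_0, \nu_0', \nu_1, \ldots, \nu_r)$ in the range of $\nu(n)$; the terms with $(n, q) > 1$ contribute nothing, since $\chi(n) = 0$ there. That is,
\[ \sum_{n=0}^{q-1} \chi(n) = \sum_{\nu \in A} \chi_\nu, \]
where
\[ \chi_\nu = \exp\left( 2\pi i \left( \frac{m_0 \nu_0}{2} + \frac{m_0' \nu_0'}{2^{\alpha-2}} + \sum_{k=1}^{r} \frac{m_k \nu_k}{\varphi(p_k^{\alpha_k})} \right) \right). \qquad (32) \]
Next, we reindex our sum once more. For clarity, denote $\chi_\nu$ by $\chi_{\nu, m}$ where $m = (m_0, m_0', m_1, \ldots, m_r)$. What we have is a fixed $m$, and we are summing over $\nu$. But, as is clear from Equation 32, $\chi_{\nu, m} = \chi_{m, \nu}$. So if we swap the values of $\nu$ and $m$, we obtain
\[ \sum_{\nu \in A} \chi_\nu = \sum_{\nu \in A} \chi_{\nu, m} = \sum_{\nu \in A} \chi_{m, \nu} = \sum_{m \in A} \chi_{\nu, m}. \]
We have (seemingly magically) turned this sum into one over the $(r + 2)$-tuples $m \in A$. But the set of $m$ is in one-to-one correspondence with the set of characters, as mentioned above. Hence this last sum is one over all possible characters $\chi$. The terms in the sum are obtained by choosing (the unique) $n \in \{0, 1, \ldots, q - 1\}$ such that $\nu(n) = \nu$. So
\[ \sum_{m \in A} \chi_{\nu, m} = \sum_{\chi} \chi(n) = \sum_{\chi} \overline{\chi(1)} \chi(n). \]
The tuple $\nu$ here is the tuple of our original character, which is nonzero because that character is not the principal one; hence $n \not\equiv_q 1$. So if we apply Lemma 30 where $a = 1$, the sum is 0, from which the theorem follows.

The next step in the proof is just like the step we made in the proof of the special case. Refer to Section 3.1.4 to see how our definitions and various steps relate to those made in the proof of the special case. We begin by defining the Dirichlet L-functions for our general modulus.

Definition 32. For $q \in \mathbb{N}$ and a character $\chi(n)$ of modulus $q$, define $L(s, \chi)$ for all $s \in \mathbb{C}$ at which the series converges by
\[ L(s, \chi) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}. \]
By an earlier theorem, we have
\[ L(s, \chi) = \prod_{p} \left( 1 - \chi(p) p^{-s} \right)^{-1} \]
for all $\Re(s) > 1$. Now $L(s, \chi) \neq 0$, as we could see by considering the product formula for $L(s, \chi)$ (which we did in an earlier section). Therefore, by a calculation similar to the one in Section 3.1.4, this time using Lemma 30, we obtain
\[ \log L(s, \chi) = -\sum_{p} \log\left( 1 - \chi(p) p^{-s} \right) = \sum_{p} \sum_{m=1}^{\infty} \frac{\chi(p^m)}{m\, p^{ms}}. \]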
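The agreement between the Dirichlet series of Definition 32 and its Euler product can be observed numerically. The sketch below is only an illustration under stated assumptions: it uses the nonprincipal character modulo 4, the real point $s = 2$, and finite truncations of both sides; the function names are hypothetical.

```python
def chi4(n):
    """The nonprincipal character modulo 4: 1, 0, -1, 0 on n = 1, 2, 3, 4."""
    return {1: 1, 3: -1}.get(n % 4, 0)

def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Zero out the multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]

def dirichlet_series(s, terms):
    """Partial sum of chi4(n) / n**s for n = 1, ..., terms."""
    return sum(chi4(n) / n ** s for n in range(1, terms + 1))

def euler_product(s, prime_limit):
    """Partial product of 1 / (1 - chi4(p) / p**s) over primes p <= prime_limit."""
    product = 1.0
    for p in primes_up_to(prime_limit):
        product *= 1.0 / (1.0 - chi4(p) / p ** s)
    return product

series_value = dirichlet_series(2.0, 20000)
product_value = euler_product(2.0, 20000)
```

The two truncations approximate the same number, and the agreement improves as the cutoffs grow; near $s = 1$ the convergence is much slower, which is why the sketch stays at $s = 2$.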

Next, we multiply by $\overline{\chi(a)}/\varphi(q)$ and sum over all characters $\chi$, arriving at
\[ \frac{1}{\varphi(q)} \sum_{\chi} \overline{\chi(a)} \log L(s, \chi) = \sum_{p} \sum_{m=1}^{\infty} \frac{1}{m\, p^{ms}} \left( \frac{1}{\varphi(q)} \sum_{\chi} \overline{\chi(a)} \chi(p^m) \right) = \sum_{p^m} \frac{1}{m\, p^{ms}}, \qquad (33) \]
where the last sum runs over the prime powers $p^m \in \mathbb{N}$ such that $p^m \equiv_q a$. On the right-hand side, we can pull out the $m = 1$ terms to obtain
\[ \sum_{p \equiv_q a} \frac{1}{p^s} + \delta, \qquad (34) \]
where $|\delta| < \zeta(2)$ for all $s > 1$, in the same way as before. On the left-hand side, consider the $\chi = \chi_0$ term, where $\chi_0$ is the character corresponding to $\omega = 1$. We have $\chi_0(n) = 1$ for all $(n, q) = 1$ and $\chi_0(n) = 0$ otherwise, which implies that
\[ L(s, \chi_0) = \prod_{p} \left( 1 - \chi_0(p) p^{-s} \right)^{-1} = \prod_{p \nmid q} \left( 1 - p^{-s} \right)^{-1} = \zeta(s) \prod_{p \mid q} \left( 1 - p^{-s} \right). \]
Since the finite product remains a finite nonzero quantity as $s \to 1^+$, this diverges to $\infty$ because $\zeta(s) \to \infty$ as $s \to 1^+$. Thus the $\chi = \chi_0$ term on the left-hand side diverges to $\infty$ as $s \to 1^+$. If the entire left side diverges to infinity as well, then the sum $\sum_{p \equiv_q a} 1/p$ must necessarily diverge, which would imply the existence of infinitely many primes congruent to $a$ modulo $q$. Dirichlet's theorem will be proved! All we need to show is that the other terms (the $\chi \neq \chi_0$ terms) are bounded, which is to say that $\lim_{s \to 1^+} |L(s, \chi)| \in (0, \infty)$. Using continuity arguments similar to those in the proof of the special case, we see that the limit is bounded and, in particular, equal to $L(1, \chi)$. Hence our proof requires only that
\[ L(1, \chi) \neq 0 \quad \text{for all } \chi \neq \chi_0. \]
Proving this is the task of the next section.

3.1.5 Dirichlet's proof (general modulus)

As in the proof of the case where $q$ is prime, we split our task into two subtasks.

Suppose $\chi$ is a complex character. The first subtask is the case where $\chi$ is a complex character; that is, suppose that $\chi(n)$ is not real for some $n \in \mathbb{N}$. Then, as in the proof of the special case, we consider Equation 33, evaluated at $a = 1$. (This equation holds for all $a$ which are relatively prime to $q$; we are trying to prove a property about $L(s, \chi)$, so the fact that we are setting $a = 1$ does not somehow restrict ourselves to a special case.) This gives us an analogue to Equation 7. The right-hand side is nonnegative, so we have
\[ \sum_{\chi} \log L(s, \chi) \geq 0, \]
from which we conclude that
\[ \prod_{\chi} L(s, \chi) \geq 1. \]
The rest of the proof continues similarly.

Suppose $\chi$ is a real character. The case where $\chi$ is a real character (besides the principal character $\chi_0$) is challenging (as it was in the special case where $q$ was prime). We wish to prove that $L(1, \chi) \neq 0$. To do this, we extend the definition of $L(s, \chi)$ beyond the half-plane where its series converges absolutely. Throughout the rest of this section, let $s$ denote an arbitrary complex number with real part $\sigma$ and imaginary part $t$, so that $s = \sigma + it$. By an earlier theorem, $L(s, \chi)$ converges absolutely for all $\sigma > 1$. We now note a couple of properties about the function $L(s, \chi)$. First, for all $\delta > 0$, the series $L(s, \chi)$ converges uniformly for $\sigma \geq 1 + \delta$. The proof of this is just like the proof of the fact that $\zeta(s)$ converges uniformly (see proof of Lemma 8). Since the partial sums of the series for $L(s, \chi)$ are holomorphic functions, and since we could (again, easily) show that the sequence of derivatives of the partial sums converges uniformly, we have the following result:

Lemma 33. $L(s, \chi)$ is holomorphic for $\sigma > 1$.

Using this lemma, we prove a crucial property of $L(s, \chi)$. But first, we make a definition.

Definition 34. Let $V$ and $U \subseteq V$ be subsets of $\mathbb{C}$. Let $f : U \to \mathbb{C}$ be a holomorphic function and $F : V \to \mathbb{C}$ be a holomorphic function such that $F(s) = f(s)$ for all $s \in U$. Then $F$ is a holomorphic continuation of $f$ to $V$.

With this terminology, we state the following important property:

Theorem 35. $L(s, \chi)$ can be holomorphically continued to $\sigma > 0$.

Before we give the proof, we note that this theorem holds for $\chi = \chi_0$ as well, except that $L(s, \chi_0)$ has a simple pole at $s = 1$. The proof, which is identical to the one below except for a step at the end, might provide an exercise for the reader.

Proof. Let $X(n) = \sum_{m=1}^{n} \chi(m)$. More generally, define $X(x) = \sum_{m \leq x} \chi(m)$, where the sum over $m$ runs over all positive integers less than or equal to $x$. Then, by Abel's Lemma (see Appendix A.4.3),
\[ L(s, \chi) = \lim_{N \to \infty} \sum_{n=1}^{N} \frac{\chi(n)}{n^s} = \lim_{N \to \infty} \left( \sum_{n=1}^{N-1} X(n) \left( \frac{1}{n^s} - \frac{1}{(n+1)^s} \right) + \frac{X(N)}{N^s} \right) = \sum_{n=1}^{\infty} X(n) \left( \frac{1}{n^s} - \frac{1}{(n+1)^s} \right). \]
Next, we consider the fact that $X(x) = X(n)$ for all $x \in (n, n + 1)$.
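This allows us to write
\[ L(s, \chi) = \sum_{n=1}^{\infty} \int_{n}^{n+1} \frac{s X(x)}{x^{s+1}}\,dx = s \int_{1}^{\infty} \frac{X(x)}{x^{s+1}}\,dx. \]
Now by Theorem 31,
\[ 0 = \sum_{n=0}^{q-1} \chi(n) = \sum_{n=q}^{2q-1} \chi(n) = \cdots. \]
Hence $X(x)$ is a bounded function. If $M > 0$ is such that $|X(x)| < M$ for all $x > 1$, then $\sigma > 0$ implies that
\[ \left| \int_{1}^{\infty} \frac{X(x)}{x^{s+1}}\,dx \right| \leq M \int_{1}^{\infty} \frac{dx}{x^{\sigma+1}} = \frac{M}{\sigma}, \]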

which means this integral converges and takes on a finite value for all $\sigma > 0$. Furthermore, using the Leibniz integral rule,
\[ \frac{d}{ds} L(s, \chi) = \int_{1}^{\infty} \frac{X(x)}{x^{s+1}}\,dx - s \int_{1}^{\infty} \frac{X(x) \log x}{x^{s+1}}\,dx. \]
Both integrals are clearly convergent for $\sigma > 0$, so $L(s, \chi)$ is holomorphic for $\sigma > 0$, as claimed.

We are almost there! Here is one more quick lemma:

Lemma 36. $L(s, \chi_0)$ has a simple pole at $s = 1$.

Proof. We showed in the proof of an earlier lemma that $(s - 1)\zeta(s) \to 1$ as $s \to 1$. (Specifically, we knew that the limit in Equation 6 was finite, from which we concluded that $(s - 1)\zeta(s) - 1$ converged to 0 as $s \to 1$.) Hence $\zeta(s)$ has a simple pole at 1. But we can write
\[ L(s, \chi_0) = \prod_{p} \left( 1 - \chi_0(p) p^{-s} \right)^{-1} = \prod_{p \nmid q} \left( 1 - p^{-s} \right)^{-1} = \zeta(s) \prod_{p \mid q} \left( 1 - p^{-s} \right). \]
Since only finitely many primes divide $q$, the product on the right will be finite if and only if $\zeta(s)$ is. Hence $L(s, \chi_0) \to \infty$ as $s \to 1^+$ because $\zeta(s) \to \infty$ in the same limit, and $(s - 1)L(s, \chi_0)$ remains finite as $s \to 1^+$ because $(s - 1)\zeta(s)$ does. Hence $L(s, \chi_0)$ has a simple pole at $s = 1$.

The proof of the theorem, which we have reduced to the claim that $L(1, \chi) \neq 0$ for all real $\chi \neq \chi_0$, is in two steps. We define a function $\psi(s)$ and prove two properties of it in the following two lemmas. Dirichlet's theorem will follow immediately, as we show below.

Lemma 37 (The first $\psi(s)$ lemma). Define the function $\psi(s)$ by
\[ \psi(s) = \frac{L(s, \chi) L(s, \chi_0)}{L(2s, \chi_0)}. \]
If $L(1, \chi) = 0$, then $\psi(s) \to 0$ as $s \to \tfrac{1}{2}^+$.

Proof. By assumption, $L(s, \chi)$ has a zero at $s = 1$. So by Lemma 36, $L(s, \chi) L(s, \chi_0)$ is holomorphic at $s = 1$. By Theorem 35, $L(s, \chi) L(s, \chi_0)$ is, in fact, holomorphic for $\sigma > 0$. Note that $L(2s, \chi_0)$ is holomorphic and nonzero for all $\sigma > \tfrac{1}{2}$. So $\psi(s)$ is holomorphic for all $\sigma > \tfrac{1}{2}$. Also, because $L(s, \chi) L(s, \chi_0)$ is holomorphic at $\tfrac{1}{2}$, it is bounded near that point; hence, since $L(2s, \chi_0) \to \infty$ as $s \to \tfrac{1}{2}$ from the right, we have that $\psi(s) \to 0$ as $s \to \tfrac{1}{2}$ from the right.

Lemma 38 (The second $\psi(s)$ lemma). Given $\psi(s)$ as defined in the previous lemma, $\psi(s) \geq 1$ for all $\tfrac{1}{2} < s < 2$.

Proof. As we have seen previously, we can write
\[ L(s, \chi_0) = \prod_{p \nmid q} \left( 1 - p^{-s} \right)^{-1}. \]

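So
\[ L(2s, \chi_0) = \prod_{p \nmid q} \left( 1 - p^{-2s} \right)^{-1}. \]
Now consider the product for $L(s, \chi) = \prod_{p} \left( 1 - \chi(p) p^{-s} \right)^{-1}$. If $p \mid q$, then $\chi(p) = 0$ (just as it is in the product representation for $L(s, \chi_0)$), so
\[ L(s, \chi) = \prod_{p \nmid q} \left( 1 - \chi(p) p^{-s} \right)^{-1}. \]
Hence $\psi(s)$ has the following product representation:
\[ \psi(s) = \prod_{p \nmid q} \frac{1 - p^{-2s}}{\left( 1 - \chi(p) p^{-s} \right) \left( 1 - p^{-s} \right)}. \]
If $\chi(p) = -1$ for some $p$ in this product, then the $p$-term is clearly 1. Hence we have
\[ \psi(s) = \prod_{\chi(p) = 1} \frac{1 - p^{-2s}}{\left( 1 - p^{-s} \right)^2} = \prod_{\chi(p) = 1} \frac{1 + p^{-s}}{1 - p^{-s}}, \]
where the product is understood to be over all primes $p$ such that $\chi(p) = 1$. Now we recognize $\left( 1 - p^{-s} \right)^{-1}$ as the sum of a geometric series with ratio $p^{-s}$. Hence
\[ \psi(s) = \prod_{\chi(p) = 1} \left( 1 + p^{-s} \right) \left( \sum_{m=0}^{\infty} p^{-ms} \right) = \prod_{\chi(p) = 1} \left( 1 + 2 \sum_{m=1}^{\infty} p^{-ms} \right). \qquad (35) \]
If we wrote out this product as a series, every term would be of the form $a_n n^{-s}$ for some $a_n \in \mathbb{C}$ and $n \in \mathbb{N}$. (If there are no terms in this product, which could conceivably happen if $\chi(p) \neq 1$ for all primes $p \nmid q$, then the empty product would equal 1, in which case our claim holds trivially.) Let
\[ \psi(s) = \sum_{n=1}^{\infty} a_n n^{-s}. \qquad (36) \]
By Equation 35, we see that $a_n \geq 0$ for all $n \in \mathbb{N}$ and that $a_1 = 1$.

Next, we consider another series representation of $\psi(s)$, the Taylor series representation. Since $L(s, \chi) L(s, \chi_0)$ is holomorphic for $\sigma > 0$ and $1/L(2s, \chi_0)$ is holomorphic for $\sigma > \tfrac{1}{2}$, their product (that is, $\psi(s)$) is holomorphic for $\sigma > \tfrac{1}{2}$. Hence if we center the Taylor series at 2, writing
\[ \psi(s) = \sum_{m=0}^{\infty} \frac{\psi^{(m)}(2)}{m!} (s - 2)^m, \qquad (37) \]
then the Taylor series converges for all $|s - 2| < \tfrac{3}{2}$. We now calculate $\psi^{(m)}(2)$ using the Dirichlet series, and substitute the result into the Taylor series, which will then provide us with the desired estimate.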

Using Equation 36 and the fact that $\frac{d}{ds} n^{-s} = -(\log n)\, n^{-s}$, we have
\[ \psi'(s) = -\sum_{n=1}^{\infty} \frac{a_n \log n}{n^s}; \qquad \psi''(s) = \sum_{n=1}^{\infty} \frac{a_n (\log n)^2}{n^s}; \qquad \psi^{(m)}(s) = (-1)^m \sum_{n=1}^{\infty} \frac{a_n (\log n)^m}{n^s}, \]
as a simple induction argument would show. Hence
\[ \psi^{(m)}(2) = (-1)^m \sum_{n=1}^{\infty} \frac{a_n (\log n)^m}{n^2}. \]
If we let $b_m = \frac{1}{m!} \sum_{n=1}^{\infty} a_n (\log n)^m n^{-2}$, then $\psi^{(m)}(2) = (-1)^m m!\, b_m$. Substituting this into Equation 37, we have
\[ \psi(s) = \sum_{m=0}^{\infty} (-1)^m b_m (s - 2)^m = \sum_{m=0}^{\infty} b_m (2 - s)^m. \]
Now the fact that $a_n \geq 0$ for all $n \in \mathbb{N}$ implies that $b_m \geq 0$ for all $m = 0, 1, \ldots$. Also, $b_0 = \sum_{n=1}^{\infty} a_n n^{-2} \geq a_1 = 1$. Hence for all $s \in (\tfrac{1}{2}, 2)$, $(2 - s)^m \geq 0$, from which we conclude
\[ \psi(s) \geq \sum_{m=0}^{\infty} b_m (0)^m = b_0 \geq 1, \]
as desired.

We have made it to the top of the mountain, and we are ready to make the last step. Let us do so!

Theorem 39 (Dirichlet's Theorem on Primes in Arithmetic Progressions). Let $(a, q) = 1$. Then there exist infinitely many primes of the form $a + qn$ where $n \in \mathbb{N}$.

Proof. By what we have shown, all we need to do is show that $L(1, \chi) \neq 0$ where $\chi \neq \chi_0$ is a real character. We proceed by contradiction. Assume $L(1, \chi) = 0$. Define the function $\psi(s)$ as in Lemmas 37 and 38. Then by Lemma 37, $\psi(s) \to 0$ as $s \to \tfrac{1}{2}^+$. But by Lemma 38, the limit of $\psi(s)$ as $s \to \tfrac{1}{2}^+$ (if it even exists) is greater than or equal to 1. Hence we have a contradiction, and the theorem is proved.

To state it once more, every residue class modulo $q$, in which each element is relatively prime to $q$, has infinitely many primes.
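Dirichlet's theorem is an existence statement, but it is easy to watch it at work. The sketch below lists the first few primes of the form $a + qn$ by trial division; the function names are illustrative, and the loop in `primes_in_progression` terminates for every `count` precisely because of the theorem (assuming $(a, q) = 1$).

```python
from math import gcd

def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def primes_in_progression(a, q, count):
    """First `count` primes of the form a + q*n with n = 1, 2, 3, ..."""
    if gcd(a, q) != 1:
        raise ValueError("a and q must be relatively prime")
    found, n = [], 1
    while len(found) < count:
        candidate = a + q * n
        if is_prime(candidate):
            found.append(candidate)
        n += 1
    return found
```

For example, `primes_in_progression(1, 4, 5)` walks through 5, 9, 13, 17, ... and keeps the primes it meets.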

3.1.6 A probabilistic interpretation of Dirichlet's theorem

Pick any pair of relatively prime integers $a$ and $q$, say, $a = 1$ and $q = 10^{100}$. We have just proven that there are infinitely many primes of the form $1 + n \cdot 10^{100}$. This is amazing! The probability of randomly choosing a prime between $n \cdot 10^{100}$ and $(n + 1) \cdot 10^{100}$ is approximately
\[ \frac{\text{number of primes in range}}{\text{number of integers in range}} = \frac{\pi((n+1) 10^{100}) - \pi(n\, 10^{100})}{10^{100}} \approx \frac{n + 1}{\log((n+1) 10^{100})} - \frac{n}{\log(n\, 10^{100})} = \frac{(n+1) \log n - n \log(n+1) + 100 \log 10}{(\log n + 100 \log 10)(\log(n+1) + 100 \log 10)}. \]
This clearly goes to 0 as $n \to \infty$. So, in particular, the probability that $1 + n \cdot 10^{100}$ is prime goes to 0. One interpretation of Dirichlet's theorem is that this probability does not go to 0 too quickly. For if it did, then after testing whether $1 + n \cdot 10^{100}$ is prime for all $n \in \mathbb{N}$, the expected number of primes found might be finite. But Dirichlet's theorem implies that the expected number (which is equal to the actual number, since what we have is really a deterministic space of outcomes) is infinite!

The preceding discussion is the same if we replace $10^{100}$ with any $q$ (besides 1) and 1 with any $a$ which is relatively prime to $q$. The purpose of choosing particular (large) values is to express how amazing this fact is.

We have proven that there exist infinitely many primes, and we have given an asymptotic expression for how dense they are in $\mathbb{N}$. Now that we have proven that there are infinitely many primes in (certain) arithmetic progressions, we would like to give a related asymptotic expression for their density in $\mathbb{N}$.

3.2 The Generalized Prime Number Theorem

We have provided answers (with proof!) to three of the four questions posed in the introduction:

Q: How many primes are there?
A: Infinitely many! (Proofs due to Euclid and Euler.)

Q: How dense are they?
A: The PNT says $\pi(x) \sim x/\log x$.

Q: Given an arithmetic progression (with $(a, q) = 1$), how many primes are there?
A: Infinitely many! (Proof due to Dirichlet.)

Our last question is: Given such an arithmetic progression, how dense in $\mathbb{N}$ are the primes in that progression?
What function might tell us the approximate number of primes less than some fixed number in a given arithmetic progression? To begin, the actual value of the function must be bounded above by $\pi(x)$, for there cannot exist fewer primes less than $x$ than primes of a particular kind less than $x$. Let $\pi_{a,q}(x)$ denote the number of primes less than or equal to $x$ which are congruent to $a$ modulo $q$. Now given any prime $p$, $p$ corresponds to a unique residue class

modulo $q$. Since $p$ can only show up in a residue class whose members are relatively prime to $q$ (apart from the finitely many primes dividing $q$), we must have, up to those finitely many primes, that
\[ \pi(x) = \sum_{a \in \mathbb{Z}_q^*} \pi_{a,q}(x), \]
where the notation in the sum indicates a sum over all integers $0, 1, \ldots, q - 1$ which are relatively prime to $q$. Beyond this, we are left to conjecture: what is $\pi_{a,q}(x)$?

One easy guess would be that $\pi_{1,q}(x) = \pi(x)$ and that $\pi_{a,q}(x) = 0$ for all $a \neq 1$. But this does not seem to be the case for small $q$. For example, when $q = 4$, there are primes congruent to 1 (such as 5, 13, and 17) as well as primes congruent to 3 (such as 7, 11, and 19). In fact, for small $q$, it appears that all residue classes contribute primes. The next natural conjecture might be that each residue class contributes equally. Since $\pi_{a,q}(x)$, then, would not depend on $a$, we could write this function (approximately) as $\pi_q(x)$. Furthermore, since there are $\varphi(q)$ relatively prime residue classes modulo $q$, we would have that
\[ \pi(x) = \sum_{a \in \mathbb{Z}_q^*} \pi_{a,q}(x) \approx \sum_{a \in \mathbb{Z}_q^*} \pi_q(x) = \pi_q(x) \varphi(q). \]
Hence our conjecture is that
\[ \pi_q(x) \sim \frac{1}{\varphi(q)} \pi(x) \sim \frac{1}{\varphi(q)} \frac{x}{\log x}. \]
Because this is a statement of the Prime Number Theorem when $q = 2$, we call this (for our purposes) the generalized Prime Number Theorem, or gPNT.

The proof of the gPNT given in [So] is parallel step for step to Zagier's presentation of Newman's proof. I omit it here, and instead discuss why we have a good reason to be optimistic about our conjecture. We prove two lemmas here, then we prove a theorem, which follows trivially from the gPNT and does not quite prove the gPNT, but it has independent interest. This discussion follows that in [C].

Lemma 40. As $s \to 1^+$,
\[ \sum_{p} \frac{1}{p^s} \sim \log\left( \frac{1}{s - 1} \right). \]

Proof. From Equation 3, we know that
\[ \log \zeta(s) = \sum_{p} \frac{1}{p^s} + \delta(s), \]
where $|\delta(s)| < \zeta(2)$. We also know from the proof of an earlier lemma that $(s - 1)\zeta(s) \to 1$ as $s \to 1^+$. Hence
\[ \sum_{p} \frac{1}{p^s} = \log((s - 1)\zeta(s)) + \log\left( \frac{1}{s - 1} \right) - \delta(s) = \log\left( \frac{1}{s - 1} \right) + \delta_1(s), \]
where $\delta_1(s)$ remains bounded as $s \to 1^+$. Dividing by $\log\left( \frac{1}{s - 1} \right)$ and taking $s \to 1^+$ proves the lemma.

The next lemma is similar.

Lemma 41. As $s \to 1^+$,
\[ \sum_{p \equiv_q a} \frac{1}{p^s} \sim \frac{1}{\varphi(q)} \log\left( \frac{1}{s - 1} \right). \]

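Proof. From Equations 33 and 34, we have that
\[ \frac{1}{\varphi(q)} \log L(s, \chi_0) + \frac{1}{\varphi(q)} \sum_{\chi \neq \chi_0} \overline{\chi(a)} \log L(s, \chi) = \sum_{p \equiv_q a} \frac{1}{p^s} + \delta(s), \qquad (38) \]
where again $\delta(s)$ remains bounded as $s \to 1$. Now we proved that the sum over $\chi \neq \chi_0$ remains bounded as $s \to 1^+$; that was the hard part about proving Dirichlet's theorem! Furthermore, we proved in Lemma 36 that $(s - 1)L(s, \chi_0)$ is bounded as $s \to 1^+$. From our proof it is also clear that $(s - 1)L(s, \chi_0)$ does not converge to 0 as $s \to 1^+$. Hence $\log((s - 1)L(s, \chi_0))$ remains bounded as $s \to 1^+$. Rewriting Equation 38 and combining the terms which remain bounded as $s \to 1^+$, we obtain
\[ \varphi(q) \sum_{p \equiv_q a} \frac{1}{p^s} = \log((s - 1)L(s, \chi_0)) + \log\left( \frac{1}{s - 1} \right) + \sum_{\chi \neq \chi_0} \overline{\chi(a)} \log L(s, \chi) - \varphi(q)\delta(s) = \log\left( \frac{1}{s - 1} \right) + \delta_2(s), \]
where $\delta_2(s)$ remains bounded as $s \to 1$. Dividing by $\varphi(q) \log\left( \frac{1}{s - 1} \right)$ and taking $s \to 1^+$, we obtain our lemma.

When we combine Lemmas 40 and 41, we obtain the following result:

Theorem 42. For relatively prime integers $a$ and $q$, the primes in the arithmetic progression $\{a + qn\}_{n=1}^{\infty}$ are evenly distributed in the following sense:
\[ \frac{\sum_{p \equiv_q a} p^{-s}}{\sum_{p} p^{-s}} \to \frac{1}{\varphi(q)} \quad \text{as } s \to 1^+. \qquad (39) \]
Hence the primes really are, in some sense, evenly distributed among the residue classes.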
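The even distribution among residue classes can also be seen empirically by counting. The following sketch is not the analytic statement of Theorem 42 (which concerns sums of $p^{-s}$ as $s \to 1^+$); it merely tallies $\pi_{a,q}(x)$ for each admissible residue $a$ using a sieve, with a hypothetical function name and small illustrative inputs.

```python
from math import gcd

def prime_counts_by_residue(q, x):
    """Count primes p <= x in each residue class a mod q with gcd(a, q) = 1."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            # Zero out the multiples of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    counts = {a: 0 for a in range(q) if gcd(a, q) == 1}
    for p in range(2, x + 1):
        # Primes dividing q fall outside every admissible class and are skipped.
        if sieve[p] and p % q in counts:
            counts[p % q] += 1
    return counts

counts_mod_4 = prime_counts_by_residue(4, 100)
counts_mod_10 = prime_counts_by_residue(10, 100000)
```

Dividing each entry of `counts_mod_10` by the total shows every class holding close to a $1/\varphi(10) = 1/4$ share of the primes up to $10^5$, in line with the conjectured gPNT.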

A Appendix

Given the length of the main proofs in this exercise, I felt that it was appropriate to put off proving some of the facts I used until later. Thus we have this appendix. Some of the sections are purely technical, and others are definitional. There are some gems to be found here, however, such as the theorems proven in Sections 49 and A.4.4.

A.1 Complex Integration and Differentiation

In this section we give some definitions which are required to state two important theorems from complex integration theory, Cauchy's Residue Theorem and the Deformation Invariance Theorem, both of which we use earlier. The definitions and proofs can be found in [Sa]. Also, we discuss the methods by which one proves that a function is differentiable.

A.1.1 Complex Integration

A domain $D$ is an open subset of $\mathbb{C}$ that is also connected. A contour $C$ (for our purposes) is a continuous parameterizable curve in a domain $D$ that is differentiable except possibly at finitely many points; that is, $C$ is a curve for which there exists a continuous function $s(t)$ from $[0, 1]$ onto $C$ such that $s'(t)$ exists for all $t \in [0, 1] \setminus A$ where $A$ is a finite subset of $[0, 1]$. Finally, a contour integral $\int_C f(s)\,ds$ is defined just as a line integral is in multivariable calculus; we let $s(t)$ be a parameterization of $C$ and write
\[ \int_C f(s)\,ds = \sum_{i=0}^{n-1} \int_{t_i}^{t_{i+1}} f(s(t)) s'(t)\,dt \]
and integrate, where $t_0 = 0$ and $t_n = 1$, and where $t_1, \ldots, t_{n-1}$ are the points (if there are any) at which $s(t)$ is not differentiable.

Now line integrals over $\mathbb{R}^2$ and contour integrals over $\mathbb{C}$ are not entirely the same; in fact, our analogies end here: the key difference is related to the notion of homotopy. Two contours $C_1$ and $C_2$ are homotopic in a domain $D$ if there exists a continuous function $s(u, t)$ such that $C_1 = \{s(0, t) : t \in [0, 1]\}$, $C_2 = \{s(1, t) : t \in [0, 1]\}$, and $s(u, t) \in D$ for all $(u, t) \in [0, 1] \times [0, 1]$. With this definition we have the following:

Theorem 43 (Deformation Invariance Theorem). If $f(s)$ is holomorphic on a domain $D$, and if $C_1$ and $C_2$ are homotopic contours in $D$, then $\int_{C_1} f(s)\,ds = \int_{C_2} f(s)\,ds$.
I omit the proof, as it is proven in most introductory courses on complex analysis, including the one taught at Kenyon. Theorem 43 is so far from true for line integrals in $\mathbb{R}^2$ that it seems impossible that it is true for contour integrals in $\mathbb{C}$. The difference only becomes clearer as one learns more about integration over $\mathbb{C}$.

Another amazing property of contour integrals in $\mathbb{C}$ is Cauchy's Residue Theorem. To state it, we need a few more definitions. A contour is simple if the parameterization $s(t)$ of it is one-to-one on $[0, 1)$. A closed contour is one for which the ends of the contour meet: $s(0) = s(1)$. A simple closed contour, therefore, is a loop in the complex plane that does not cross itself. Finally, a positively oriented contour is less easily defined. For our purposes, such a contour is defined as

a simple closed contour such that $s(t)$ moves in a generally counterclockwise direction around the contour as $t$ increases. Finally, if $f(s)$ has a pole of order $m$ at $s_j$, then the residue of $f(s)$ at $s_j$ is
\[ \operatorname{Res}(f, s_j) = \lim_{s \to s_j} \frac{1}{(m-1)!} \frac{d^{m-1}}{ds^{m-1}} \left( (s - s_j)^m f(s) \right). \]
This large expression is quite simple in the case that $f(s)$ has a simple pole (that is, a pole of order 1) at $s_j$, because in this case, we have
\[ \operatorname{Res}(f, s_j) = \lim_{s \to s_j} (s - s_j) f(s). \]
With these definitions, we state this amazing theorem.

Theorem 44 (Cauchy's Residue Theorem). Let $D$ be a domain in which $f(s)$ is holomorphic except at the points $s_1, \ldots, s_n$, and let $C$ be a simple closed positively oriented contour in $D$ enclosing some of these points, say, $w_1, \ldots, w_m$. Then
\[ \int_C f(s)\,ds = 2\pi i \sum_{k=1}^{m} \operatorname{Res}(f, w_k). \]
If there are no such points $w_k$ (or $s_k$), then the value of the integral is 0.

Once again, we omit the proof, as it is on the syllabus of most introductory complex analysis courses.

A.1.2 Complex Differentiation

The property of a function which is complex-differentiable at a point $s$ goes by many names: complex-differentiable at $s$, analytic at $s$, holomorphic at $s$, and regular at $s$. We use the third option. There are many ways one can prove that a given function $f$ is holomorphic, some of which are quite obvious: show $f = g + h$ or $f = gh$ for holomorphic functions $g$ and $h$, etc. Here is a method which is useful for the purpose of this exercise:

Theorem 45. If $f_n$ is a sequence of holomorphic functions such that $f_n \to f$ uniformly and $f_n' \to g$ uniformly, then $\frac{df}{ds} = g$. That is,
\[ \frac{d}{ds} \lim_{n \to \infty} f_n = \lim_{n \to \infty} \frac{df_n}{ds}. \]

One surprising fact about complex differentiation is the following:

Theorem 46. If a function is holomorphic at $s$, then derivatives of all orders exist at $s$.

This is not true about real differentiation! The existence of the first derivative does not imply the existence of the second derivative! Some facts in complex analysis seem too good to be true, but they are! This one in particular was used multiple times in this exercise.
A.2 The Leibniz Integral Rule

We prove the following theorem, which has theoretical uses (see the proof of Theorem 5) and practical uses (as an integration tool!). It is stated as follows:

Theorem 47 (The Leibniz Integral Rule). Let $a(s)$ and $b(s)$ be differentiable real-valued functions on $\mathbb{C}$, and let $f(s, x)$ be a complex-valued function defined on $\mathbb{C} \times \mathbb{R}$. Assume that $a(s)$, $b(s)$, and

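$f(s, x)$ are differentiable with respect to $s$, that $f(s, x)$ is continuous with respect to $x$, and that $\frac{\partial}{\partial s} f(s, x)$ is integrable with respect to $x$. Then
\[ \frac{d}{ds} \int_a^b f(s, x)\,dx = \int_a^b \frac{\partial}{\partial s} f(s, x)\,dx - f(s, a) \frac{da}{ds} + f(s, b) \frac{db}{ds}. \]

Proof. Let $I = \int_a^b f(s, x)\,dx$. Regard $I$ as a function of $a$, $b$, and $s$. Then note three things:

1. Since $f(s, x)$ is continuous in $x$, the Fundamental Theorem of Calculus implies that
\[ \frac{\partial I}{\partial b} = f(s, b). \]

2. Similarly, we have
\[ \frac{\partial I}{\partial a} = -\frac{\partial}{\partial a} \int_b^a f(s, x)\,dx = -f(s, a). \]

3. Fix some $s_0$. In the case that $a(s_0) = b(s_0)$, it follows trivially that
\[ \frac{\partial}{\partial s} I(s, a, b) = \frac{\partial}{\partial s} \int_a^b f(s, x)\,dx = 0 = \int_a^b \frac{\partial}{\partial s} f(s, x)\,dx. \qquad (40) \]
Now suppose that $a(s_0) \neq b(s_0)$. Let $\epsilon > 0$. Since $f(s, x)$ is differentiable with respect to $s$, there exists $\delta > 0$ such that
\[ \left| \frac{f(s, x) - f(s_0, x)}{s - s_0} - \frac{\partial}{\partial s} f(s_0, x) \right| < \frac{\epsilon}{|b - a|} \]
whenever $0 < |s - s_0| < \delta$. Fix such a $\delta$, and let $s \in \mathbb{C}$ be such that $0 < |s - s_0| < \delta$. Then
\[ \left| \int_a^b \frac{f(s, x) - f(s_0, x)}{s - s_0}\,dx - \int_a^b \frac{\partial}{\partial s} f(s_0, x)\,dx \right| \]
is equal to
\[ \left| \int_a^b \left( \frac{f(s, x) - f(s_0, x)}{s - s_0} - \frac{\partial}{\partial s} f(s_0, x) \right) dx \right|, \]
which is less than or equal to
\[ \left| \int_a^b \left| \frac{f(s, x) - f(s_0, x)}{s - s_0} - \frac{\partial}{\partial s} f(s_0, x) \right| dx \right| \leq \left| \int_a^b \frac{\epsilon}{|b - a|}\,dx \right| = \epsilon. \]
Thus we conclude that
\[ \lim_{s \to s_0} \int_a^b \frac{f(s, x) - f(s_0, x)}{s - s_0}\,dx = \int_a^b \frac{\partial}{\partial s} f(s_0, x)\,dx. \]
Now the left-hand side of this expression is equal to
\[ \lim_{s \to s_0} \frac{\int_a^b f(s, x)\,dx - \int_a^b f(s_0, x)\,dx}{s - s_0} = \lim_{s \to s_0} \frac{I(s, a, b) - I(s_0, a, b)}{s - s_0} = \frac{\partial}{\partial s} I(s_0, a, b). \]
So Equation 40 holds for $a(s_0) \neq b(s_0)$ and, therefore, always.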

So the partial derivatives of $I(s, a, b)$ with respect to $s$, $a$, and $b$ exist. Therefore, since $a(s)$ and $b(s)$ are differentiable, we can apply the chain rule to differentiate $I(s, a, b)$ with respect to $s$:
\[ \frac{dI}{ds} = \frac{\partial I}{\partial s} + \frac{\partial I}{\partial a} \frac{da}{ds} + \frac{\partial I}{\partial b} \frac{db}{ds} = \int_a^b \frac{\partial}{\partial s} f(s, x)\,dx - f(s, a) \frac{da}{ds} + f(s, b) \frac{db}{ds}. \]
This proves the theorem.

In the case that $a(s)$ and $b(s)$ are constants, $a'(s) = b'(s) = 0$, and we have the (more common) special case:

Corollary 48. If $f(s, x)$ is a complex-valued function defined on $\mathbb{C} \times \mathbb{R}$ such that $f(s, x)$ is differentiable with respect to $s$, $f(s, x)$ is continuous with respect to $x$, and $\frac{\partial}{\partial s} f(s, x)$ is integrable with respect to $x$, then
\[ \frac{d}{ds} \int_a^b f(s, x)\,dx = \int_a^b \frac{\partial}{\partial s} f(s, x)\,dx. \]

A.3 Useful algebraic concepts and facts

We prove two group-theoretic facts in the first two sections of this appendix, and in the third section we define the Legendre symbol and give the method one uses to compute it: quadratic reciprocity. The first two sections follow [A], and the third section follows [L].

I must make a cautionary note here: my proofs which follow [A] are quite close to the originals, which are very beautifully conveyed. Though I do not feel as though I have done justice to the exposition in [A], I have nevertheless shamelessly included the proofs here. My reasoning was not, in this case, that I added anything to the original presentation; rather, this inclusion is both for completeness and for myself. By working through these theorems, I understand two important theorems from group theory which I would otherwise not know.

A.3.1 The multiplicative group $\mathbb{Z}_{p^\alpha}^*$ is cyclic for all odd primes $p$

We prove that $\mathbb{Z}_{p^\alpha}^*$ is cyclic for all odd primes $p$ and $\alpha \in \mathbb{N}$. In fact, $\mathbb{Z}_2^*$, $\mathbb{Z}_4^*$, and $\mathbb{Z}_{2p^\alpha}^*$ (for odd primes $p$ and $\alpha \in \mathbb{N}$) are also cyclic; interestingly, these are the only other cyclic groups of the form $\mathbb{Z}_q^*$. All of this is proven in [A], but we only prove here what we need for this exercise. We start by proving that $\mathbb{Z}_p^*$ is cyclic for all primes $p$.

Lemma 49. Let $p$ be an odd prime. Then $\mathbb{Z}_p^*$ is cyclic.

Proof.
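We assume two facts from elementary group theory: that the order $|a^k|$ of $a^k$ is equal to $|a| / \gcd(|a|, k)$ and that $\sum_{d \mid n} \varphi(d) = n$, where the sum is over the positive divisors $d$ of $n$ and $\varphi(d)$ is (as usual) the Euler totient function. For each $d \mid p - 1$, define $A(d) = \{x \in \mathbb{Z}_p^* : |x| = d\}$, where $|x|$ denotes the order of $x$ in $\mathbb{Z}_p^*$. Since every element $x \in \mathbb{Z}_p^*$ has a unique order which divides $|\mathbb{Z}_p^*| = \varphi(p) = p - 1$ (by Lagrange's theorem), the collection of $A(d)$ partitions $\mathbb{Z}_p^*$. Hence
\[ \sum_{d \mid p-1} |A(d)| = p - 1. \]
From our assumed fact from group theory, we have
\[ \sum_{d \mid p-1} \bigl( \varphi(d) - |A(d)| \bigr) = 0. \tag{41} \]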

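We claim that $\varphi(d) = |A(d)|$ for all $d \mid p - 1$. To obtain this from Equation 41, we prove that either $|A(d)| = 0$ or $|A(d)| = \varphi(d)$. This would imply that $\varphi(d) - |A(d)| \ge 0$ for all $d \mid p - 1$, in which case we have equality in Equation 41 if and only if $\varphi(d) = |A(d)|$ for each $d \mid p - 1$.

So fix some $d \mid p - 1$ and suppose $|A(d)| \ne 0$. Then choose some $a \in A(d)$. Then $|a| = d$, so $a^d \equiv 1$ and no smaller power of $a$ is congruent to 1 modulo $p$. Hence $a, a^2, \ldots, a^d$ are $d$ distinct elements, all of which satisfy $x^d - 1 \equiv 0$. Since the polynomial $x^d - 1$ has at most $d$ solutions in $\mathbb{Z}_p$, every solution is of the form $a^k$ for some $k = 1, \ldots, d$. Therefore, since every $x \in A(d)$ satisfies $x^d - 1 \equiv 0$, $A(d)$ is a subset of the set generated by $a$. We can express this as
\[ A(d) = \{a^k : k = 1, \ldots, d;\ |a^k| = d\}. \]
But $|a^k| = |a| / \gcd(|a|, k) = d / \gcd(d, k)$, as one proves in elementary group theory, so $|a^k| = d$ if and only if $\gcd(d, k) = 1$. So $A(d)$ is a set indexed by all $k = 1, \ldots, d$ such that $\gcd(k, d) = 1$. Hence $|A(d)| = \varphi(d)$, as claimed.

What does this have to do with proving that $\mathbb{Z}_p^*$ is cyclic? Well, $|A(d)| = \varphi(d)$ for all $d \mid p - 1$, so, in particular, $|A(p-1)| = \varphi(p-1)$. Since 1 is relatively prime to every natural number, $\varphi(p-1) \ge 1$, and since $A(p-1)$, the set of elements in $\mathbb{Z}_p^*$ with order $p - 1$, therefore has at least one element, there exists an element in $\mathbb{Z}_p^*$ with order $p - 1$. This element, having order $p - 1$, must generate a subgroup of $\mathbb{Z}_p^*$ with $p - 1 = |\mathbb{Z}_p^*|$ elements; hence, this element generates $\mathbb{Z}_p^*$, and the proof is complete.

Next we prove another lemma, which is of little interest besides the fact that we need it:

Lemma 50. Let $p$ be an odd prime. If $g$ is a generator of $\mathbb{Z}_p^*$ such that $g^{p-1} \not\equiv 1 \pmod{p^2}$, then for all $\alpha \ge 2$,
\[ g^{\varphi(p^{\alpha-1})} \not\equiv 1 \pmod{p^\alpha}. \]

Proof. Let $g$ be as given. We proceed by induction on $\alpha$. For $\alpha = 2$,
\[ g^{\varphi(p^{\alpha-1})} = g^{\varphi(p)} = g^{p-1} \not\equiv 1 \pmod{p^2}, \]
so the base case holds. Now suppose $g^{\varphi(p^{\alpha-1})} \not\equiv 1 \pmod{p^\alpha}$ for some $\alpha \ge 2$. Considering $g$ as an element of $\mathbb{Z}_{p^{\alpha-1}}^*$, we have by Lagrange's theorem that $|g|$ divides $\varphi(p^{\alpha-1})$. So $g^{\varphi(p^{\alpha-1})} \equiv 1 \pmod{p^{\alpha-1}}$. Translating this congruence into a fact that holds in $\mathbb{Z}$, we have
\[ g^{\varphi(p^{\alpha-1})} = 1 + k p^{\alpha-1} \tag{42} \]
for some $k \in \mathbb{Z}$. But by our induction hypothesis, $p \nmid k$ because $g^{\varphi(p^{\alpha-1})} \not\equiv 1 \pmod{p^\alpha}$. That $p \nmid k$ is crucial, as we will see in a moment.

Using Equation 42, we compute $g^{\varphi(p^\alpha)}$. Since
\[ \varphi(p^\alpha) = p^{\alpha-1}(p - 1) = p \bigl( p^{\alpha-2}(p - 1) \bigr) = p\,\varphi(p^{\alpha-1}), \]
we have
\[ g^{\varphi(p^\alpha)} = \bigl( g^{\varphi(p^{\alpha-1})} \bigr)^p = (1 + k p^{\alpha-1})^p. \]
Expanding this binomial yields
\[ g^{\varphi(p^\alpha)} = 1 + p\,k p^{\alpha-1} + \frac{p(p-1)}{2}(k p^{\alpha-1})^2 + \frac{p(p-1)(p-2)}{6}(k p^{\alpha-1})^3 + \cdots + (k p^{\alpha-1})^p = 1 + k p^\alpha + k^2 p^{2\alpha-1}\,\frac{p-1}{2} + p^{3\alpha-3}\,t \]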

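for some integer $t$. Now $\alpha \ge 2$, so $2\alpha - 1$ and $3\alpha - 3$ are both at least $\alpha + 1$. Hence $p^{\alpha+1}$ divides both $p^{2\alpha-1}$ and $p^{3\alpha-3}$, which implies
\[ g^{\varphi(p^\alpha)} \equiv 1 + k p^\alpha \pmod{p^{\alpha+1}}. \]
Recalling the key property about $k$, namely, that $p \nmid k$, we have that $g^{\varphi(p^\alpha)} \not\equiv 1 \pmod{p^{\alpha+1}}$, and the induction is complete.

We now prove the main theorem of this section:

Theorem 51. Let $p$ be an odd prime, and let $\alpha \ge 1$. Then the following hold:

(a) If $g$ is a generator of $\mathbb{Z}_p^*$, then $g$ generates $\mathbb{Z}_{p^\alpha}^*$ for all $\alpha \ge 1$ if and only if $g^{p-1} \not\equiv 1 \pmod{p^2}$.

(b) One of the generators of $\mathbb{Z}_p^*$ satisfies $g^{p-1} \not\equiv 1 \pmod{p^2}$.

(c) There exists a generator of $\mathbb{Z}_{p^\alpha}^*$; that is, $\mathbb{Z}_{p^\alpha}^*$ is cyclic.

Proof. We prove (a) first. By Lemma 49, $\mathbb{Z}_p^*$ is cyclic. Let $g$ be a generator of $\mathbb{Z}_p^*$. If $g$ in fact generates $\mathbb{Z}_{p^\alpha}^*$ for all $\alpha$, then $g$ generates $\mathbb{Z}_{p^2}^*$, which implies $g^{p-1} \not\equiv 1 \pmod{p^2}$ because $p - 1 < \varphi(p^2)$, the order of $g$ in $\mathbb{Z}_{p^2}^*$.

Now suppose that $g^{p-1} \not\equiv 1 \pmod{p^2}$. For $\alpha = 1$, then, $g$ generates $\mathbb{Z}_{p^\alpha}^* = \mathbb{Z}_p^*$. So consider $\alpha \ge 2$. We need to show that $g$ generates $\mathbb{Z}_{p^\alpha}^*$ or, equivalently, that $|g| = \varphi(p^\alpha)$, where $|g|$ is the order of $g$ in $\mathbb{Z}_{p^\alpha}^*$. We have that $g^{|g|} \equiv 1 \pmod{p^\alpha}$, so $g^{|g|} \equiv 1 \pmod{p}$. Since $g$ is a generator of $\mathbb{Z}_p^*$, $g^{|g|} \equiv 1 \pmod{p}$ implies that $|g| \equiv 0$ modulo $\varphi(p) = p - 1$, the order of $\mathbb{Z}_p^*$. So $\varphi(p)$ divides $|g|$. Write $|g| = m\,\varphi(p)$. By Lagrange's theorem, $|g|$ divides $\varphi(p^\alpha)$, the order of $\mathbb{Z}_{p^\alpha}^*$, so $m\,\varphi(p) \mid \varphi(p^\alpha)$. Since $\varphi(p) = p - 1$ and $\varphi(p^\alpha) = p^{\alpha-1}(p - 1)$, this implies $m(p - 1) \mid p^{\alpha-1}(p - 1)$. If we rewrite this as $p^{\alpha-1}(p - 1) = m' m (p - 1)$ for some $m' \in \mathbb{Z}$, we see that $m \mid p^{\alpha-1}$. So $m = p^\beta$ for some $\beta = 0, 1, \ldots, \alpha - 1$. We will be done if we can show that $\beta = \alpha - 1$, for this would imply that
\[ |g| = m\,\varphi(p) = p^\beta (p - 1) = p^{\alpha-1}(p - 1) = \varphi(p^\alpha). \]
We proceed here by contradiction. Specifically, assume that $\beta \le \alpha - 2$. Then $|g| = p^\beta(p - 1)$ divides $p^{\alpha-2}(p - 1) = \varphi(p^{\alpha-1})$, which implies that $g^{\varphi(p^{\alpha-1})} \equiv 1 \pmod{p^\alpha}$. Since $\alpha \ge 2$, since $g$ generates $\mathbb{Z}_p^*$, and since $g^{p-1} \not\equiv 1 \pmod{p^2}$ by assumption, Lemma 50 implies that $g^{\varphi(p^{\alpha-1})} \not\equiv 1 \pmod{p^\alpha}$, and we have a contradiction. Hence $\beta = \alpha - 1$, which is sufficient, as we proved above.

To prove part (b), we let $g$ be a generator of $\mathbb{Z}_p^*$. (Again, a generator exists by Lemma 49.) If $g^{p-1} \not\equiv 1 \pmod{p^2}$, then $g$ generates $\mathbb{Z}_{p^\alpha}^*$ by (a). If, on the other hand, $g^{p-1} \equiv 1 \pmod{p^2}$, then $(g + p)^{p-1} \not\equiv 1 \pmod{p^2}$, as we will show. Since $g + p \equiv g \pmod{p}$, $g + p$ is a generator of $\mathbb{Z}_p^*$, so (a) would imply that $g + p$ generates $\mathbb{Z}_{p^\alpha}^*$. To show that $(g + p)^{p-1} \not\equiv 1 \pmod{p^2}$, consider that
\[ (g + p)^{p-1} = g^{p-1} + (p - 1) g^{p-2} p + \frac{(p-1)(p-2)}{2} g^{p-3} p^2 + \cdots + p^{p-1}. \]
Now if we reduce this modulo $p^2$ and use $g^{p-1} \equiv 1 \pmod{p^2}$, then the right-hand side becomes $1 + p(p - 1) g^{p-2}$.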
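So we have $(g + p)^{p-1} \equiv 1 + p(p - 1) g^{p-2} \pmod{p^2}$. Since $g$ generates $\mathbb{Z}_p^*$, we have that $p \nmid g^{p-2}$, from which we conclude that $p^2 \nmid p(p - 1) g^{p-2}$. Hence $(g + p)^{p-1} - 1 \not\equiv 0 \pmod{p^2}$ or, equivalently, $(g + p)^{p-1} \not\equiv 1 \pmod{p^2}$, as desired.

Part (c) clearly follows from (a) and (b), which concludes the theorem.

A.3.2 The multiplicative group $\mathbb{Z}_{2^\alpha}^*$ for $\alpha \ge 3$ is generated by 5 and $-1$

We prove the useful fact that, although $\mathbb{Z}_{2^\alpha}^*$ is not cyclic, it is generated by the elements 5 and $-1$. To do this, we first show that the order of 5, denoted by $|5|$, is $2^{\alpha-2}$.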
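As a sanity check on the two claims just stated for A.3.2, the following minimal sketch verifies them by brute force for small exponents. The function names `multiplicative_order` and `generated_by_5_and_minus_1` are hypothetical helpers introduced only for this illustration; they do not come from [A], and the check is no substitute for the proof.

```python
from math import gcd


def multiplicative_order(a: int, n: int) -> int:
    """Smallest k >= 1 with a**k congruent to 1 modulo n (a must be a unit mod n)."""
    if gcd(a, n) != 1:
        raise ValueError("a must be a unit modulo n")
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k


def generated_by_5_and_minus_1(alpha: int) -> set:
    """All residues of the form +5**j or -5**j modulo 2**alpha."""
    n = 2 ** alpha
    order = multiplicative_order(5, n)
    powers = {pow(5, j, n) for j in range(1, order + 1)}
    return powers | {(-x) % n for x in powers}


# Brute-force check for alpha = 3, ..., 10 (an arbitrary small range).
for alpha in range(3, 11):
    n = 2 ** alpha
    units = {x for x in range(1, n) if x % 2 == 1}  # the odd residues are the units
    assert multiplicative_order(5, n) == 2 ** (alpha - 2)
    assert generated_by_5_and_minus_1(alpha) == units
```

The loop only confirms the statement for finitely many $\alpha$; the general argument is the one given in this section.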

Since $|5|$ divides the order $\varphi(2^\alpha) = 2^{\alpha-1}$ of $\mathbb{Z}_{2^\alpha}^*$, $|5| = 2^\beta$ for some $\beta = 1, 2, \ldots, \alpha - 1$. Now write $5 = 1 + 2^2$ and consider that
\[ 5^{2^\beta} = (1 + 2^2)^{2^\beta} = 1 + 2^\beta \cdot 2^2 + \frac{2^\beta (2^\beta - 1)}{2} \cdot 2^4 + \cdots + 2^{2^{\beta+1}}. \]
So
\[ 5^{2^\beta} - 1 = 2^{\beta+2} + 2^{\beta+3} r = 2^{\beta+2}(2r + 1) \]
for some integer $r$. Now since $|5| = 2^\beta$ in the group $\mathbb{Z}_{2^\alpha}^*$, we have $5^{2^\beta} \equiv 1$ modulo $2^\alpha$. So $2^\alpha \mid (5^{2^\beta} - 1)$, from which we conclude that $2^\alpha \mid 2^{\beta+2}$ since $2r + 1$ is odd. Hence $\alpha \le \beta + 2$, or $\beta \ge \alpha - 2$. To conclude that $\beta = \alpha - 2$ (as opposed to $\beta = \alpha - 1$), we note that $\beta = \alpha - 1$ would imply that $|5| = 2^{\alpha-1}$, which is the order of the group, which in turn would imply that 5 generates the group. But no power of 5 is congruent to 3 modulo $2^\alpha$ since all powers of 5 are congruent to 1 modulo 4 whereas $3 \not\equiv 1 \pmod 4$.

Thus $|5| = 2^{\alpha-2}$, and hence $\{5, 5^2, \ldots, 5^{2^{\alpha-2}}\}$ is a collection of $2^{\alpha-2}$ distinct elements. Now consider the collection $\{-5, -5^2, \ldots, -5^{2^{\alpha-2}}\}$. Since negation is a one-to-one function, this is another collection of $2^{\alpha-2}$ distinct elements. Moreover, the first collection is made up of elements congruent to 1 modulo 4 whereas the second collection is made up of elements congruent to $-1$, or 3, modulo 4. Hence the union of these two collections, which is the set generated by 5 and $-1$, is a set of $2 \cdot 2^{\alpha-2} = 2^{\alpha-1}$ distinct elements. Since $\mathbb{Z}_{2^\alpha}^*$ has exactly this many elements, we conclude that 5 and $-1$ generate $\mathbb{Z}_{2^\alpha}^*$.

A.3.3 The Legendre Symbol

Given a prime $p$, consider the group of units $\mathbb{Z}_p^*$. For example, let $p = 7$. We might ask which of the elements in $\mathbb{Z}_7^*$ are squares in $\mathbb{Z}_7^*$, or quadratic residues modulo 7. Computing, we see that $1^2 \equiv 1$, $2^2 \equiv 4$, $3^2 \equiv 2$, $4^2 \equiv 2$, $5^2 \equiv 4$, and $6^2 \equiv 1$. So 1, 2, and 4 are called quadratic residues modulo 7, and 3, 5, and 6 are called quadratic nonresidues modulo 7. Note that there are equally as many quadratic residues as nonresidues. As it turns out, this is always true when $\mathbb{Z}_q^*$ is cyclic but $q \ne 2$ or 4:

Theorem 52. If $\mathbb{Z}_q^*$ is cyclic but not $\mathbb{Z}_2^*$ or $\mathbb{Z}_4^*$, then exactly half of the elements are squares. Equivalently, if $q = p^\alpha$ or $2p^\alpha$ where $p$ is an odd prime, then there are $\varphi(q)/2$ quadratic residues (and therefore $\varphi(q)/2$ nonresidues) modulo $q$.

Proof.
Since $\mathbb{Z}_q^*$ is cyclic, there exists a generating element $g$ such that
\[ \mathbb{Z}_q^* = \{g^m\}_{m=1}^{\varphi(q)} = \{g^{2m}\}_{m=1}^{\varphi(q)/2} \cup \{g^{2m+1}\}_{m=1}^{\varphi(q)/2}. \]
From this it is clear that at least half of the elements are squares. On the other hand, $x^2 \equiv (-x)^2$ for all $x \in \mathbb{Z}_q^*$, so if $x \not\equiv -x$ for all $x \in \mathbb{Z}_q^*$, then at most half of the elements would be squares. Hence exactly half would be squares in $\mathbb{Z}_q^*$. But if $x \equiv -x \pmod q$ for some $x \in \mathbb{Z}_q^*$, then $2x \equiv 0 \pmod q$, which implies that $q$, which is $p^\alpha$ or $2p^\alpha$, divides $2x$. Hence we would have that $p$, an odd prime, would divide $x$. But this would imply that $x \notin \mathbb{Z}_q^*$, a contradiction, so the theorem is proved.

Legendre came up with some handy notation for dealing with quadratic residues:

Definition 53. For all odd primes $p$, define the Legendre symbol $\left(\frac{n}{p}\right)$ for all $n \in \mathbb{N}$ by
\[ \left(\frac{n}{p}\right) = \begin{cases} 1, & \text{if } n \text{ is a quadratic residue modulo } p; \\ -1, & \text{if } n \text{ is not a quadratic residue modulo } p; \\ 0, & \text{if } p \mid n. \end{cases} \]
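The example $p = 7$ and Definition 53 can be checked by brute force. The sketch below is illustrative only: `quadratic_residues`, `legendre`, and `legendre_euler` are hypothetical names chosen here, and `legendre_euler` relies on Euler's criterion, $\left(\frac{n}{p}\right) \equiv n^{(p-1)/2} \pmod p$, a standard fact that is not proved in this appendix.

```python
def quadratic_residues(p: int) -> set:
    """Nonzero squares modulo an odd prime p."""
    return {(x * x) % p for x in range(1, p)}


def legendre(n: int, p: int) -> int:
    """Legendre symbol (n/p) for an odd prime p, computed straight from the definition."""
    n %= p
    if n == 0:
        return 0
    return 1 if n in quadratic_residues(p) else -1


def legendre_euler(n: int, p: int) -> int:
    """Same symbol via Euler's criterion (assumed here, not taken from the text)."""
    r = pow(n, (p - 1) // 2, p)  # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r


print(sorted(quadratic_residues(7)))
print([legendre(n, 7) for n in range(1, 8)])
```

For $p = 7$ the first function returns the residues 1, 2, and 4 found by hand above, and comparing `legendre` with `legendre_euler` on a few small primes is a cheap consistency check on both.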

Note from this definition that $\left(\frac{n^2}{p}\right) = 1 = \left(\frac{n}{p}\right)^2$ for all $n$ which are not divisible by $p$, and that, if $p \mid n$, then $\left(\frac{n^2}{p}\right) = 0 = \left(\frac{n}{p}\right)^2$. So $\left(\frac{n^2}{p}\right) = \left(\frac{n}{p}\right)^2$ for all $n \in \mathbb{Z}$. As it turns out, a stronger fact holds: $\left(\frac{mn}{p}\right) = \left(\frac{m}{p}\right)\left(\frac{n}{p}\right)$ for all $m, n \in \mathbb{N}$. We enumerate this and a few other important properties of the Legendre symbol here:

Theorem 54. Let $p$ and $q$ be odd primes, and let $m$ and $n$ be integers. Then the following are true of the Legendre symbol:

1. $\left(\frac{mn}{p}\right) = \left(\frac{m}{p}\right)\left(\frac{n}{p}\right)$.

2. $\left(\frac{m+p}{p}\right) = \left(\frac{m}{p}\right)$.

3. $\left(\frac{-1}{p}\right) = (-1)^{(p-1)/2}$.

4. $\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{(p-1)(q-1)/4}$ for $p \ne q$.

Proof. We gave a proof of 1 above for the special case of $m = n$; we refer the reader to [L] or [A] for a proof of the general case. The second fact follows because the Legendre symbol is computed using information about $m + p$ and $m$ as elements in $\mathbb{Z}_p$. But $m + p$ and $m$ are the same element in $\mathbb{Z}_p$, so the Legendre symbol involving each is the same. The third and fourth points, or sometimes just the fourth point, are called the quadratic reciprocity law. I read my first proof of this fact in [Da], but others are found in [L] and [A].

Using this theorem, it is clear how we could compute the Legendre symbol of any odd integer $n$ modulo $p$. We would first factor the integer and use the multiplicativity to rewrite $\left(\frac{n}{p}\right)$ as a product of symbols $\left(\frac{q}{p}\right)$ over all (odd) primes $q$ dividing $n$. If $n$ is negative, we would use part 3 to calculate the $\left(\frac{-1}{p}\right)$ term. For the remaining terms, we could use part 4 to flip each $\left(\frac{q}{p}\right)$ for which $q < p$ into $\pm\left(\frac{p}{q}\right)$ so as to make the computation easier. Finally, one starts squaring elements in $\mathbb{Z}_p^*$ and other various $\mathbb{Z}_q^*$ until all of the symbols are computed. To see an example, see our calculation in Equation 26.

A.4 Carefully considered convergence issues

A general rule I have learned and now live by, at least when I do mathematics, is: be careful, no, be very careful when you see infinities in your mathematics. To be sure, we have been very careful this entire time. Only at a few points in this exercise have we taken a slippery step or two. To give some traction to these steps, I have added this appendix.

A.4.1 Infinite products

Although we see infinite sums (which we call series) in our calculus courses, we seldom see infinite products. In contrast to a first or second calculus course, infinite products abound in complex analysis. Their convergence issues are translated into convergence issues about series, which we understand very well. This appendix contains a brief overview of infinite products and their convergence properties. The presentation follows [Ah].

An infinite product $P = \prod_{n=1}^\infty p_n$ is defined as the limit of the partial products $P_n = \prod_{k=1}^n p_k$. This is similar to the definition of an infinite sum. However, there is an added technical requirement. An infinite product is convergent if at most finitely many $p_n$ are zero, and if the product of the nonzero terms converges to something finite and nonzero. There are good reasons for these

technicalities. For example, if a product has just one zero factor, all of the partial products (past a certain point) are zero no matter what the other terms in the product are!

Note that, for a convergent product, $p_n = P_n / P_{n-1} \to 1$ as $n \to \infty$. With some thought, we would expect this; it is analogous to the fact that the terms in a convergent series converge to 0. Hence it makes sense to let $p_n = 1 + a_n$ for all $n \in \mathbb{N}$.

Consider a convergent product in which every $p_n = 1 + a_n \ne 0$. We can consider
\[ \log \prod_{n=1}^\infty (1 + a_n) = \sum_{n=1}^\infty \log(1 + a_n). \tag{43} \]
(To prove this equality, one considers the partial product from 1 to $N$, uses the corresponding property of logarithms of finite products, takes $N \to \infty$, and uses the fact that logarithms are continuous functions.) As it turns out, the infinite product converges if and only if the series converges. In this way, the problem of testing for convergence in an infinite product is reduced to testing for convergence in a series.

An infinite product absolutely converges if the corresponding series, given in Equation 43, absolutely converges. Since the series involves logarithms, testing for absolute convergence might be difficult. Fortunately, we have the following theorem:

Theorem 55. An infinite product $\prod_{n=1}^\infty (1 + a_n)$ converges absolutely if and only if the series $\sum_{n=1}^\infty a_n$ converges absolutely.

With these definitions of convergence being so closely tied to the definitions of convergence for a series, statements about termwise differentiability of series, for example, are easily translated into statements about termwise differentiability of products.

A.4.2 Combining absolutely convergent series

If the sums $\sum_k a_k$ and $\sum_k b_k$ are finite sums over the same index $k$, then
\[ \sum_k (a_k + b_k) = \sum_k a_k + \sum_k b_k. \]
We would like to say the same about series:

Theorem 56 (Adding absolutely convergent series). Let $\sum_{k=1}^\infty a_k$ and $\sum_{k=1}^\infty b_k$ be absolutely convergent series. Then
\[ \sum_{k=1}^\infty (a_k + b_k) = \sum_{k=1}^\infty a_k + \sum_{k=1}^\infty b_k, \]
and this converges absolutely.

Proof. For all $n \in \mathbb{N}$ let $r_n = \sum_{k=1}^n a_k$, $s_n = \sum_{k=1}^n b_k$, and $t_n = \sum_{k=1}^n (a_k + b_k) = r_n + s_n$.
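By hypothesis, $r_n$ and $s_n$ converge as $n \to \infty$, so
\[ \lim_{n \to \infty} (r_n + s_n) = \lim_{n \to \infty} r_n + \lim_{n \to \infty} s_n = \sum_{k=1}^\infty a_k + \sum_{k=1}^\infty b_k. \]
On the other hand, this limit is equal to
\[ \lim_{n \to \infty} t_n = \sum_{k=1}^\infty (a_k + b_k). \]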

So the first part of the theorem is proved. To show that the convergence is absolute, consider that
\[ \sum_{k=1}^\infty |a_k + b_k| \le \sum_{k=1}^\infty \bigl( |a_k| + |b_k| \bigr) = \sum_{k=1}^\infty |a_k| + \sum_{k=1}^\infty |b_k|, \]
which is finite by the theorem's hypothesis.

Next, we consider whether we can multiply series in the natural way. We prove the following theorem:

Theorem 57 (Multiplying absolutely convergent series). Let $\sum_{j=1}^\infty a_j$ and $\sum_{k=1}^\infty b_k$ be absolutely convergent series. The product equality
\[ \sum_{j=1}^\infty \sum_{k=1}^\infty a_j b_k = \left( \sum_{j=1}^\infty a_j \right) \left( \sum_{k=1}^\infty b_k \right) \tag{44} \]
holds and the sum on the left-hand side converges absolutely.

Proof. Using the same notation for $r_n$ and $s_n$, and letting
\[ t_n = \sum_{j=1}^n \sum_{k=1}^n a_j b_k = r_n s_n, \]
the fact that $r_n$ and $s_n$ converge as $n \to \infty$ implies
\[ \sum_{j=1}^\infty \sum_{k=1}^\infty a_j b_k = \lim_{n \to \infty} t_n = \left( \lim_{n \to \infty} r_n \right) \left( \lim_{n \to \infty} s_n \right). \]
This proves Equation 44. To prove absolute convergence, consider that, for all $n \in \mathbb{N}$,
\[ \sum_{j=1}^n \sum_{k=1}^n |a_j b_k| = \left( \sum_{j=1}^n |a_j| \right) \left( \sum_{k=1}^n |b_k| \right) \le \left( \sum_{j=1}^\infty |a_j| \right) \left( \sum_{k=1}^\infty |b_k| \right). \]
The absolute convergence of the two series implies that the right-hand side is finite. Hence the left-hand side is bounded for all $n \in \mathbb{N}$, which proves that the limit as $n \to \infty$ is finite.

A.4.3 The Dirichlet test for convergence

We first prove a finite analogue to integration by parts. Our functions $u$ and $v$ will be sequences defined on the nonnegative integers.

Theorem 58 (Abel summation). If $(a_k)$ and $(b_k)$ are sequences of complex numbers, and if $(A_k)$ is the sequence of partial sums of the terms in $(a_k)$, that is, $A_n = \sum_{k=0}^n a_k$, then
\[ \sum_{k=0}^n a_k b_k = \sum_{k=0}^{n-1} A_k (b_k - b_{k+1}) + A_n b_n \]
for all integers $n \ge 0$.
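As a quick sanity check of the identity in Theorem 58, the sketch below evaluates both sides exactly on sample data. The helper name `abel_sides` and the sample sequences are assumptions made for illustration only; exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import accumulate


def abel_sides(a, b):
    """Return (lhs, rhs) of the Abel summation identity for equal-length sequences a, b."""
    n = len(a) - 1
    A = list(accumulate(a))  # A[k] = a[0] + ... + a[k]
    lhs = sum(a[k] * b[k] for k in range(n + 1))
    rhs = sum(A[k] * (b[k] - b[k + 1]) for k in range(n)) + A[n] * b[n]
    return lhs, rhs


# Sample data (hypothetical): alternating signs against a decreasing sequence.
a = [Fraction((-1) ** k) for k in range(10)]
b = [Fraction(1, k + 1) for k in range(10)]
lhs, rhs = abel_sides(a, b)
assert lhs == rhs
```

The same alternating choice of $a_k$ has bounded partial sums, which is the situation that Dirichlet's test later in this section exploits.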

Proof. We proceed by induction on $n$. For $n = 0$, the sum on the right-hand side is empty, so the base case is trivial. Now suppose the equality holds for $n$. Then, using our induction hypothesis right away, adding and subtracting $A_n (b_n - b_{n+1})$, and replacing $A_n + a_{n+1}$ by $A_{n+1}$, we obtain
\begin{align*}
\sum_{k=0}^{n+1} a_k b_k &= \sum_{k=0}^{n} a_k b_k + a_{n+1} b_{n+1} \\
&= \sum_{k=0}^{n-1} A_k (b_k - b_{k+1}) + A_n b_n + a_{n+1} b_{n+1} \\
&= \sum_{k=0}^{n} A_k (b_k - b_{k+1}) - A_n (b_n - b_{n+1}) + A_n b_n + a_{n+1} b_{n+1} \\
&= \sum_{k=0}^{n} A_k (b_k - b_{k+1}) + (A_n + a_{n+1}) b_{n+1} \\
&= \sum_{k=0}^{(n+1)-1} A_k (b_k - b_{k+1}) + A_{n+1} b_{n+1}.
\end{align*}
We recognize this as our claim for $n + 1$, so the result follows by induction.

By setting $a_0 = a_1 = \cdots = a_m = 0$, we have that $A_0 = A_1 = \cdots = A_m = 0$, from which we derive the following corollary:

Corollary 59. Let $(a_n)$, $(b_n)$, and $(A_n)$ be as given in Theorem 58, with $a_0 = a_1 = \cdots = a_m = 0$. Then
\[ \sum_{k=m+1}^{n} a_k b_k = \sum_{k=m+1}^{n-1} A_k (b_k - b_{k+1}) + A_n b_n \]
for all $n \ge m$.

The reader should now recall the alternating series test for the convergence of series. It says that if $c_n = (-1)^n b_n$ for all $n \in \mathbb{N}$, where the $b_n$ are decreasing and converging to 0 as $n \to \infty$, then the series $\sum_{n=1}^\infty c_n$ converges. The theorem below is a generalization of this test. In short, it states that the $(-1)^n$ in the definition of $c_n$ can be replaced by any sequence $a_n$ as long as the partial sums $\sum_{k=1}^n a_k$ remain bounded as $n \to \infty$.

Theorem 60 (Dirichlet's test for convergence). Let $(a_n)$ and $(b_n)$ be sequences in $\mathbb{R}$ such that

1. there exists $M \in \mathbb{R}$ such that $\left| \sum_{k=1}^n a_k \right| \le M$ for all $n \in \mathbb{N}$, and

2. $b_n$ decreases and converges to 0.

Then the series $\sum_{n=1}^\infty a_n b_n$ converges.

Proof. Since $\mathbb{R}$ is complete, proving that the series converges is equivalent to showing that its sequence of partial sums is Cauchy. Since we do not know (or care) what the point of convergence is, it will be easiest to prove that the sequence is Cauchy and use the completeness of $\mathbb{R}$ to conclude the theorem.

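To this end, let $\varepsilon > 0$. Since the partial sums are bounded, let $M > 0$ such that $|A_n| \le M$ for all $n \in \mathbb{N}$. Now $b_n \to 0$, so there exists $N_1 \in \mathbb{N}$ such that $|b_n| < \varepsilon/(3M)$ for all $n > N_1$. Also, the sequence $(b_n)$ is necessarily Cauchy (because it converges!), so there exists $N_2 \in \mathbb{N}$ such that $n \ge m > N_2$ implies that $b_m - b_n = |b_m - b_n| < \varepsilon/(3M)$. Choose $N \in \mathbb{N}$ to be larger than $N_1$ and $N_2$. Then for all $n \ge m > N$, using Abel's summation lemma, we have
\begin{align*}
\left| \sum_{k=0}^{n} a_k b_k - \sum_{k=0}^{m} a_k b_k \right|
&= \left| \sum_{k=0}^{n-1} A_k (b_k - b_{k+1}) + A_n b_n - \sum_{k=0}^{m-1} A_k (b_k - b_{k+1}) - A_m b_m \right| \\
&\le \left| \sum_{k=m}^{n-1} A_k (b_k - b_{k+1}) \right| + |A_n b_n| + |A_m b_m| \\
&\le M \sum_{k=m}^{n-1} |b_k - b_{k+1}| + M |b_n| + M |b_m|.
\end{align*}
Now $(b_n)$ is decreasing, so
\[ \sum_{k=m}^{n-1} |b_k - b_{k+1}| = \sum_{k=m}^{n-1} (b_k - b_{k+1}) = b_m - b_n < \frac{\varepsilon}{3M} \]
by our choice of $N$. Also by our choice of $N$, $|b_n|$ and $|b_m|$ are less than $\varepsilon/(3M)$. So
\[ \left| \sum_{k=0}^{n} a_k b_k - \sum_{k=0}^{m} a_k b_k \right| \le M \left( \frac{\varepsilon}{3M} \right) + M \left( \frac{\varepsilon}{3M} \right) + M \left( \frac{\varepsilon}{3M} \right) = \varepsilon. \]
Hence the partial sums form a Cauchy and, therefore, convergent sequence. This is what it means for the series to converge, so our proof is complete.

A.4.4 The Taylor series for $\log(1 - z)$ converges for all $z$ such that $|z| \le 1$, $z \ne 1$

That this series converges for $|z| < 1$ is obvious by comparison to the geometric series $\sum z^n$. The convergence for $|z| = 1$, $z \ne 1$, follows from Dirichlet's test for convergence since

1. $1/n$ decreases and converges to 0 as $n \to \infty$, and

2. $\sum_{n=1}^\infty z^n = \sum_{n=1}^\infty e^{i\varphi n}$, for some $0 < \varphi < 2\pi$, and the partial sums of the series on the right are bounded.

The proof of the second point shall be my final treat for the reader. We do this by proving the following theorem:

Theorem 61. Let $\theta > 1$. Then the partial sums of $\sum_{n=1}^\infty e^{2\pi i n/\theta}$ are bounded.

Proof. First note that if $\theta$ is an integer, then the sum of any $\theta$ consecutive terms, which are $\theta$-th roots of unity, is 0. Intuitively, then, for general $\theta$, the sum of $\lfloor \theta \rfloor$ consecutive terms should be about 0. So to prove this fact, we first consider partial sums up to integer multiples of $\lfloor \theta \rfloor$ and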

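break the sums up into blocks of $\lfloor \theta \rfloor$ terms. That is, for all $m \in \mathbb{N}$,
\begin{align*}
\sum_{n=1}^{m \lfloor \theta \rfloor} e^{2\pi i n/\theta}
&= \sum_{j=0}^{m-1} \sum_{n = j\lfloor \theta \rfloor + 1}^{(j+1)\lfloor \theta \rfloor} e^{2\pi i n/\theta} \\
&= \sum_{j=0}^{m-1} \sum_{n'=1}^{\lfloor \theta \rfloor} e^{2\pi i j \lfloor \theta \rfloor/\theta}\, e^{2\pi i n'/\theta} \\
&= \left( \sum_{j=0}^{m-1} \bigl( e^{2\pi i \lfloor \theta \rfloor/\theta} \bigr)^j \right) \left( \sum_{n'=1}^{\lfloor \theta \rfloor} \bigl( e^{2\pi i/\theta} \bigr)^{n'} \right),
\end{align*}
where we have reindexed the inner sum by the variable $n' = n - j\lfloor \theta \rfloor$. Note that, for $\theta \in \mathbb{Z}$, the second term in the product is 0, and hence the whole sum is 0. For $\theta \notin \mathbb{Z}$, $\lfloor \theta \rfloor/\theta$ is not an integer, and hence, after evaluating the geometric sums,
\[ \left| \sum_{n=1}^{m \lfloor \theta \rfloor} e^{2\pi i n/\theta} \right| = \left| \frac{1 - e^{2\pi i \lfloor \theta \rfloor m/\theta}}{1 - e^{2\pi i \lfloor \theta \rfloor/\theta}} \right| \left| e^{2\pi i/\theta}\, \frac{1 - e^{2\pi i \lfloor \theta \rfloor/\theta}}{1 - e^{2\pi i/\theta}} \right| \le (1 + 1)\,|f(\theta)|, \]
where $f(\theta) = (1 - e^{2\pi i/\theta})^{-1}$ is a function of $\theta$ and is independent of $m$. Therefore, for all $N \in \mathbb{N}$, we can use the division algorithm to write $N = m\lfloor \theta \rfloor + r$ where $0 \le r < \lfloor \theta \rfloor$ and conclude that
\[ \left| \sum_{n=1}^{N} e^{2\pi i n/\theta} \right| \le \left| \sum_{n=1}^{m\lfloor \theta \rfloor} e^{2\pi i n/\theta} \right| + \left| \sum_{n = m\lfloor \theta \rfloor + 1}^{m\lfloor \theta \rfloor + r} e^{2\pi i n/\theta} \right| \le 2|f(\theta)| + r < 2|f(\theta)| + \lfloor \theta \rfloor. \]
Given any $\theta > 1$, therefore, the partial sums of the series $\sum_{n=1}^\infty e^{2\pi i n/\theta}$ are bounded, as claimed.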
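The boundedness in Theorem 61 can also be observed numerically. The sketch below is a floating-point illustration, not part of the proof; the names `max_partial_sum_modulus` and `two_f_bound` are hypothetical. It compares the largest partial sum seen against $2|f(\theta)|$, which the closed form of the geometric sum, $\sum_{n=1}^N z^n = z(1 - z^N)/(1 - z)$, gives as a bound for every $N$ when $z = e^{2\pi i/\theta} \ne 1$.

```python
import cmath
import math


def max_partial_sum_modulus(theta: float, terms: int) -> float:
    """Largest |sum_{n=1}^{N} exp(2*pi*i*n/theta)| over N = 1, ..., terms."""
    z = cmath.exp(2j * math.pi / theta)
    total, term, largest = 0j, 1 + 0j, 0.0
    for _ in range(terms):
        term *= z          # term is now z**n
        total += term      # total is the n-th partial sum
        largest = max(largest, abs(total))
    return largest


def two_f_bound(theta: float) -> float:
    """The quantity 2*|f(theta)| with f(theta) = 1 / (1 - exp(2*pi*i/theta))."""
    z = cmath.exp(2j * math.pi / theta)
    return 2 * abs(1 / (1 - z))


# Sample values of theta > 1 (arbitrary choices, including an integer and an irrational).
for theta in (2.5, math.pi, 4.0, 7.3):
    print(theta, max_partial_sum_modulus(theta, 20000), two_f_bound(theta))
```

In each case the observed maximum should stay at or below the printed bound up to rounding error, consistent with the (weaker) bound $2|f(\theta)| + \lfloor \theta \rfloor$ obtained in the proof.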

References

[Ah] Ahlfors, L.V. Complex Analysis: An Introduction to the Theory of Analytic Functions of One Complex Variable. York, PA: McGraw-Hill Book Company, Inc., 1953.

[A] Apostol, Tom M. Introduction to Analytic Number Theory. New York, NY: Springer-Verlag, 1976.

[C] Cohn, C. Advanced Number Theory. New York, NY: Dover Publications, 1962.

[D] Davenport, H. Multiplicative Number Theory, Third Edition. New York, NY: Springer-Verlag.

[Da] Davidoff, G. Elementary Number Theory, Group Theory, and Ramanujan Graphs. Cambridge, UK: Cambridge University Press.

[G] Goldfeld, D. The elementary proof of the Prime Number Theorem: An historical perspective (preprint).

[L] LeVeque, W.J. Topics in Number Theory, Volume 1. Reading, MA: Addison-Wesley Publishing Company, Inc., 1956.

[N] Newman, D.J. Simple analytic proof of the Prime Number Theorem. Amer. Math. Monthly, Vol. 87 (1980).

[Sa] Saff, E.B., and Snider, A.D. Fundamentals of Complex Analysis with Applications to Engineering and Science, Third Edition. Upper Saddle River, NJ: Pearson Education, Inc.

[So] Soprounov, I. A short proof of the Prime Number Theorem for arithmetic progressions (preprint).

[Z] Zagier, D. Newman's short proof of the Prime Number Theorem. Amer. Math. Monthly, Vol. 104 (1997), No. 8.