Introductory Number Theory
Course No. 100 331
Spring 2006
Michael Stoll

Contents
1. Very Basic Remarks
2. Divisibility
3. The Euclidean Algorithm
4. Prime Numbers and Unique Factorization
5. Congruences
6. Coprime Integers and Multiplicative Inverses
7. The Chinese Remainder Theorem
8. Fermat's and Euler's Theorems
9. Structure of $\mathbb{F}_p^\times$ and $(\mathbb{Z}/p^n\mathbb{Z})^\times$
10. The RSA Cryptosystem
11. Discrete Logarithms
12. Quadratic Residues
13. Quadratic Reciprocity
14. Another Proof of Quadratic Reciprocity
15. Sums of Squares
16. Geometry of Numbers
17. Ternary Quadratic Forms
18. Legendre's Theorem
19. $p$-adic Numbers
20. The Hilbert Norm Residue Symbol
21. Pell's Equation and Continued Fractions
22. Elliptic Curves
23. Primes in Arithmetic Progressions
24. The Prime Number Theorem
References
1. Very Basic Remarks

The following properties of the integers $\mathbb{Z}$ are fundamental.
(1) $\mathbb{Z}$ is an integral domain (i.e., a commutative ring such that $ab = 0$ implies $a = 0$ or $b = 0$).
(2) $\mathbb{Z}_{\ge 0}$ is well-ordered: every nonempty set of nonnegative integers has a smallest element.
(3) $\mathbb{Z}$ satisfies the Archimedean Principle: if $n > 0$, then for every $m \in \mathbb{Z}$, there is $k \in \mathbb{Z}$ such that $kn > m$.

2. Divisibility

2.1. Definition. Let $a$, $b$ be integers. We say that $a$ divides $b$, written $a \mid b$, if there is an integer $c$ such that $b = ac$. In this case, we also say that $a$ is a divisor of $b$ or that $b$ is a multiple of $a$.

We have the following simple properties (for all $a, b, c \in \mathbb{Z}$).
(1) $a \mid a$, $1 \mid a$, $a \mid 0$.
(2) If $0 \mid a$, then $a = 0$.
(3) If $a \mid 1$, then $a = \pm 1$.
(4) If $a \mid b$ and $b \mid c$, then $a \mid c$.
(5) If $a \mid b$, then $a \mid bc$.
(6) If $a \mid b$ and $a \mid c$, then $a \mid b \pm c$.
(7) If $a \mid b$ and $|b| < |a|$, then $b = 0$.
(8) If $a \mid b$ and $b \mid a$, then $a = \pm b$.

2.2. Definition. We say that $d$ is the greatest common divisor of $a$ and $b$, written $d = \gcd(a, b)$ or $d = a \wedge b$, if $d \mid a$ and $d \mid b$, $d \ge 0$, and for all integers $k$ such that $k \mid a$ and $k \mid b$, we have $k \mid d$.

We say that $m$ is the least common multiple of $a$ and $b$, written $m = \operatorname{lcm}(a, b)$ or $m = a \vee b$, if $a \mid m$ and $b \mid m$, $m \ge 0$, and for all integers $n$ such that $a \mid n$ and $b \mid n$, we have $m \mid n$.

In a similar way, we define the greatest common divisor and least common multiple for any set $S$ of integers. We have the following simple properties.
(1) $\gcd(\emptyset) = 0$, $\operatorname{lcm}(\emptyset) = 1$.
(2) $\gcd(S_1 \cup S_2) = \gcd(\gcd(S_1), \gcd(S_2))$, $\operatorname{lcm}(S_1 \cup S_2) = \operatorname{lcm}(\operatorname{lcm}(S_1), \operatorname{lcm}(S_2))$.
(3) $\gcd(\{a\}) = \operatorname{lcm}(\{a\}) = |a|$.
(4) $\gcd(ac, bc) = |c| \gcd(a, b)$.
(5) $\gcd(a, b) = \gcd(a, ka + b)$.

3. The Euclidean Algorithm

How can we compute the gcd of two given integers? The key for this is the last property of the gcd listed above. In order to make use of it, we need the operation of division with remainder.
3.1. Proposition. Given integers $a$ and $b$ with $b \ne 0$, there exist unique integers $q$ ("quotient") and $r$ ("remainder") such that $0 \le r < |b|$ and $a = bq + r$.

Proof. Existence: Consider $S = \{a - kb : k \in \mathbb{Z},\ a - kb \ge 0\}$. Then $S \subset \mathbb{Z}_{\ge 0}$ is nonempty and therefore has a smallest element $r = a - qb$ for some $q \in \mathbb{Z}$. We have $r \ge 0$ by definition, and if $r \ge |b|$, then $r - |b|$ would also be in $S$, hence $r$ would not have been the smallest element.

Uniqueness: Suppose $a = bq + r = bq' + r'$ with $0 \le r, r' < |b|$. Then $b \mid r - r'$ and $0 \le |r - r'| < |b|$, therefore $r = r'$. This implies $bq = bq'$, hence $q = q'$ (since $b \ne 0$).

3.2. Algorithm GCD (Euclidean Algorithm). Given integers $a$ and $b$, we do the following.
(1) Set $n = 0$, $a_0 = |a|$, $b_0 = |b|$.
(2) If $b_n = 0$, return $a_n$ as the result.
(3) Write $a_n = b_n q_n + r_n$ with $0 \le r_n < b_n$.
(4) Set $a_{n+1} = b_n$, $b_{n+1} = r_n$.
(5) Replace $n$ by $n + 1$ and go to step 2.

We claim that the result returned is $\gcd(a, b)$. (Observe that $0 \le b_{n+1} < b_n$ if the loop is continued, hence the algorithm must terminate.)

Proof. We show that for all $n$ that occur in the loop, we have $\gcd(a_n, b_n) = \gcd(a, b)$. The claim follows, since the return value is $a_n = \gcd(a_n, 0) = \gcd(a_n, b_n)$ for the last $n$. For $n = 0$, we have $\gcd(a_0, b_0) = \gcd(|a|, |b|) = \gcd(a, b)$. Now suppose that we know $\gcd(a_n, b_n) = \gcd(a, b)$ and that $b_n \ne 0$ (so the loop is not terminated). Then $\gcd(a_{n+1}, b_{n+1}) = \gcd(b_n, a_n - b_n q_n) = \gcd(b_n, a_n) = \gcd(a_n, b_n)$ (use property (5) of the gcd).

3.3. Theorem. Fix $a, b \in \mathbb{Z}$. The integers of the form $xa + yb$ with $x, y \in \mathbb{Z}$ are exactly the multiples of $d = \gcd(a, b)$. In particular, there are $x, y \in \mathbb{Z}$ such that $d = xa + yb$.

Proof. Since $d$ divides both $a$ and $b$, $d$ also has to divide $xa + yb$. So these numbers are multiples of $d$. For the converse, it suffices to show that $d$ can be written as $xa + yb$. This follows by induction from the Euclidean Algorithm: Let $N$ be the last value of $n$.
Then $d = a_N \cdot 1 + b_N \cdot 0$; and if $d = x_{n+1} a_{n+1} + y_{n+1} b_{n+1}$, then we have $d = y_{n+1} a_n + (x_{n+1} - q_n y_{n+1}) b_n$, so setting $x_n = y_{n+1}$ and $y_n = x_{n+1} - q_n y_{n+1}$, we have $d = x_n a_n + y_n b_n$. So in the end, we must also have $d = x_0 a_0 + y_0 b_0$.

There is a simple extension of the Euclidean Algorithm that also computes numbers $x$ and $y$ such that $\gcd(a, b) = xa + yb$. It looks like this.

3.4. Algorithm XGCD (Extended Euclidean Algorithm). Given integers $a$ and $b$, we do the following.
(1) Set $n = 0$, $a_0 = |a|$, $b_0 = |b|$, $x_0 = \operatorname{sign}(a)$, $y_0 = 0$, $u_0 = 0$, $v_0 = \operatorname{sign}(b)$.
(2) If $b_n = 0$, return $(a_n, x_n, y_n)$ as the result.
(3) Write $a_n = b_n q_n + r_n$ with $0 \le r_n < b_n$.
(4) Set $a_{n+1} = b_n$, $b_{n+1} = r_n$, $x_{n+1} = u_n$, $y_{n+1} = v_n$, $u_{n+1} = x_n - u_n q_n$, $v_{n+1} = y_n - v_n q_n$.
(5) Replace $n$ by $n + 1$ and go to step 2.
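Steps (1) to (5) of Algorithm XGCD can be sketched in Python roughly as follows. This is only an illustration under the conventions above; the function name `xgcd` and the helper `sign` are chosen here and are not part of the notes.

```python
def xgcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d = gcd(a, b) >= 0 and x*a + y*b == d."""
    def sign(t: int) -> int:
        return (t > 0) - (t < 0)

    # Step (1): a_0 = |a|, b_0 = |b|, (x_0, y_0) = (sign a, 0), (u_0, v_0) = (0, sign b).
    an, bn = abs(a), abs(b)
    x, y = sign(a), 0
    u, v = 0, sign(b)
    # Loop invariant: an == x*a + y*b and bn == u*a + v*b.
    while bn != 0:                      # step (2)
        q, r = divmod(an, bn)           # step (3): an = bn*q + r, 0 <= r < bn
        an, bn = bn, r                  # step (4)
        x, y, u, v = u, v, x - u * q, y - v * q
    return an, x, y
```

For instance, `d, x, y = xgcd(240, 46)` yields `d == 2` together with coefficients satisfying `x * 240 + y * 46 == 2`.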
By induction, one shows that $a_n = x_n a + y_n b$ and $b_n = u_n a + v_n b$, so upon exit, the return values $(d, x, y)$ satisfy $xa + yb = d = \gcd(a, b)$.

3.5. Proposition. If $n$ divides $ab$ and $\gcd(n, a) = 1$, then $n$ divides $b$.

Proof. By Thm. 3.3, there are $x$ and $y$ such that $xa + yn = 1$. We multiply by $b$ to obtain $b = xab + ynb$. Since $n$ divides $ab$, $n$ divides the right hand side and therefore also $b$.

4. Prime Numbers and Unique Factorization

4.1. Definition. A positive integer $p$ is called a prime number (or simply a prime), if $p > 1$ and the only positive divisors of $p$ are $1$ and $p$. (This is really the definition of an irreducible element in a ring!)

4.2. Proposition. If $p$ is a prime and $a$, $b$ are integers such that $p \mid ab$, then $p \mid a$ or $p \mid b$. (This is the general definition of a prime element in a ring!)

Proof. We can assume that $p \nmid a$ (otherwise we are done). Then $\gcd(a, p) = 1$ (since the only other positive divisor of $p$, namely $p$ itself, does not divide $a$). Then by Prop. 3.5, $p$ divides $b$.

Now let $n > 0$ be a positive integer. Then either $n = 1$, or $n$ is a prime number, or else $n$ has a proper divisor $d$ such that $1 < d < n$. Then we can write $n = de$ where also $1 < e < n$. Continuing this process with $d$ and $e$, we finally obtain a representation of $n$ as a product of primes. (If $n = 1$, it is given by an empty product (a product without factors) of primes.)

4.3. Theorem. This prime factorization of $n$ is unique (up to ordering of the factors).

Proof. By induction on $n$. For $n = 1$, this is clear (there is only one empty set...). So assume that $n > 1$ and that we have written $n = p_1 p_2 \cdots p_k = q_1 q_2 \cdots q_l$ with prime numbers $p_i$, $q_j$, where $k, l \ge 1$. Now $p_1$ divides the product of the $q_j$'s, hence $p_1$ has to divide one of the $q_j$'s. Up to reordering, we can assume that $p_1 \mid q_1$. But the only positive divisors of $q_1$ are $1$ and $q_1$, so we must in fact have $p_1 = q_1$. Let $n' = n/p_1 = n/q_1$, then $n' = p_2 p_3 \cdots p_k = q_2 q_3 \cdots q_l$. Since $n' < n$, we know by induction that (perhaps after reordering the $q_j$'s) $k = l$ and $p_j = q_j$ for $j = 2, \dots, k$. This proves the claim.
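The splitting process described before Thm. 4.3 can be made concrete with a small Python sketch. It assumes plain trial division as the way to find a divisor (the notes do not prescribe a method), and the name `factorize` is chosen here for illustration only.

```python
def factorize(n: int) -> dict[int, int]:
    """Return {prime: exponent} for a positive integer n, using trial division."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        # Any d dividing n at this point is prime, because all smaller
        # prime factors have already been divided out.
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        # What remains has no divisor up to its square root, so it is prime.
        factors[n] = factors.get(n, 0) + 1
    return factors
```

The empty dictionary returned for `n = 1` corresponds to the empty product mentioned above. Trial division is only practical for small inputs.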
4.4. Definition. This shows that we can write every nonzero integer $n$ uniquely as

$$n = \pm \prod_p p^{v_p(n)}$$

where the product is over all prime numbers $p$, and the exponents $v_p(n)$ are nonnegative integers, all but finitely many of which are zero. $v_p(n)$ is called the valuation of $n$ at $p$.

We have the following simple properties.
(1) $v_p(mn) = v_p(m) + v_p(n)$.
(2) $m \mid n \iff \forall p : v_p(m) \le v_p(n)$.
(3) $\gcd(m, n) = \prod_p p^{\min(v_p(m), v_p(n))}$, $\operatorname{lcm}(m, n) = \prod_p p^{\max(v_p(m), v_p(n))}$.
(4) $v_p(m + n) \ge \min(v_p(m), v_p(n))$, with equality if $v_p(m) \ne v_p(n)$.

Property (3) implies that $\gcd(m, n) \operatorname{lcm}(m, n) = mn$ for positive $m$, $n$. In general, we have $\gcd(m, n) \operatorname{lcm}(m, n) = |mn|$.

If we set $v_p(0) = +\infty$, then all the above properties hold for all integers (with the usual conventions like $\min\{e, \infty\} = e$, $e + \infty = \infty$, ...).

We can extend the valuation function from the integers to the rational numbers by setting

$$v_p\!\left(\frac{r}{s}\right) = v_p(r) - v_p(s)$$

(Exercise: check that this is well-defined). Properties (1) and (4) above then hold for rational numbers, and a rational number $x$ is an integer if and only if $v_p(x) \ge 0$ for all primes $p$.

5. Congruences

5.1. Definition. Let $a$, $b$ and $n$ be integers with $n > 0$. We say that $a$ is congruent to $b$ modulo $n$, written $a \equiv b \bmod n$, if $n$ divides the difference $a - b$.

5.2. Congruence is an equivalence relation. For fixed $n$ and arbitrary $a, b, c \in \mathbb{Z}$, we have:
(1) $a \equiv a \bmod n$.
(2) If $a \equiv b \bmod n$, then $b \equiv a \bmod n$.
(3) If $a \equiv b \bmod n$ and $b \equiv c \bmod n$, then $a \equiv c \bmod n$.

Hence we can partition $\mathbb{Z}$ into congruence classes mod $n$: we let

$$\bar{a} = a + n\mathbb{Z} = \{a + nx : x \in \mathbb{Z}\} = \{b \in \mathbb{Z} : a \equiv b \bmod n\}$$

(in the $\bar{a}$ notation, $n$ must be clear from the context) and

$$\mathbb{Z}/n\mathbb{Z} = \{\bar{a} : a \in \mathbb{Z}\}.$$

We then have

$$a \equiv b \bmod n \iff b \in \bar{a} \iff \bar{a} = \bar{b}.$$
5.3. Proposition. The map $\{0, 1, \dots, n - 1\} \to \mathbb{Z}/n\mathbb{Z}$, $r \mapsto \bar{r} = r + n\mathbb{Z}$, is a bijection. In particular, $\mathbb{Z}/n\mathbb{Z}$ has exactly $n$ elements.

Proof. The map is clearly well-defined. It is injective: assume $\bar{r} = \bar{s}$ with $0 \le r, s < n$. Then $r \equiv s \bmod n$, so $n \mid r - s$ and $|r - s| < n$, therefore $r = s$. It is surjective: Let $\bar{a}$ be a congruence class and write $a = nq + r$ with $0 \le r < n$. Then $\bar{a} = \bar{r}$.

Since the representative $r$ of a class $\bar{a}$ is given by the (least nonnegative) residue of $a$ when divided by $n$, congruence classes are also called residue classes.

5.4. The congruence classes form a commutative ring. We define addition and multiplication on $\mathbb{Z}/n\mathbb{Z}$:

$$\bar{a} + \bar{b} = \overline{a + b}, \qquad \bar{a} \cdot \bar{b} = \overline{ab}$$

We have to check that these operations are well-defined. This means that if $a \equiv a' \bmod n$ and $b \equiv b' \bmod n$, then we must have $a + b \equiv a' + b' \bmod n$ and $ab \equiv a'b' \bmod n$. Now $(a + b) - (a' + b') = (a - a') + (b - b')$ is divisible by $n$, and also $ab - a'b' = (a - a')b + a'(b - b')$ is divisible by $n$.

Once these operations are well-defined, all the commutative ring axioms carry over immediately from $\mathbb{Z}$ to $\mathbb{Z}/n\mathbb{Z}$.

5.5. Congruences are useful. Why are congruences a useful concept? They give us a kind of "Mickey Mouse image" of the integers (lumping together many integers into one residue class, thus losing information), with the advantage that the resulting structure $\mathbb{Z}/n\mathbb{Z}$ has only finitely many elements. This means that all sorts of questions that are difficult to answer with respect to $\mathbb{Z}$ are effectively (though not necessarily efficiently, if $n$ is large) decidable with respect to $\mathbb{Z}/n\mathbb{Z}$. If we can show in this way that something is impossible over $\mathbb{Z}/n\mathbb{Z}$, then this often implies a negative answer for $\mathbb{Z}$, too.

Consider, for example, the equation

$$x^2 + y^2 - 15 z^2 = 7.$$

Does it have a solution in integers? That seems hard to decide.
On the other hand, we can very easily make a table of all possible values of the left hand side in $\mathbb{Z}/8\mathbb{Z}$: it is easy to see that a square is always 0, 1, or 4 mod 8, and adding three of these values (note that $-15 \equiv 1 \bmod 8$) leads to all residue classes mod 8 with one exception: the left hand side is never 7 mod 8. So a solution is not possible in $\mathbb{Z}/8\mathbb{Z}$. But any solution in $\mathbb{Z}$ would lead to an image solution in $\mathbb{Z}/8\mathbb{Z}$, hence there can be no solution in $\mathbb{Z}$ either.

6. Coprime Integers and Multiplicative Inverses

When does a class $\bar{a}$ have a multiplicative inverse in $\mathbb{Z}/n\mathbb{Z}$? We have to solve the congruence $ax \equiv 1 \bmod n$; equivalently, there need to exist integers $x$ and $y$ such that $ax + ny = 1$. By Thm. 3.3, this is equivalent to $\gcd(a, n) = 1$. In this case, we say that $a$ and $n$ are relatively prime or that $a$ and $n$ are coprime, and sometimes write $a \perp n$. We can use the extended Euclidean Algorithm to find the inverse.
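As a rough illustration of the last remark, the following Python sketch computes an inverse modulo $n$ by running the extended Euclidean recursion on $n$ and $a$ and tracking only the coefficient of $a$. The function name `inverse_mod` is an assumption made for this example.

```python
def inverse_mod(a: int, n: int) -> int:
    """Return x with 0 <= x < n and a*x ≡ 1 (mod n).

    Raises ValueError if gcd(a, n) != 1, i.e. if no inverse exists.
    """
    if n <= 0:
        raise ValueError("modulus must be positive")
    r0, r1 = n, a % n
    s0, s1 = 0, 1          # invariant: r0 ≡ s0*a and r1 ≡ s1*a (mod n)
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
    if r0 != 1:            # r0 is now gcd(a, n)
        raise ValueError("a is not coprime to n")
    return s0 % n
```

For example, `inverse_mod(7, 15)` returns `13`, and indeed $7 \cdot 13 = 91 = 6 \cdot 15 + 1$.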
6.1. Theorem. The ring $\mathbb{Z}/n\mathbb{Z}$ is a field if and only if $n$ is a prime number.

Proof. Clear for $n = 1$ (a field has at least two elements 0 and 1, and 1 is not a prime number). For $n > 1$, $\mathbb{Z}/n\mathbb{Z}$ is not a field if and only if there is some $a \in \mathbb{Z}$, not divisible by $n$, such that $d = \gcd(n, a) > 1$. This implies that $d$ with $1 < d < n$ is a proper divisor of $n$, hence $n$ is not a prime. Conversely, if $d$ is a proper divisor of $n$, then $\gcd(n, d) = d$, and $\bar{d}$ is not invertible.

If $\gcd(n, a) = 1$, then $\bar{a}$ is called a primitive (or prime) residue class mod $n$; its uniquely determined multiplicative inverse in $\mathbb{Z}/n\mathbb{Z}$ is denoted $\bar{a}^{-1}$. The prime residue classes form a group, the multiplicative group $(\mathbb{Z}/n\mathbb{Z})^\times$ of the ring $\mathbb{Z}/n\mathbb{Z}$. When $n = p$ is prime, then the field $\mathbb{Z}/p\mathbb{Z}$ is also denoted $\mathbb{F}_p$; we have $\mathbb{F}_p^\times = \mathbb{F}_p \setminus \{\bar{0}\}$ for the multiplicative group; in particular, $\#\mathbb{F}_p^\times = p - 1$.

6.2. Definition. The Euler φ function is defined for $n > 0$ by

$$\varphi(n) = \#(\mathbb{Z}/n\mathbb{Z})^\times = \#\{a \in \mathbb{Z} : 0 \le a < n,\ a \perp n\}.$$

n      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
φ(n)   1  1  2  2  4  2  6  4  6  4 10  4 12  6  8  8 16  6 18  8

We have that $n$ is prime if and only if $\varphi(n) = n - 1$.

6.3. Proposition. If $p$ is prime and $e \ge 1$, then $\varphi(p^e) = p^{e-1}(p - 1)$.

Proof. Clear for $e = 1$. For $e > 1$, observe that $a \perp p^e \iff p \nmid a$, so

$$\varphi(p^e) = \#\{a \in \mathbb{Z} : 0 \le a < p^e,\ p \nmid a\} = p^e - \#\{a \in \mathbb{Z} : 0 \le a < p^e,\ p \mid a\} = p^e - p^{e-1} = (p - 1)p^{e-1}.$$

6.4. A Recurrence. Counting the numbers between 0 (inclusive) and $n$ (exclusive) according to their gcd with $n$ (which can be any (positive) divisor $d$ of $n$), we obtain

$$\sum_{d \mid n} \varphi\!\left(\frac{n}{d}\right) = \sum_{d \mid n} \varphi(d) = n.$$

This can be read as a recurrence for $\varphi(n)$:

$$\varphi(n) = n - \sum_{d \mid n,\ d < n} \varphi(d).$$

We get
$\varphi(1) = 1 - 0 = 1$,
$\varphi(2) = 2 - \varphi(1) = 1$,
$\varphi(3) = 3 - \varphi(1) = 2$,
$\varphi(4) = 4 - \varphi(2) - \varphi(1) = 2$,
$\varphi(6) = 6 - \varphi(3) - \varphi(2) - \varphi(1) = 2$, etc.

Obviously, the recurrence determines $\varphi(n)$ uniquely: If integers $a_n$ ($n \ge 1$) satisfy

$$\sum_{d \mid n} a_d = n,$$

then $a_n = \varphi(n)$. This still holds if the $n$'s are restricted to be divisors of some fixed integer $N$.
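The recurrence of 6.4 lends itself to a sieve-like computation. The sketch below (Python, with the illustrative name `phi_table`) starts from the value $n$ and subtracts $\varphi(d)$ for every proper divisor $d$ of $n$; it is meant as a small demonstration, not as an efficient method.

```python
def phi_table(limit: int) -> list[int]:
    """Return a list phi with phi[n] = φ(n) for 1 <= n <= limit (phi[0] is unused)."""
    phi = list(range(limit + 1))        # start with phi[n] = n
    for d in range(1, limit + 1):
        # When d is reached, φ(k) has been subtracted for every proper
        # divisor k of d, so phi[d] is final. Now subtract it from all
        # proper multiples of d.
        for multiple in range(2 * d, limit + 1, d):
            phi[multiple] -= phi[d]
    return phi
```

Calling `phi_table(20)[1:]` reproduces the row of values in the table of 6.2.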
6.5. A Question. The following picture represents in black the coprime pairs $(m, n)$ with $0 \le m, n \le 100$.

[Figure: grid of the pairs $(m, n)$ with $0 \le m, n \le 100$, coprime pairs shown in black.]

The black squares appear to be fairly evenly distributed, so the following question should make sense. What is the probability that two random (positive) integers are coprime?

Take a large positive integer $N$ and consider all pairs $(m, n)$ with $1 \le m, n \le N$. Call $f(N)$ the number of such pairs with $m \perp n$. Since $m' \perp n'$ for all $m$, $n$, where $m' = m/\gcd(m, n)$, $n' = n/\gcd(m, n)$, we can count the pairs according to their gcd. We get

$$N^2 = \#\{(m, n) : 1 \le m, n \le N\} = \sum_{g=1}^{N} \#\{(m, n) : 1 \le m, n \le N,\ \gcd(m, n) = g\} = \sum_{g=1}^{N} \#\{(m', n') : 1 \le m', n' \le N/g,\ m' \perp n'\} = \sum_{g=1}^{N} f(\lfloor N/g \rfloor).$$

The probability we are looking for is $P = \lim_{N \to \infty} P(N)$, where $P(N) = f(N)/N^2$. We get

$$1 = \sum_{g=1}^{N} \frac{\lfloor N/g \rfloor^2}{N^2}\, P(\lfloor N/g \rfloor).$$

Observe that the terms in the sum are between 0 and $1/g^2$, hence the sum is uniformly absolutely convergent. Passing to the limit as $N \to \infty$, we obtain

$$1 = P \sum_{g=1}^{\infty} \frac{1}{g^2}, \qquad \text{hence} \qquad P = \frac{1}{\sum_{g=1}^{\infty} \frac{1}{g^2}} = \frac{6}{\pi^2} \approx 0.608.$$
(Exercise: where is the gap in this argument?)

We can apply what we have learned to linear congruences. Given $a$, $b$, and $n$, what are the solutions $x$ of $ax \equiv b \bmod n$? In other words, for which $x \in \mathbb{Z}$ does there exist a $y \in \mathbb{Z}$ such that $ax + ny = b$?

6.6. Theorem. The congruence $ax \equiv b \bmod n$ has no solutions unless $\gcd(a, n) \mid b$. If there are solutions, they form a residue class modulo $n/\gcd(a, n)$.

Proof. By Thm. 3.3, the condition $\gcd(a, n) \mid b$ is necessary and sufficient for solutions to exist. Let $g = \gcd(a, n)$. If $g \mid b$, we can divide the equation $ax + ny = b$ by $g$ to get $a'x + n'y = b'$, where $a' = a/g$, $n' = n/g$, $b' = b/g$. Then $\gcd(a', n') = 1$, and we can solve the equation $\bar{a}' \bar{x} = \bar{b}'$ for $\bar{x}$ (in $\mathbb{Z}/n'\mathbb{Z}$): $\bar{x} = \bar{b}' (\bar{a}')^{-1}$. So the set of solutions $x$ is given by this residue class modulo $n'$.

7. The Chinese Remainder Theorem

Now let us consider simultaneous congruences:

$$x \equiv a \bmod m, \qquad x \equiv b \bmod n$$

Is such a system solvable? What does the solution set look like? If $d = \gcd(m, n)$, then we obviously need to have $a \equiv b \bmod d$ for a solution to exist. On the other hand, when $m \perp n$, solutions do always exist.

7.1. Chinese Remainder Theorem. If $m \perp n$, then the above system has solutions $x$; they form a residue class modulo $mn$.

Proof. Since $m \perp n$, we can find $u$ and $v$ with $mu + nv = 1$ by Thm. 3.3. Consider $x = anv + bmu$. We have

$$x = anv + bmu \equiv anv = a - amu \equiv a \bmod m$$

and similarly $x \equiv b \bmod n$. So solutions exist.

Now we show that $y$ is another solution if and only if $mn \mid y - x$. If $mn$ divides $y - x$, then $m$ and $n$ both divide $y - x$, hence $y \equiv x \equiv a \bmod m$ and $y \equiv x \equiv b \bmod n$. Now assume $y$ is another solution. Then $m$ and $n$ both divide $y - x$. So $n \mid y - x = mt$. By Prop. 3.5, $n \mid t$ and hence $mn \mid mt = y - x$.

There is a straightforward extension to more than two simultaneous congruences.

7.2. Chinese Remainder Theorem. If the numbers $m_1, m_2, \dots, m_k$ are coprime in pairs, then the system of congruences

$$x \equiv a_1 \bmod m_1, \quad x \equiv a_2 \bmod m_2, \quad \dots, \quad x \equiv a_k \bmod m_k$$

has solutions; they form a residue class modulo $m_1 m_2 \cdots m_k$.

Proof. Induction on $k$ using Thm. 7.1.
Note that $a \perp b$, $a \perp c$ implies $a \perp bc$.

Now we can answer the question from the beginning of this section.
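Before doing so, here is a hedged Python sketch of the coprime case: `crt_pair` follows the construction $x = anv + bmu$ from the proof of Thm. 7.1, and `crt` folds it over a list of moduli as in the induction for Thm. 7.2. The function names, and the helper `bezout`, are introduced here for illustration only.

```python
def bezout(m: int, n: int) -> tuple[int, int, int]:
    """Return (g, u, v) with m*u + n*v == g == gcd(m, n), for m, n >= 0."""
    u0, v0, u1, v1 = 1, 0, 0, 1
    while n != 0:
        q = m // n
        m, n = n, m - q * n
        u0, v0, u1, v1 = u1, v1, u0 - q * u1, v0 - q * v1
    return m, u0, v0


def crt_pair(a: int, m: int, b: int, n: int) -> int:
    """Solve x ≡ a (mod m), x ≡ b (mod n) for coprime m, n > 0; return x mod m*n."""
    g, u, v = bezout(m, n)
    if g != 1:
        raise ValueError("moduli must be coprime")
    # m*u + n*v = 1, so n*v ≡ 1 (mod m) and m*u ≡ 1 (mod n).
    return (a * n * v + b * m * u) % (m * n)


def crt(residues: list[int], moduli: list[int]) -> int:
    """Solve x ≡ residues[i] (mod moduli[i]) for pairwise coprime moduli."""
    x, modulus = 0, 1
    for a, m in zip(residues, moduli):
        x = crt_pair(x, modulus, a, m)   # combine the running solution with the next congruence
        modulus *= m
    return x
```

For example, `crt([2, 3, 2], [3, 5, 7])` returns `23`, the smallest nonnegative integer that is 2 mod 3, 3 mod 5 and 2 mod 7.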
7.3. Theorem. The system $x \equiv a \bmod m$, $x \equiv b \bmod n$ has solutions if and only if $a \equiv b \bmod \gcd(m, n)$. If solutions exist, they form a residue class modulo $\operatorname{lcm}(m, n)$.

Proof. We have already seen that the condition is necessary. Let $d = \gcd(m, n)$. Since $d \mid b - a$, we can find $u$ and $v$ such that $mu + nv = b - a$ by Thm. 3.3. Then $x = a + mu = b - nv$ is a solution. As in the proof of Thm. 7.1, we see that $y$ is another solution if and only if $m$ and $n$ both divide $y - x$. This means that the solutions form a residue class modulo $\operatorname{lcm}(m, n)$.

We can give the Chinese Remainder Theorem a more algebraic formulation.

7.4. Theorem. Assume that $m_1, m_2, \dots, m_k$ are coprime in pairs. Then the natural ring homomorphism

$$\mathbb{Z}/m_1 m_2 \cdots m_k \mathbb{Z} \longrightarrow \mathbb{Z}/m_1\mathbb{Z} \times \mathbb{Z}/m_2\mathbb{Z} \times \cdots \times \mathbb{Z}/m_k\mathbb{Z}$$

is an isomorphism. In particular, we have an isomorphism of multiplicative groups

$$(\mathbb{Z}/m_1 m_2 \cdots m_k \mathbb{Z})^\times \cong (\mathbb{Z}/m_1\mathbb{Z})^\times \times (\mathbb{Z}/m_2\mathbb{Z})^\times \times \cdots \times (\mathbb{Z}/m_k\mathbb{Z})^\times.$$

Proof. By the above, the homomorphism is bijective, hence an isomorphism.

7.5. A Formula for φ. The Chinese Remainder Theorem 7.4 implies that

$$\varphi(mn) = \varphi(m)\,\varphi(n) \quad \text{if } m \perp n.$$

A similar formula holds for products with more factors. Applying this to the prime factorization of $n$, we get

$$\varphi(n) = \prod_{p \mid n} p^{v_p(n) - 1}(p - 1) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right).$$

8. Fermat's and Euler's Theorems

A very nice property of the finite fields $\mathbb{F}_p$ and all their extension fields is that the map $x \mapsto x^p$ is not only compatible with multiplication: $(xy)^p = x^p y^p$, but also with addition!

8.1. Theorem ("Freshman's Dream"). Let $F$ be a field of prime characteristic $p$ (this means that $p \cdot 1_F = 0_F$; for us, the basic example is $F = \mathbb{F}_p$). Then for all $x, y \in F$, we have $(x + y)^p = x^p + y^p$.

Proof. By the Binomial Theorem,

$$(x + y)^p = x^p + \binom{p}{1} x^{p-1} y + \binom{p}{2} x^{p-2} y^2 + \dots + \binom{p}{k} x^{p-k} y^k + \dots + \binom{p}{p-1} x y^{p-1} + y^p.$$

Now the binomial coefficients $\binom{p}{k}$ for $1 \le k \le p - 1$ all are integers divisible by $p$ (why?), and since $F$ is of characteristic $p$, all the corresponding terms in the formula vanish, leaving only $x^p + y^p$.

We can use this to give one proof of the following fundamental fact.
8.2. Theorem (Fermat's Little Theorem). Let $p$ be a prime number. For all $\bar{a} \in \mathbb{F}_p$, we have $\bar{a}^p = \bar{a}$. (Equivalently, for all $a \in \mathbb{Z}$, $p$ divides $a^p - a$.)

Proof. First Proof: By induction on $a$. The case $a = 0$ is clear. Now, by Thm. 8.1, $p$ divides $(a + 1)^p - a^p - 1$ for all $a \in \mathbb{Z}$. But then $p$ also divides $((a + 1)^p - (a + 1)) - (a^p - a)$, hence:

$$p \mid a^p - a \iff p \mid (a + 1)^p - (a + 1)$$

This gives the inductive step upwards and downwards, hence the claim holds for all $a \in \mathbb{Z}$.

Second Proof: Easy proof using Algebra. The case $\bar{a} = \bar{0}$ is clear. Hence it suffices to show that $\bar{a}^{p-1} = \bar{1}$ for all $\bar{a} \ne \bar{0}$. This is a consequence of the fact that $\#\mathbb{F}_p^\times = p - 1$ and the general theorem that $g^{\#G} = 1$ for any $g$ in any finite group $G$.

Third Proof: By Combinatorics (for $a > 0$). Consider putting $p$ beads that can have colors from a set of size $a$ at equidistant places around a circle (to form "necklaces"). There will be $a^p$ necklaces in total, $a$ of which will consist of beads of only one color. The remaining $a^p - a$ come in bunches of $p$, obtained by rotation, so $p$ has to divide $a^p - a$.

The algebra proof can readily be generalized.

8.3. Theorem (Euler). Let $n$ be a positive integer. Then for all $a \in \mathbb{Z}$ with $a \perp n$, we have $a^{\varphi(n)} \equiv 1 \bmod n$.

Proof. Under the assumption, $\bar{a} \in (\mathbb{Z}/n\mathbb{Z})^\times$, and by definition, $\#(\mathbb{Z}/n\mathbb{Z})^\times = \varphi(n)$. By the general fact from algebra used in the second proof of Thm. 8.2, the claim follows.

8.4. Example. What is $7^{11^{13}} \bmod 15$? By Thm. 8.3, $7^8 \equiv 1 \bmod 15$ (as $\varphi(15) = 8$). On the other hand, $11 \equiv 3 \bmod 8$, and $3^4 \equiv 1 \bmod 8$ (in fact, already $3^2$ is $1 \bmod 8$). So $11^{13} = (11^4)^3 \cdot 11 \equiv 11 \equiv 3 \bmod 8$, and then $7^{11^{13}} \equiv 7^3 = 343 \equiv 13 \bmod 15$.

8.5. A Consequence of Fermat's Little Theorem. Consider the polynomial $X^p - X$ with coefficients in $\mathbb{F}_p$. By Fermat's Little Theorem 8.2, every element $\bar{a} \in \mathbb{F}_p$ is a root of this polynomial. Now $\mathbb{F}_p$ is a field, and so we can divide out the roots successively to find that

$$X^p - X = \prod_{\bar{a} \in \mathbb{F}_p} (X - \bar{a}).$$

This implies that for any polynomial $f(X)$ dividing $X^p - X$ (in the polynomial ring $\mathbb{F}_p[X]$), the number of its distinct roots in $\mathbb{F}_p$ equals the degree $\deg f(X)$.
More generally, if $f$ is any polynomial in $\mathbb{F}_p[X]$, we can compute the number of distinct roots of $f$ in $\mathbb{F}_p$ by the formula

$$\#\{\bar{a} \in \mathbb{F}_p : f(\bar{a}) = 0\} = \deg \gcd(f, X^p - X).$$
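Returning to Example 8.4, the exponent reduction via Euler's Theorem 8.3 can be checked numerically. The following Python sketch uses hypothetical helper names (`euler_phi`, `tower_mod`) and computes $\varphi(n)$ naively, which is only sensible for small $n$.

```python
from math import gcd


def euler_phi(n: int) -> int:
    """φ(n) by direct count of 0 <= a < n with gcd(a, n) = 1 (small n only)."""
    return sum(1 for a in range(n) if gcd(a, n) == 1)


def tower_mod(base: int, e1: int, e2: int, n: int) -> int:
    """Compute base**(e1**e2) mod n, assuming gcd(base, n) == 1.

    By Euler's theorem the exponent e1**e2 may be reduced mod φ(n).
    """
    if gcd(base, n) != 1:
        raise ValueError("base must be coprime to n for this reduction")
    reduced_exponent = pow(e1, e2, euler_phi(n))
    return pow(base, reduced_exponent, n)
```

Here `tower_mod(7, 11, 13, 15)` evaluates to `13`, in agreement with the hand computation in 8.4.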
9. Structure of $\mathbb{F}_p^\times$ and $(\mathbb{Z}/p^n\mathbb{Z})^\times$

Fermat's Theorem 8.2 tells us that the multiplicative order of any nonzero element $\bar{a}$ of $\mathbb{F}_p$ (this is the smallest positive integer $n$ such that $\bar{a}^n = \bar{1}$) divides $p - 1$. (The set of all $n$ such that $\bar{a}^n = \bar{1}$ consists exactly of the multiples of the order.) Now the question arises, are there elements of order $p - 1$? In other, more algebraic terms, is the group $\mathbb{F}_p^\times$ cyclic? The answer is yes.

9.1. Theorem. The multiplicative group $\mathbb{F}_p^\times$ is cyclic. (In other words, there exist elements $\bar{g} \in \mathbb{F}_p^\times$ such that all $\bar{a} \in \mathbb{F}_p^\times$ are powers of $\bar{g}$. The corresponding integers $g$ are called primitive roots mod $p$.)

Proof. Obviously, all elements of $\mathbb{F}_p^\times$ of order dividing $d$ (where $d$ is a divisor of $p - 1$) will be roots of $X^d - 1$. Since $d$ divides $p - 1$, $X^d - 1$ divides $X^{p-1} - 1$ and hence also $X^p - X$ (as polynomials). By 8.5, it follows that $X^d - 1$ has exactly $d$ roots in $\mathbb{F}_p$. Let $a_d$ be the number of elements of exact order $d$. Then we get

$$d = \sum_{k \mid d} a_k.$$

By the statement in 6.4, it follows that $a_d = \varphi(d)$; in particular, $a_{p-1} = \varphi(p - 1) \ge 1$. Hence primitive roots exist.

9.2. Examples. The proof shows that there are exactly $\varphi(p - 1)$ essentially distinct primitive roots mod $p$. For the first few primes, we get the following table.

Prime   Primitive Roots
2       1
3       2
5       2, 3
7       3, 5
11      2, 6, 7, 8
13      2, 6, 7, 11

There is a famous conjecture, named after Artin, that asserts that every integer $g \ne -1$ that is not a square is a primitive root mod infinitely many different primes. (Why are squares no good?) This has been proven assuming another famous conjecture, the Extended Riemann Hypothesis. The best unconditional result so far seems to be that the statement is true for all allowed integers, with at most three exceptions. On the other hand, the statement is not known to hold for any particular integer $g$!

9.3. Proposition. Let $G$ be a finite multiplicative abelian group of order $n$. An element $g \in G$ is a generator of $G$ (and so $G$ is cyclic) if and only if $g^{n/q} \ne 1_G$ for all prime divisors $q$ of $n$.

Proof.
If $g$ is a generator, then $n$ is the least positive integer $m$ such that $g^m = 1_G$, hence the condition is necessary. Now if $g$ is not a generator, then its order $m$ divides $n$, but is smaller than $n$, hence $m$ divides $n/q$ for some prime divisor $q$ of $n$. It follows that $g^{n/q} = 1_G$.

Let us use this result to show that $(\mathbb{Z}/p^n\mathbb{Z})^\times$ is cyclic if $p$ is an odd prime and $n \ge 1$.
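As an aside, the criterion of Prop. 9.3 gives a practical test for primitive roots mod a prime $p$, with $G = \mathbb{F}_p^\times$ of order $p - 1$. The Python sketch below assumes that $p$ is prime and small enough for $p - 1$ to be factored by trial division; the function names are chosen for this illustration.

```python
def prime_divisors(n: int) -> list[int]:
    """Distinct prime divisors of n >= 1, by trial division."""
    primes = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)
    return primes


def is_primitive_root(g: int, p: int) -> bool:
    """Test whether g generates F_p^× (p assumed prime), using Prop. 9.3."""
    if g % p == 0:
        return False
    order = p - 1
    # g is a generator iff g^(order/q) != 1 for every prime divisor q of the order.
    return all(pow(g, order // q, p) != 1 for q in prime_divisors(order))


def primitive_roots(p: int) -> list[int]:
    """All primitive roots g mod p with 1 <= g < p (p assumed prime)."""
    return [g for g in range(1, p) if is_primitive_root(g, p)]
```

For example, `primitive_roots(7)` returns `[3, 5]` and `primitive_roots(13)` returns `[2, 6, 7, 11]`.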
9.4. Theorem. Let $g$ be a primitive root modulo $p$, where $p$ is an odd prime. Then one of $g$ and $g + p$ is a primitive root modulo $p^n$ for all $n \ge 1$.

Proof. We know that $g^{p-1} = 1 + ap$ for some $a \in \mathbb{Z}$. If $p \nmid a$, let $h = g$; otherwise we set $h = g + p$; then we have $h^{p-1} = 1 + a'p$ with $p \nmid a'$:

$$(g + p)^{p-1} = g^{p-1} + (p - 1) g^{p-2} p + b p^2 \equiv 1 - kp \bmod p^2$$

(with some integer $b$) where $k \equiv g^{p-2} \bmod p$ is not divisible by $p$. Hence we have in both cases that $h^{p-1} \equiv 1 + ap \bmod p^2$ with $p \nmid a$.

Now I claim that for all $n \ge 0$, we have

$$h^{p^n(p-1)} \equiv 1 + a p^{n+1} \bmod p^{n+2}.$$

This follows by induction from the case $n = 0$:

$$h^{p^{n+1}(p-1)} = \bigl(h^{p^n(p-1)}\bigr)^p = \bigl(1 + (a + bp) p^{n+1}\bigr)^p = 1 + (a + bp) p^{n+2} + c\,p^{n+3} = 1 + a p^{n+2} + (b + c) p^{n+3}$$

Here $b$ and $c$ are suitable integers, and the penultimate equality uses that $p \ge 3$ (since then the last term in the binomial expansion, $(a + bp)^p p^{p(n+1)}$, is divisible by $p^{n+3}$, as are the intermediate ones, even when $n = 0$).

Now let $n \ge 2$, and let $q$ be a prime divisor of $\varphi(p^n) = p^{n-1}(p - 1)$. If $q$ divides $p - 1$, then $h^{(p-1)/q} \not\equiv 1 \bmod p$, hence also $h^{p^{n-1}(p-1)/q} \equiv h^{(p-1)/q} \not\equiv 1 \bmod p$, so $h^{p^{n-1}(p-1)/q} \not\equiv 1 \bmod p^n$. If $q = p$, then we have just seen that $h^{p^{n-2}(p-1)} \not\equiv 1 \bmod p^n$. So by Prop. 9.3, $h \in \{g, g + p\}$ is a primitive root mod $p^n$.

10. The RSA Cryptosystem

The basic idea of Public Key Cryptography is that each participant has two keys: a public key that is known to everybody and serves to encrypt messages, and a private key that is known only to her or him and is used to decrypt messages. For this idea to work, two conditions have to be satisfied:
(1) Both encryption and decryption must be reasonably fast (with keys of a size satisfying the next condition).
(2) It must be impossible to compute the private key from the public key in less than a very large amount of time (how large will depend on the desired level of security).

[Figure: Alice encrypts the plaintext with Bob's public key using the encryption algorithm; the ciphertext, visible to the eavesdropper Eve, is turned back into the plaintext by Bob using the decryption algorithm and his private key.]
The first published system (1977) satisfying these assumptions was designed by Rivest, Shamir and Adleman, and is called the RSA Cryptosystem (after their initials). However, already in 1973, Clifford Cocks at GCHQ (the British NSA equivalent) came up with the same system. It was not used by GCHQ, and Cocks' contribution was only publicly acknowledged in 1997. The idea was that finding the prime factors of a large number is very hard, whereas knowing them would allow you to do certain things quickly.

10.1. The set-up. To generate a public-private key pair, one takes two large prime numbers $p$ and $q$ (of 160 or more decimal digits, say). The public key then consists of $n = pq$ and another positive integer $e$ that has to be coprime with $\operatorname{lcm}(p - 1, q - 1)$ and can be taken to be fairly (but not too) small (in order to make encryption more efficient).

Encryption proceeds as follows. The message is encoded in one or several numbers $0 \le m < n$ (e.g., by taking bunches of bits of length less than the length of $n$ (measured in bits)). Then each number $m$ is encrypted as $c = m^e \bmod n$.

In order to decrypt such a $c$, we need to be able to undo the exponentiation by $e$. In order to do this, we use Fermat's Little Theorem 8.2 and the Chinese Remainder Theorem 7.1: Since $e \perp \operatorname{lcm}(p - 1, q - 1)$, we can compute $d$ such that $de \equiv 1 \bmod \operatorname{lcm}(p - 1, q - 1)$ (using the XGCD Algorithm). Then $c^d = m^{de} \equiv m \bmod p$ and $\bmod q$ by Fermat's Little Theorem and so $c^d \equiv m \bmod n$ by the Chinese Remainder Theorem.

So the public key is the pair $(n, e)$ and the private key the pair $(n, d)$. Encryption is $m \mapsto m^e \bmod n$, decryption is $c \mapsto c^d \bmod n$.

10.2. Why is it practical? Encryption and decryption are reasonably fast: they involve exponentiation mod $n$, which can be done in $O((\log n)^2 \log e)$ time (where $e$ is the exponent), or even in $O(\log n \log\log n \log e)$, using fast multiplication.
Also, it is possible to select suitable primes $p$ and $q$ in reasonable time: there are algorithms that prove that a given number $p$ is prime in polynomial time (polynomial in $\log p$), and gaps between primes are on average of size $\log p$, so one can expect to find a prime in polynomial time. The remaining steps in choosing the public-private key pair are relatively fast. For example, my laptop running the computer algebra system MAGMA takes about 4 seconds to find a prime of 100 digits, and about 12 seconds to find a prime of 120 digits.

10.3. Why is it considered secure? In order to get $m$ from $c$, one needs a number $t$ such that $c^t \equiv m \bmod n$. For general $m$, this means that $te \equiv 1 \bmod \operatorname{lcm}(p - 1, q - 1)$. Then $te - 1$ is a multiple of $\operatorname{lcm}(p - 1, q - 1)$, and we can use this in order to factor $n$ in the following way.

Note that if $n$ is an odd prime, then there are exactly two square roots of 1 in the ring $\mathbb{Z}/n\mathbb{Z}$ (which is a field of characteristic not 2 in this case), namely (the residue classes of) $1$ and $-1$. However, when $n = pq$ is the product of two distinct odd primes, then there are four such square roots; they are obtained from pairs of square roots of 1 mod $p$ and mod $q$ via the Chinese Remainder Theorem 7.1. If $x^2 \equiv 1 \bmod n$, but $x \not\equiv \pm 1 \bmod n$, then we can use $x$ to factor $n$: we have that $n$ divides $x^2 - 1 = (x - 1)(x + 1)$, but $n$ divides neither factor on the right, so $\gcd(x - 1, n)$ has to be a proper divisor of $n$.
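To make the set-up of 10.1 and the square-root observation above concrete, here is a toy Python sketch. The primes 61 and 53 and the exponent 17 in the usage note are illustrative values chosen here, far too small for real use, and the function names are hypothetical. The call `pow(e, -1, lam)` computes a modular inverse and assumes Python 3.8 or later.

```python
from math import gcd


def rsa_keypair(p: int, q: int, e: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return ((n, e), (n, d)) for distinct primes p, q and e coprime to lcm(p-1, q-1)."""
    n = p * q
    lam = (p - 1) // gcd(p - 1, q - 1) * (q - 1)   # lcm(p - 1, q - 1)
    if gcd(e, lam) != 1:
        raise ValueError("e must be coprime to lcm(p - 1, q - 1)")
    d = pow(e, -1, lam)                            # d*e ≡ 1 mod lcm(p - 1, q - 1)
    return (n, e), (n, d)


def rsa_apply(key: tuple[int, int], x: int) -> int:
    """Encrypt with the public key or decrypt with the private key: x -> x^k mod n."""
    n, k = key
    return pow(x, k, n)


def split_with_sqrt_of_one(x: int, n: int) -> int:
    """Given x with x^2 ≡ 1 but x ≢ ±1 (mod n), return a proper divisor of n."""
    if (x * x) % n != 1 or x % n in (1, n - 1):
        raise ValueError("x must be a nontrivial square root of 1 mod n")
    return gcd(x - 1, n)
```

With `public, private = rsa_keypair(61, 53, 17)`, one gets `n = 3233`, and `rsa_apply(private, rsa_apply(public, m))` returns `m` again for every `0 <= m < n`.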
Now suppose we know a multiple $f$ of $\operatorname{lcm}(p - 1, q - 1)$. Write $f = 2^r s$ with $s$ odd (note that $r \ge 1$ since $p - 1$ and $q - 1$ are even). Now pick a random $1 < w < n - 1$. If $\gcd(w, n) \ne 1$, then we have found a proper divisor of $n$. Otherwise, we successively compute

$$w_0 = w^s \bmod n, \quad w_1 = w_0^2 \bmod n, \quad w_2 = w_1^2 \bmod n, \quad \dots, \quad w_r = w_{r-1}^2 \bmod n$$

By Fermat's Little Theorem 8.2 and the Chinese Remainder Theorem 7.1, $w_r = 1$. Now if there is some $j$ such that $w_j \not\equiv \pm 1 \bmod n$, but $w_{j+1} \equiv 1 \bmod n$, we have found a square root of 1 mod $n$ that will split $n$ as explained above. One can check (see [Sti, Sect. 5.7.2]) that the probability of success is at least 1/2. Hence we need no more than two tries on average to factor $n$.

Conversely, if we have $p$ and $q$, we can easily compute a suitable $t$ (in fact, our $d$ in the private key is found that way).

The upshot is that in order to break the system, we have to factor $n$. Now factorization appears to be a hard problem: even though quite some effort has been invested into developing good factoring algorithms (in particular since this is relevant for cryptography! You can win prize money if you factor certain numbers), and we now have considerably better algorithms than thirty years ago (say), the performance of the best known algorithms is still much worse than polynomial time. The complexity is something like

$$\exp\Bigl(O\bigl(\sqrt[3]{\log n\,(\log\log n)^2}\bigr)\Bigr).$$

This is already quite a bit better than exponential (in $\log n$), but grows fast enough to make factorization of 300-digit numbers or so infeasible. For example, again MAGMA on my laptop needs 14 seconds to factor a product of two 20-digit primes and 3 minutes to factor a product of two 30-digit primes.

But note that there is an efficient algorithm (at least in theory) for factoring integers on a quantum computer. So if quantum computers become a reality, cryptosystems based on the difficulty of factorization like RSA will be dead.

11. Discrete Logarithms

In RSA, we use modular exponentiation with a fixed exponent, where the base is the message.
There are other cryptosystems, which in some sense work the other way round: they use exponentiation with a fixed base and varying exponent. This can be done in the multiplicative group of a finite field $\mathbb{F}_p$, or even in a more general setting.

11.1. The Discrete Logarithm Problem. Let $G$ be a finite cyclic group of order $n$, with generator $g$. The problem of finding $a \in \mathbb{Z}/n\mathbb{Z}$ from $g$ and $g^a$ is known as the Discrete Logarithm Problem: We want to find the logarithm of $g^a$ to the base $g$. If $x = g^a$, then sometimes the notation $a = \log_g x$ is used.

The difficulty of this problem depends on the representation of the group $G$.
(1) The simplest case is $G = \mathbb{Z}/n\mathbb{Z}$ (the additive group), $g = 1$. Then $\log_g x = x$, and the problem is trivially solved.
(2) It is more interesting to choose $G = \mathbb{F}_p^\times$, with $g$ a primitive root mod $p$ (or rather, its image in $\mathbb{F}_p^\times$). If the group order $\#G = p - 1$ has a large prime factor (e.g., $p - 1 = 2q$ or $4q$ where $q$ is prime), then here, the DLP (short for Discrete Logarithm Problem) is considered to be hard. The best known algorithms have complexity $O\bigl(\exp\bigl(c \sqrt[3]{\log q\,(\log\log q)^2}\bigr)\bigr)$; the situation is comparable to factorization.
(3) Other groups one can use are the groups of $\mathbb{F}_p$-rational points on elliptic curves. Except for certain special cases, no special-purpose algorithms are known, and the best one can do is to use generic algorithms, which have exponential running time $\approx \sqrt{\#G}$. This makes these groups attractive for cryptography, since one gets secure systems with considerably shorter key-lengths.
(4) When the group order $n = \#G$ factors, then the Chinese Remainder Theorem can be used to simplify the problem (if the factorization is known!). (This is the so-called Pohlig-Hellman attack.)

11.2. ElGamal Encryption. Here is a general setting for a cryptosystem based on DLP. It was originally suggested with $G = \mathbb{F}_p^\times$. In this case, it is advisable to take $p$ such that $p - 1$ is a small factor times a (large) prime $q$, in order to avoid the Pohlig-Hellman attack. Knowing the factorisation of $p - 1$ also helps in finding a primitive root $g$, compare Prop. 9.3 (try random $g$ until one is identified as a primitive root).

It works like this. Bob chooses a random number $a \in \mathbb{Z}/n\mathbb{Z}$, where $n = \#G$ is the group order, and publishes $h = g^a$ as his public key. (The group $G$ and generator $g$ are fixed and also publicly known.) The number $a$ itself is his private key. This means that in order to find the private key from the public key, one has to solve a DLP.

Now, when she wants to send Bob a message $m \in G$, Alice also chooses a random number $k \in \mathbb{Z}/n\mathbb{Z}$ and then sends the pair $(g^k, m h^k)$ to Bob: she masks the message by multiplying it by $h^k$ (remember that $h$ is Bob's public key), but leaves a clue for Bob by also sending $g^k$.

Now to decrypt this, Bob takes the pair $(x, y)$ he receives and computes $m = x^{-a} y$ using his private key $a$.

An eavesdropper intercepting the ciphertext would need to find $h^k = g^{ak}$ from $g^k$ and $h = g^a$ in order to get the plaintext.
This is called the Diffie-Hellman Problem, because it also comes up in the secret key exchange protocol developed by Diffie and Hellman (see below). It is believed that the Diffie-Hellman Problem is no easier than the DLP (it is certainly not harder), but this has not been proved.

11.3. Diffie-Hellman Key Exchange. This is a method for two people to agree on a secret key, communicating through an open channel. It also works for general cyclic groups $G$ with fixed generator $g$ (but was first suggested with $G = \mathbb{F}_p^\times$).

Our two protagonists, Alice and Bob, each select a random number, $a$ (for Alice) and $b$ (for Bob), in $\mathbb{Z}/n\mathbb{Z}$. Alice sends $A = g^a$ to Bob, and Bob sends $B = g^b$ to Alice. Then Alice computes $k = B^a$, and Bob computes $k = A^b$. Both get the same result $g^{ab}$, from which they then can derive a key for a classical symmetric cryptosystem.

In order for the eavesdropper Eve to get at the key, she must be able to find $g^{ab}$ from the knowledge of $g^a$ and $g^b$, which is exactly the Diffie-Hellman Problem.
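The two protocols of this section can be tried out with a few lines of Python. The following is only a toy sketch under stated assumptions: it works in $G = \mathbb{F}_p^\times$ with the small prime $p = 467 = 2 \cdot 233 + 1$ (chosen here purely for illustration and far too small to be secure), and the function names are hypothetical, not taken from any library.

```python
import secrets


def is_primitive_root(g, p, prime_factors):
    """g generates F_p^* iff g^((p-1)/r) != 1 mod p for every prime r | p-1.

    prime_factors must list the distinct prime divisors of p - 1.
    """
    return all(pow(g, (p - 1) // r, p) != 1 for r in prime_factors)


def elgamal_keygen(p, g):
    """Return (private key a, public key h = g^a mod p)."""
    a = secrets.randbelow(p - 1)          # a in Z/(p-1)Z
    return a, pow(g, a, p)


def elgamal_encrypt(p, g, h, m):
    """Encrypt m in F_p^* as the pair (g^k, m * h^k) with a fresh random k."""
    k = secrets.randbelow(p - 1)
    return pow(g, k, p), (m * pow(h, k, p)) % p


def elgamal_decrypt(p, a, x, y):
    """Recover m = x^(-a) * y; by Fermat, x^(-a) = x^(p-1-a) mod p."""
    return (pow(x, p - 1 - a, p) * y) % p


def diffie_hellman_shared(p, g, a, b):
    """Return the key as computed by Alice (B^a) and by Bob (A^b)."""
    A, B = pow(g, a, p), pow(g, b, p)     # the two values sent in the open
    return pow(B, a, p), pow(A, b, p)


if __name__ == "__main__":
    p, g = 467, 2                         # toy parameters (assumption)
    assert is_primitive_root(g, p, [2, 233])
    a, h = elgamal_keygen(p, g)
    x, y = elgamal_encrypt(p, g, h, 123)
    assert elgamal_decrypt(p, a, x, y) == 123
    k_alice, k_bob = diffie_hellman_shared(p, g, 57, 301)
    assert k_alice == k_bob
```

The primitive-root test follows the hint in 11.2: once the factorization of $p - 1$ is known, each candidate $g$ costs only one modular exponentiation per prime factor.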
12. Quadratic Residues

12.1. Definition. Let $p$ be an odd prime and $a \in \mathbb{Z}$ an integer not divisible by $p$. Then $a$ is called a quadratic residue mod $p$ if the congruence $x^2 \equiv a \bmod p$ has solutions. Otherwise, $a$ is a quadratic nonresidue mod $p$.

12.2. Examples.
\[
\begin{array}{r|l|l}
p & \text{qu. res.} & \text{qu. nonres.} \\ \hline
3 & 1 & 2 \\
5 & 1, 4 & 2, 3 \\
7 & 1, 2, 4 & 3, 5, 6 \\
11 & 1, 3, 4, 5, 9 & 2, 6, 7, 8, 10
\end{array}
\]

Let $g$ be a primitive root mod $p$; then each $a$ such that $p \nmid a$ is congruent to some $g^k$ mod $p$ (where $k$ is uniquely determined modulo $p - 1$; in particular, since $p$ is odd, the parity of $k$ is fixed). It is clear that $x^2 \equiv a \bmod p$ has a solution if and only if $k$ is even. Whence:

12.3. Theorem. Let $p$ be an odd prime and $a \in \mathbb{Z}$, $p \nmid a$, and let $g$ be a primitive root mod $p$. Then the following statements are equivalent.
(1) $a$ is a quadratic residue mod $p$.
(2) $\log_g a$ is even.
(3) $a^{(p-1)/2} \equiv 1 \bmod p$ (Euler's criterion).

Proof. We have already seen the equivalence of the first two statements. Now if $a$ is a quadratic residue, then $a \equiv x^2 \bmod p$ for some $x$, hence $a^{(p-1)/2} \equiv x^{p-1} \equiv 1 \bmod p$ by Fermat's Little Theorem 8.2. On the other hand, if $a^{(p-1)/2} \equiv 1 \bmod p$, then, writing $a \equiv g^k$, the logarithm $k = \log_g a$ cannot be odd, since then $a^{(p-1)/2} \equiv g^{k(p-1)/2} \not\equiv 1 \bmod p$, because $k(p-1)/2$ is not divisible by $p - 1$.

We see that the product of two quadratic residues is again a quadratic residue, whereas the product of a quadratic residue and a quadratic nonresidue is a quadratic nonresidue. Also, the product of two quadratic nonresidues is a quadratic residue. We also see that there are exactly $(p-1)/2$ quadratic residue classes and $(p-1)/2$ quadratic nonresidue classes mod $p$.

12.4. Definition. To simplify notation, one introduces the Legendre Symbol: For an odd prime $p$ and an integer $a$, set
\[
\left(\frac{a}{p}\right) =
\begin{cases}
1 & \text{if } p \nmid a \text{ and } a \text{ is a quadratic residue mod } p, \\
-1 & \text{if } p \nmid a \text{ and } a \text{ is a quadratic nonresidue mod } p, \\
0 & \text{if } p \mid a.
\end{cases}
\]
By the definitions, we have $\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right)$ if $a \equiv b \bmod p$.

We can combine this definition with Euler's criterion to obtain the following.
12.5. Proposition. Let $p$ be an odd prime, and let $a \in \mathbb{Z}$. Then
\[ \left(\frac{a}{p}\right) \equiv a^{(p-1)/2} \bmod p, \]
and this congruence determines the value of the Legendre symbol.

Proof. Since $p \ge 3$, the residue classes of $-1$, $0$ and $1$ mod $p$ are distinct, so the last statement follows. To prove the congruence, we consider the three possible cases in the definition of the Legendre symbol. If $a$ is a quadratic residue, then both sides are $\equiv 1$ by Thm. 12.3. If $a$ is a quadratic nonresidue, then the left hand side is $-1$, whereas the right hand side is $\not\equiv 1$, but its square is $\equiv 1$. Since $\mathbb{Z}/p\mathbb{Z}$ is a field, the right hand side must be $\equiv -1$. Finally, if $p \mid a$, then both sides are $\equiv 0$.

Note that this result tells us that we can determine efficiently whether a given integer $a$ is a quadratic residue mod $p$ or not: the modular exponentiation $a^{(p-1)/2} \bmod p$ can be computed in polynomial time. It is a different matter to actually find a square root of $a$ mod $p$ if $a$ is a quadratic residue mod $p$. There are probabilistic polynomial time algorithms for that, but (as far as I know) no deterministic polynomial time algorithm is known that works for general $p$.

12.6. Theorem. For an odd prime $p$ and integers $a$ and $b$,
\[ \left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right). \]

Proof. We have
\[ \left(\frac{a}{p}\right)\left(\frac{b}{p}\right) \equiv a^{(p-1)/2}\, b^{(p-1)/2} = (ab)^{(p-1)/2} \equiv \left(\frac{ab}{p}\right) \bmod p \]
by Prop. 12.5. By the same proposition, the value of the Legendre symbol is determined by the congruence. The claim follows.

12.7. Example. By the preceding result, we can compute $\left(\frac{a}{p}\right)$ in terms of the factors of $a$. So, if $a = \pm 2^e q_1^{f_1} q_2^{f_2} \cdots q_k^{f_k}$ with odd primes $q_j$, then
\[ \left(\frac{a}{p}\right) = \left(\frac{\pm 1}{p}\right)\left(\frac{2}{p}\right)^{e}\left(\frac{q_1}{p}\right)^{f_1}\left(\frac{q_2}{p}\right)^{f_2}\cdots\left(\frac{q_k}{p}\right)^{f_k}. \]

13. Quadratic Reciprocity

By the preceding example, in order to be able to compute $\left(\frac{a}{p}\right)$ in general, we need to know $\left(\frac{-1}{p}\right)$ and $\left(\frac{2}{p}\right)$, and we need a way to find $\left(\frac{q}{p}\right)$ if $q \neq p$ is another odd prime. The first is simple.
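13.1. Theorem. If $p$ is an odd prime, then
\[
\left(\frac{-1}{p}\right) = (-1)^{(p-1)/2} =
\begin{cases}
1 & \text{if } p \equiv 1 \bmod 4, \\
-1 & \text{if } p \equiv 3 \bmod 4.
\end{cases}
\]

Proof. By Prop. 12.5, $\left(\frac{-1}{p}\right) \equiv (-1)^{(p-1)/2} \bmod p$. Since both sides are $\pm 1$, equality follows.

So the quadratic character of $-1$ mod $p$ depends on $p$ mod 4. Is there a similar result concerning the quadratic character of 2 mod $p$? Here is a table.
\[
\begin{array}{c|cccccccccc}
p & 3 & 5 & 7 & 11 & 13 & 17 & 19 & 23 & 29 & 31 \\ \hline
\left(\frac{2}{p}\right) & - & - & + & - & - & + & - & + & - & +
\end{array}
\]
It appears that $\left(\frac{2}{p}\right) = 1$ if $p \equiv 1$ or $7 \bmod 8$ and $\left(\frac{2}{p}\right) = -1$ if $p \equiv 3$ or $5 \bmod 8$.

In order to prove a statement like this, we need some other way of expressing the sign of the Legendre symbol. This is provided by the following result due to Gauss.

13.2. Theorem. Let $p$ be an odd prime, and let $S \subset \mathbb{Z}$ be a set of cardinality $(p-1)/2$ such that $\{0\} \cup S \cup -S$ is a complete system of representatives for the residue classes mod $p$. (Examples are $S = \{1, 2, \dots, (p-1)/2\}$ and $S = \{2, 4, 6, \dots, p-1\}$.) Then for all $a$ such that $p \nmid a$, we have
\[ \left(\frac{a}{p}\right) = (-1)^{\#\{s \in S \,:\, \overline{as} \in -\bar{S}\}}. \]
Here $\bar{S} = \{\bar{s} : s \in S\}$ is the set of residue classes mod $p$ represented by elements of $S$.

Proof. For all $s \in S$, there are unique $t(s) \in S$ and $\varepsilon(s) \in \{\pm 1\}$ such that $as \equiv \varepsilon(s)\, t(s) \bmod p$. We claim that $s \mapsto t(s)$ is a permutation of $S$. But it is clear that this map is surjective: let $s \in S$ and let $b$ be an inverse of $a$ mod $p$; then there is $s' \in S$ such that $\pm s' \equiv bs \bmod p$, so $as' \equiv \pm s \bmod p$ and therefore $t(s') = s$. So the map must be a bijection. Now, mod $p$, we have
\[
\left(\frac{a}{p}\right) \prod_{s \in S} s \equiv a^{(p-1)/2} \prod_{s \in S} s = \prod_{s \in S} (as) \equiv \prod_{s \in S} \bigl(\varepsilon(s)\, t(s)\bigr) = \prod_{s \in S} \varepsilon(s) \prod_{s \in S} s = (-1)^{\#\{s \in S \,:\, \varepsilon(s) = -1\}} \prod_{s \in S} s.
\]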
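Since $p$ does not divide $\prod_{s \in S} s$, we get
\[ \left(\frac{a}{p}\right) \equiv (-1)^{\#\{s \in S \,:\, \varepsilon(s) = -1\}} = (-1)^{\#\{s \in S \,:\, \overline{as} \in -\bar{S}\}} \bmod p, \]
and therefore equality (both sides are $\pm 1$).

Taking $a = -1$ in the preceding result immediately gives Thm. 13.1 again.

We can now use this to prove our conjecture about the value of $\left(\frac{2}{p}\right)$.

13.3. Theorem. If $p$ is an odd prime, then
\[
\left(\frac{2}{p}\right) = (-1)^{(p^2-1)/8} =
\begin{cases}
1 & \text{if } p \equiv \pm 1 \bmod 8, \\
-1 & \text{if } p \equiv \pm 3 \bmod 8.
\end{cases}
\]

Proof. We use Thm. 13.2. For $S$, we take the standard set $S = \{1, 2, 3, \dots, \frac{p-1}{2}\}$. We have to count how many elements of $S$ land outside $S$ (mod $p$) when multiplied by 2.

If $p = 8k + 1$, these elements are $2k+1, 2k+2, \dots, 4k$; there are $2k$ of them, an even number, so $\left(\frac{2}{p}\right) = 1$.

If $p = 8k + 3$, these elements are $2k+1, 2k+2, \dots, 4k+1$; there are $2k+1$ of them, an odd number, so $\left(\frac{2}{p}\right) = -1$.

If $p = 8k + 5$, these elements are $2k+2, \dots, 4k+2$; there are $2k+1$ of them, an odd number, so $\left(\frac{2}{p}\right) = -1$.

If $p = 8k + 7$, these elements are $2k+2, 2k+3, \dots, 4k+3$; there are $2k+2$ of them, an even number, so $\left(\frac{2}{p}\right) = 1$.

13.4. Do we get similar results for $\left(\frac{q}{p}\right)$, where $q$ is a fixed odd prime and $p$ varies? Experimental evidence suggests that
\[
\left(\frac{3}{p}\right) =
\begin{cases} 1 & \text{if } p \equiv \pm 1 \bmod 12, \\ -1 & \text{if } p \equiv \pm 5 \bmod 12 \end{cases}
\;=\;
\begin{cases} \left(\frac{p}{3}\right) & \text{if } p \equiv 1 \bmod 4, \\ -\left(\frac{p}{3}\right) & \text{if } p \equiv -1 \bmod 4; \end{cases}
\]
\[
\left(\frac{5}{p}\right) =
\begin{cases} 1 & \text{if } p \equiv \pm 1 \bmod 5, \\ -1 & \text{if } p \equiv \pm 2 \bmod 5 \end{cases}
\;=\; \left(\frac{p}{5}\right).
\]
For larger $q$, we get similar patterns: if $q \equiv 1 \bmod 4$, the result depends on $p$ mod $q$; if $q \equiv -1 \bmod 4$, the result depends on $p$ mod $4q$. Both cases can be combined into the following statement.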
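13.5. Theorem (Law of Quadratic Reciprocity). Let $p$ and $q$ be distinct odd primes. Then we have
\[
\left(\frac{q}{p}\right) = \left(\frac{p^*}{q}\right) = (-1)^{\frac{p-1}{2}\cdot\frac{q-1}{2}} \left(\frac{p}{q}\right) =
\begin{cases}
\left(\frac{p}{q}\right) & \text{if } p \equiv 1 \bmod 4 \text{ or } q \equiv 1 \bmod 4, \\
-\left(\frac{p}{q}\right) & \text{if } p \equiv -1 \bmod 4 \text{ and } q \equiv -1 \bmod 4,
\end{cases}
\]
where $p^* = (-1)^{(p-1)/2}\, p$, so $p^* = p$ if $p \equiv 1 \bmod 4$ and $p^* = -p$ if $p \equiv -1 \bmod 4$.

Proof. We again make use of Gauss' Lemma, Thm. 13.2. We need two sets $S = \{1, 2, \dots, \frac{p-1}{2}\}$ and $T = \{1, 2, \dots, \frac{q-1}{2}\}$. Let $m = \#\{s \in S : \overline{qs} \in -\bar{S}\}$ (mod $p$) and $n = \#\{t \in T : \overline{pt} \in -\bar{T}\}$ (mod $q$). Then we have
\[ \left(\frac{q}{p}\right)\left(\frac{p}{q}\right) = (-1)^m (-1)^n = (-1)^{m+n}. \]
We therefore have to find the parity of the sum $m + n$.

Now, if $qs \equiv -s' \bmod p$ for some $s' \in S$, then there is some $t \in \mathbb{Z}$ such that $pt - qs = s' \in S$, i.e., $0 < pt - qs \le (p-1)/2$. This number $t$ now must be in $T$: $pt > qs > 0$ and
\[ pt \le \frac{p-1}{2} + qs \le (q+1)\,\frac{p-1}{2} < p\,\frac{q+1}{2}. \]
Since $q$ is odd, the last inequality implies $t \le (q-1)/2$. Hence we see that
\[ m = \#\{(s, t) \in S \times T : 0 < pt - qs \le \tfrac{p-1}{2}\}. \]
In exactly the same way, we have that
\[ n = \#\{(s, t) \in S \times T : -\tfrac{q-1}{2} \le pt - qs < 0\}. \]
Since there is no pair $(s, t) \in S \times T$ such that $pt = qs$, it follows that $m + n = \#X$, where
\[ X = \{(s, t) \in S \times T : -\tfrac{q-1}{2} \le pt - qs \le \tfrac{p-1}{2}\}. \]
This set $X$ is invariant under the rotation by $\pi$ (or $180^\circ$) about the center of the rectangle, which has the effect of changing $(s, t)$ into $(s', t') = (\frac{p+1}{2} - s, \frac{q+1}{2} - t)$:
\[ pt' - qs' = p\Bigl(\frac{q+1}{2} - t\Bigr) - q\Bigl(\frac{p+1}{2} - s\Bigr) = \frac{p - q}{2} - (pt - qs), \]
so $pt' - qs' \le (p-1)/2 \iff pt - qs \ge -(q-1)/2$ and $pt' - qs' \ge -(q-1)/2 \iff pt - qs \le (p-1)/2$.

Since the only possible fixed point of the rotation is the center $(\frac{p+1}{4}, \frac{q+1}{4})$ of the rectangle, and since this point belongs to $X$ if it has integral coordinates, we see that
\[ \#X \text{ is odd} \iff \frac{p+1}{4}, \frac{q+1}{4} \in \mathbb{Z} \iff p \equiv -1 \bmod 4 \text{ and } q \equiv -1 \bmod 4. \]
This concludes the proof.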
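The results of this section are easy to check numerically for small primes. The following Python sketch is an illustration only (the function names are hypothetical): it computes the Legendre symbol via Euler's criterion (Prop. 12.5), evaluates the sign in Gauss' Lemma (Thm. 13.2) for the standard set $S = \{1, \dots, (p-1)/2\}$, and tests the reciprocity law in the form $\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2}\frac{q-1}{2}} \left(\frac{p}{q}\right)$.

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's criterion; p must be an odd prime."""
    r = pow(a, (p - 1) // 2, p)      # r is 0, 1 or p - 1
    return -1 if r == p - 1 else r


def gauss_lemma(a, p):
    """(-1)^#{s in S : a*s mod p lies in -S} for S = {1, ..., (p-1)/2}.

    Assumes p is an odd prime not dividing a.
    """
    half = (p - 1) // 2
    # a*s mod p lies in -S exactly when its least positive residue exceeds half
    flips = sum(1 for s in range(1, half + 1) if (a * s) % p > half)
    return -1 if flips % 2 else 1


def reciprocity_holds(p, q):
    """Check (q/p) = (-1)^((p-1)/2 * (q-1)/2) * (p/q) for distinct odd primes."""
    sign = -1 if (p % 4 == 3 and q % 4 == 3) else 1
    return legendre(q, p) == sign * legendre(p, q)


if __name__ == "__main__":
    primes = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
    assert all(gauss_lemma(a, p) == legendre(a, p)
               for p in primes for a in range(1, p))
    assert all(reciprocity_holds(p, q)
               for p in primes for q in primes if p != q)
```

Such a check is of course no proof, but it is a quick way to catch sign errors when working with these symbols by hand.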
13.6. Example. With the help of the Law of Quadratic Reciprocity, we can evaluate Legendre symbols in the following way.
\[
\left(\frac{67}{109}\right) = \left(\frac{109}{67}\right) = \left(\frac{42}{67}\right) = \left(\frac{2 \cdot 3 \cdot 7}{67}\right) = \left(\frac{2}{67}\right)\left(\frac{3}{67}\right)\left(\frac{7}{67}\right)
\]
\[
= (-1)\left(-\left(\frac{67}{3}\right)\right)\left(-\left(\frac{67}{7}\right)\right) = -\left(\frac{1}{3}\right)\left(\frac{4}{7}\right) = -1
\]

The disadvantage with this approach is that we have to factor the numbers we get in intermediate stages, which can be very costly if the numbers are large. In order to overcome this difficulty, we generalize the Legendre Symbol and allow arbitrary odd integers instead of odd primes.

13.7. Definition. Let $a \in \mathbb{Z}$, and let $n$ be an odd integer, with factorization $n = \pm p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$. Then we define the Jacobi Symbol via
\[ \left(\frac{a}{n}\right) = \prod_{j=1}^{k} \left(\frac{a}{p_j}\right)^{e_j}. \]
It has the following simple properties extending those of the Legendre Symbol.
(1) $\left(\frac{a}{n}\right) = 0$ if and only if $\gcd(a, n) \neq 1$.
(2) If $a \equiv b \bmod n$, then $\left(\frac{a}{n}\right) = \left(\frac{b}{n}\right)$.
(3) $\left(\frac{ab}{n}\right) = \left(\frac{a}{n}\right)\left(\frac{b}{n}\right)$.
(4) $\left(\frac{a}{n}\right) = 1$ if $\gcd(a, n) = 1$ and $a$ is a square mod $n$.

Warning. Contrary to the case of the Legendre symbol (i.e., when $n$ is prime), the converse of the last statement does not hold in general. For example, $\left(\frac{2}{15}\right) = 1$, but 2 is not a square mod 15 (since 2 is not a square mod 3 and mod 5).

But, what is more important, the Jacobi symbol also obeys the Law of Quadratic Reciprocity.

13.8. Theorem. Let $m$ and $n$ be positive odd integers. We have
(1) $\left(\frac{-1}{n}\right) = (-1)^{\frac{n-1}{2}}$.
(2) $\left(\frac{2}{n}\right) = (-1)^{\frac{n^2-1}{8}}$.
(3) $\left(\frac{m}{n}\right) = (-1)^{\frac{m-1}{2}\cdot\frac{n-1}{2}} \left(\frac{n}{m}\right)$.

Proof. This is proved by invoking the definition of the Jacobi Symbol and by observing that $n \mapsto (-1)^{(n-1)/2}$ and $n \mapsto (-1)^{(n^2-1)/8}$ are multiplicative on odd integers $n$, and $(m, n) \mapsto (-1)^{(m-1)(n-1)/4}$ is bimultiplicative on pairs of odd integers. The results then reduce to Thms 13.1, 13.3 and 13.5, respectively.

13.9. Example. Let us compute $\left(\frac{67}{109}\right)$ again.
\[
\left(\frac{67}{109}\right) = \left(\frac{109}{67}\right) = \left(\frac{42}{67}\right) = \left(\frac{2}{67}\right)\left(\frac{21}{67}\right) = (-1)\left(\frac{67}{21}\right) = -\left(\frac{4}{21}\right) = -1
\]

In general, using Jacobi Symbols in the intermediate steps, we can compute Legendre Symbols (or, of course, Jacobi Symbols) much in the same way as we compute a GCD; we only have to take care to take out powers of 2 when they appear.
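The GCD-like procedure just described can be sketched in Python as follows. This is a minimal illustration (the function name `jacobi` is our own choice): it only uses the three rules of Thm. 13.8 together with reduction of the numerator modulo the denominator, and it never factors anything except for removing powers of 2.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for a positive odd integer n.

    Uses (2/n) = (-1)^((n^2-1)/8), the reciprocity law for odd arguments,
    and (a/n) = (a mod n / n), as in a Euclidean algorithm.
    """
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        # take out powers of 2: (2/n) = -1 exactly when n = 3 or 5 mod 8
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        # reciprocity: swapping costs a sign iff both are 3 mod 4
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    # now n = gcd of the original arguments; the symbol is 0 unless it is 1
    return result if n == 1 else 0


if __name__ == "__main__":
    assert jacobi(67, 109) == -1   # Example 13.9
    assert jacobi(2, 15) == 1      # the example in the Warning above
```

Like the Euclidean Algorithm, the loop roughly halves the arguments every two steps, so the number of iterations is logarithmic in $n$.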
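14. Another Proof of Quadratic Reciprocity

Gauss found seven or eight different proofs of the law of quadratic reciprocity in his life. Here is another one, which is more algebraic in flavor, and explains why $p^*$ occurs in a natural way.

14.1. Definition. Let $p$ be an odd prime, and set $\zeta = \exp(2\pi i/p) \in \mathbb{C}$. For $a \in \mathbb{Z}$ prime to $p$, we define the Gauss Sum
\[ g_a = \sum_{j=1}^{p-1} \left(\frac{j}{p}\right) \zeta^{aj} \in \mathbb{Z}[\zeta]. \]

14.2. Proposition. The Gauss Sum has the following properties.
(1) $g_a = \left(\frac{a}{p}\right) g_1$ for $p \nmid a$.
(2) $g_1^2 = p^*$.
(3) For an odd prime $q \neq p$, we have $g_1^q \equiv g_q \bmod q$. (The congruence takes place in the ring $\mathbb{Z}[\zeta]$.)

Proof.
(1) We have
\[ \left(\frac{a}{p}\right) g_a = \sum_{j=1}^{p-1} \left(\frac{aj}{p}\right) \zeta^{aj} = \sum_{k=1}^{p-1} \left(\frac{k}{p}\right) \zeta^{k} = g_1 \]
(note that $k = aj$ also runs through a complete set of representatives of the primitive residue classes mod $p$).

(2) We compute
\[
g_1^2 = \sum_{j,k=1}^{p-1} \left(\frac{jk}{p}\right) \zeta^{j+k} = \sum_{j,m=1}^{p-1} \left(\frac{j^2 m}{p}\right) \zeta^{j(1+m)} = \sum_{m=1}^{p-1} \left(\frac{m}{p}\right) \sum_{j=1}^{p-1} \zeta^{j(1+m)} = \left(\frac{-1}{p}\right) p - \sum_{m=1}^{p-1} \left(\frac{m}{p}\right) = \left(\frac{-1}{p}\right) p = p^*
\]
(where $k = jm$; note that $\left(\frac{j^2 m}{p}\right) = \left(\frac{m}{p}\right)$, that $\sum_{j=0}^{p-1} \zeta^{ja} = 0$ if $p \nmid a$ and $= p$ otherwise, and that $\sum_{m=1}^{p-1} \left(\frac{m}{p}\right) = 0$).

(3) Mod $q$, we have
\[ g_1^q = \Bigl( \sum_{j=1}^{p-1} \left(\frac{j}{p}\right) \zeta^{j} \Bigr)^{q} \equiv \sum_{j=1}^{p-1} \left(\frac{j}{p}\right)^{q} \zeta^{jq} = \sum_{j=1}^{p-1} \left(\frac{j}{p}\right) \zeta^{qj} = g_q. \]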
14.3. Remark. By property (2) above, we have that $g_1 = \pm\sqrt{p}$ if $p \equiv 1 \bmod 4$ and $g_1 = \pm i\sqrt{p}$ if $p \equiv 3 \bmod 4$. It is then a natural question to ask which of the two signs is the correct one. Gauss was working on this question for quite a long time, until he finally was able to prove that the sign is always $+$. (Also for this statement, he found several different proofs in his life.)

If $p \equiv 3 \bmod 4$, this has for example the following consequence. It is not hard to see that in this case
\[ S(p) = \sum_{a=1}^{p-1} \left(\frac{a}{p}\right) a = -hp \]
with some integer $h$. The fact that $g_1 = +i\sqrt{p}$ then implies that $h$ is positive. This can be interpreted as saying that quadratic nonresidues mod $p$ (between 1 and $p - 1$) are larger on average than quadratic residues. (If $p > 3$, then $h$ is the class number of positive definite binary quadratic forms of discriminant $-p$, which is known to be positive, since it counts something. On the other hand, what one really proves is that
\[ h = \frac{i g_1}{p\sqrt{p}} \sum_{a=1}^{p-1} \left(\frac{a}{p}\right) a, \]
which implies that $S(p) = -hp$, and so the sign of the Gauss sum determines the sign of $S(p)$.)

14.4. Proof of the Quadratic Reciprocity Law. On the one hand, $g_q = \left(\frac{q}{p}\right) g_1$. On the other hand, mod $q$, we have
\[ g_q \equiv g_1^q = g_1 (g_1^2)^{(q-1)/2} = g_1 (p^*)^{(q-1)/2} \equiv g_1 \left(\frac{p^*}{q}\right) \]
(by Euler's criterion). Taking both together, we see that
\[ \left(\frac{q}{p}\right) g_1 \equiv \left(\frac{p^*}{q}\right) g_1 \bmod q. \]
Now we multiply by $g_1$ and use that $g_1^2 = p^*$ is prime to $q$, so that we can cancel it from both sides. This gives
\[ \left(\frac{q}{p}\right) \equiv \left(\frac{p^*}{q}\right) \bmod q \]
and then equality.

15. Sums of Squares

In this section we address the question of which positive integers can be written as a sum of two, three, four, ... squares.

Let us first look at sums of two squares. Let
\[ S = \{x^2 + y^2 : x, y \in \mathbb{Z}\} = \{0, 1, 2, 4, 5, 8, 9, 10, 13, 16, 17, 18, 20, 25, 26, 29, 32, 34, 36, 37, 40, \dots\}. \]
It is clear that every square is in $S$. Also, it is easy to see that if $n \equiv 3 \bmod 4$, then $n \notin S$ (recall that a square is either $\equiv 0$ or $\equiv 1 \bmod 4$). Furthermore, we have the following.
15.1. Lemma. The set $S$ is closed under multiplication: if $m, n \in S$, then $mn \in S$.

Proof. Note that $(x^2 + y^2)(u^2 + v^2) = (xu \mp yv)^2 + (xv \pm yu)^2$.

What is behind this formula is the following: $|x + iy|^2 = x^2 + y^2$ and $|\alpha\beta|^2 = |\alpha|^2 |\beta|^2$.

Because of the multiplicative structure of $S$, it makes sense to look at the set of prime numbers that are in $S$. It is clear that $p \notin S$ if $p \equiv 3 \bmod 4$. Obviously, $2 \in S$, and from the list of the first few elements of $S$, it appears that $p \in S$ if $p \equiv 1 \bmod 4$.

15.2. Theorem. If $p \equiv 1 \bmod 4$ is a prime number, then $p \in S$.

Proof. We know that $-1$ is a square mod $p$, hence there are $a \in \mathbb{Z}$, $k \ge 1$ such that $a^2 + 1 = kp$. We can take $|a| \le (p-1)/2$, hence we can assume that $k < p/4$.

Now let $k \ge 1$ be minimal such that there are $x, y \in \mathbb{Z}$ with $x^2 + y^2 = kp$. We want to show that $k = 1$. So assume $k > 1$. Let $u \equiv x \bmod k$, $v \equiv y \bmod k$ with $|u|, |v| \le k/2$. Then $u^2 + v^2 = kk'$ with $1 \le k' \le k/2$. (Note that $k' \neq 0$ because $k \nmid p$, as $1 < k < p$.) Now
\[ xu + yv \equiv x^2 + y^2 \equiv 0 \bmod k, \qquad xv - yu \equiv xy - yx = 0 \bmod k \]
and
\[ (xu + yv)^2 + (xv - yu)^2 = (x^2 + y^2)(u^2 + v^2) = k^2 k' p. \]
If we let
\[ x' = \frac{xu + yv}{k}, \qquad y' = \frac{xv - yu}{k}, \]
then $(x')^2 + (y')^2 = k'p$ and $k' < k$, contradicting our choice of $k$. So we must have had $k = 1$.

The technique of proof used here is called descent and goes back to Fermat. The name comes from the fact that we descend from one value of $k$ to a smaller one.

By what we know so far, we have already proved one direction of the following result characterizing the elements of $S$.

15.3. Theorem. A positive integer $n$ can be represented as a sum of two squares if and only if for every prime $p \mid n$ with $p \equiv 3 \bmod 4$, the exponent with which $p$ appears in the factorization of $n$ is even.

Proof. If $n$ is of the specified form, then $n = p_1 \cdots p_r\, m^2$ with primes $p_j = 2$ or $p_j \equiv 1 \bmod 4$. Since by the above, all factors in this product are in $S$ and $S$ is closed under multiplication, $n \in S$.

Now assume that $n \in S$ and that we already know that all $m \in S$ with $m < n$ are of the specified form. Let $p \equiv 3 \bmod 4$ be a prime number dividing $n$. Write $n = x^2 + y^2$. We claim that $p$ divides both $x$ and $y$.
It then follows that $n = p^2 m$ with $m = (x/p)^2 + (y/p)^2 \in S$, so we are done by induction.
To show that $p$ divides $x$ and $y$, assume that (for example) $p$ does not divide $x$. Then there is $a \in \mathbb{Z}$ with $ax \equiv 1 \bmod p$, and we get
\[ 0 \equiv a^2 n = (ax)^2 + (ay)^2 \equiv 1 + (ay)^2 \bmod p, \]
contradicting the fact that $-1$ is not a square mod $p$. So $p$ must divide $x$ and $y$.

For three squares, the criterion is simpler (but we will not prove it).

15.4. Theorem. A positive integer can be represented as a sum of three squares if and only if it is not of the form $4^k m$ where $m \equiv 7 \bmod 8$.

It is easy to see that a number $n = 4^k m$ with $m \equiv 7 \bmod 8$ is not a sum of three squares. First note that if a sum $x^2 + y^2 + z^2$ is divisible by 4, then $x, y, z$ have to be even. This implies that $4n$ is a sum of three squares if and only if $n$ is. So we can assume that $k = 0$. Finally, mod 8, a square is $\equiv 0$, 1 or 4, so a sum of three squares can never be $\equiv 7 \bmod 8$.

The hard part of the proof is to show that every $n$ not of the given form actually is a sum of three squares. Part of the difficulty comes from the fact that the set of sums of three squares is not closed under multiplication: 3 and 5 are sums of three squares, but 15 is not.

15.5. Four Squares. It might therefore seem rather hopeless to look for an identity for four squares analogous to
\[ (xu \mp yv)^2 + (xv \pm yu)^2 = (x^2 + y^2)(u^2 + v^2), \]
but in fact there is a good reason why one exists. The quaternion algebra, a 4-dimensional $\mathbb{R}$-algebra, is a beautiful analog of the 2-dimensional algebra $\mathbb{C}$; it was discovered by Hamilton. It is defined to be
\[ \mathbb{H} := \{a + bi + cj + dk : a, b, c, d \in \mathbb{R}\}, \]
with the noncommutative multiplication rules
\[ i^2 = j^2 = k^2 = -1, \quad ij = k, \quad ji = -k, \quad jk = i, \quad kj = -i, \quad ki = j, \quad ik = -j. \]
One can then define a norm map
\[ N(a + bi + cj + dk) := (a + bi + cj + dk)(a - bi - cj - dk) = a^2 + b^2 + c^2 + d^2, \]
and it is easy to check that the norm is multiplicative. When one writes out what this means, one discovers the identity
\[
(a^2 + b^2 + c^2 + d^2)(A^2 + B^2 + C^2 + D^2) = (aA - bB - cC - dD)^2 + (aB + bA + cD - dC)^2 + (aC + cA - bD + dB)^2 + (aD + dA + bC - cB)^2.
\]
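The four-square identity is easy to get wrong by a sign, so a quick numerical check is worthwhile. The following Python sketch is an illustration only (the function names are hypothetical): it computes the four components of the quaternion product $(a + bi + cj + dk)(A + Bi + Cj + Dk)$ and compares norms.

```python
def four_square_product(u, v):
    """Components of the quaternion product u * v.

    u = (a, b, c, d) stands for a + b*i + c*j + d*k, and similarly for v.
    """
    a, b, c, d = u
    A, B, C, D = v
    return (
        a * A - b * B - c * C - d * D,   # real part
        a * B + b * A + c * D - d * C,   # coefficient of i
        a * C + c * A - b * D + d * B,   # coefficient of j
        a * D + d * A + b * C - c * B,   # coefficient of k
    )


def norm(u):
    """N(u) = a^2 + b^2 + c^2 + d^2."""
    return sum(t * t for t in u)


if __name__ == "__main__":
    # 3 = 1 + 1 + 1 + 0 and 5 = 4 + 1 + 0 + 0 give 15 as a sum of four squares
    w = four_square_product((1, 1, 1, 0), (2, 1, 0, 0))
    assert norm(w) == 3 * 5
```

The example in the `__main__` block also shows why four squares behave better than three: the product of the representations of 3 and 5 has a nonzero fourth component, which is exactly what is needed since 15 is not a sum of three squares.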
In light of this, the set of integers representable as sums of four squares must be closed under multiplication. In fact:
15.6. Theorem (Lagrange). All positive integers are sums of four squares.

Proof. By the identity stated above, it suffices to show that all primes are sums of four squares. We do this by descent, imitating the proof of the Two Squares Theorem.

First note that, applying Lemma 15.7 below, we can find integers $a, b, c, d$ and $k$ such that $a^2 + b^2 + c^2 + d^2 = kp$ and $1 \le k < p$ (taking $d = 0$, say).

If $k = 1$ we are done. Otherwise, let $A, B, C, D$ be the integers determined by
\[ A \equiv a, \quad B \equiv -b, \quad C \equiv -c, \quad D \equiv -d \pmod{k}, \qquad |A|, |B|, |C|, |D| \le k/2. \]
Thus $A^2 + B^2 + C^2 + D^2 \le k^2$. If equality holds, $A$, $B$, $C$ and $D$ must each equal $k/2$ or $-k/2$. In that case $a, b, c$ and $d$ are each congruent to $k/2$ modulo $k$, which means $k^2$ divides $a^2 + b^2 + c^2 + d^2 = kp$. But that is not possible because $1 < k < p$ and $p$ is prime. Hence $A^2 + B^2 + C^2 + D^2 = kk'$ with $1 \le k' < k$ (here $k' \neq 0$, since otherwise $k$ would divide $a, b, c, d$ and again $k^2 \mid kp$).

Applying the magic identity, we have
\[
k^2 k' p = (a^2 + b^2 + c^2 + d^2)(A^2 + B^2 + C^2 + D^2) = (aA - bB - cC - dD)^2 + (aB + bA + cD - dC)^2 + (aC + cA - bD + dB)^2 + (aD + dA + bC - cB)^2.
\]
Consider the right hand side: the latter three terms, and hence all four terms, are divisible by $k^2$. Dividing both sides by $k^2$, we obtain a representation of $k'p$ as a sum of four squares, which completes one step of the descent.

As already noted, at each step we have $1 \le k' < k$. So, after a finite number of steps of the descent we must obtain $k = 1$. This completes the proof.

15.7. Lemma. Let $p$ be an odd prime. Then there are integers $u, v$ such that $u^2 + v^2 + 1 \equiv 0 \bmod p$.

Proof. The statement is equivalent to the following: there are $\bar{u}, \bar{v} \in \mathbb{F}_p$ such that $\bar{u}^2 = -\bar{v}^2 - 1$. Now let $X = \{\bar{u}^2 : \bar{u} \in \mathbb{F}_p\}$ and $Y = \{-\bar{v}^2 - 1 : \bar{v} \in \mathbb{F}_p\}$; then $\#X = \#Y = (p+1)/2$. Since $\#X + \#Y = p + 1 > p = \#\mathbb{F}_p$, $X$ and $Y$ cannot be disjoint, which proves the claim.

16. Geometry of Numbers

In this section, we will learn about a nice method to solve number theoretical problems using geometry. The main result was discovered by Hermann Minkowski.
The basic idea is that if we have a sufficiently nice and sufficiently large set in $\mathbb{R}^n$, then it will contain a non-zero point with integral coordinates. For the applications, it is convenient to use more general lattices than the integral points, so we have to introduce this notion first.
16.1. Definition. A lattice $\Lambda \subset \mathbb{R}^n$ is the set of all integral linear combinations of a set of basis vectors $v_1, \dots, v_n$ of $\mathbb{R}^n$. In particular, $\Lambda$ is a subgroup of the additive group $\mathbb{R}^n$. The set
\[ F = \Bigl\{ \sum_{j=1}^{n} t_j v_j : 0 \le t_j < 1 \text{ for all } j \Bigr\} \]
is called a fundamental parallelotope for $\Lambda$. $\Delta(\Lambda) = \operatorname{vol}(F) = |\det(v_1, \dots, v_n)|$ is the covolume of $\Lambda$.

The most important property of $F$ is that every vector $v \in \mathbb{R}^n$ can be written uniquely as $v = \lambda + w$ with $\lambda \in \Lambda$ and $w \in F$. In other words, $\mathbb{R}^n$ is the disjoint union of all translates $F + \lambda$ of $F$ by elements of $\Lambda$.

16.2. Example. The standard example of a lattice is $\Lambda = \mathbb{Z}^n \subset \mathbb{R}^n$, which is generated by the standard basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ and has covolume $\Delta(\mathbb{Z}^n) = 1$. In some sense, this is the only example: if $\Lambda = \mathbb{Z} v_1 + \dots + \mathbb{Z} v_n \subset \mathbb{R}^n$ is any lattice, then $\Lambda$ is the image of $\mathbb{Z}^n$ under the invertible linear map $T : \mathbb{R}^n \to \mathbb{R}^n$ that sends $e_j$ to $v_j$. The covolume $\Delta(\Lambda)$ is then $|\det(T)|$.

16.3. Proposition. Let $\Lambda \subset \mathbb{R}^n$ be a lattice, and let $\Lambda' \subset \Lambda$ be a subgroup of finite index $m$. Then $\Lambda'$ is also a lattice, and $\Delta(\Lambda') = m\, \Delta(\Lambda)$.

Proof. As an abstract abelian group, $\Lambda \cong \mathbb{Z}^n$. By the structure theorem for finitely generated abelian groups, there is an isomorphism $\varphi : \Lambda \to \mathbb{Z}^n$ that sends $\Lambda'$ to $a_1 \mathbb{Z} \times \dots \times a_n \mathbb{Z}$ with nonnegative integers $a_1, \dots, a_n$. Since the index $m$ of $\Lambda'$ in $\Lambda$ is finite, we have $a_1 \cdots a_n = m$. Let $v_1, \dots, v_n$ be the generators of $\Lambda$ that are sent to the standard basis of $\mathbb{Z}^n$ under $\varphi$. Then $\Lambda' = \mathbb{Z} a_1 v_1 + \dots + \mathbb{Z} a_n v_n \subset \mathbb{R}^n$, so $\Lambda'$ is a lattice. Furthermore,
\[ \Delta(\Lambda') = |\det(a_1 v_1, \dots, a_n v_n)| = a_1 \cdots a_n\, |\det(v_1, \dots, v_n)| = m\, \Delta(\Lambda). \]

16.4. Corollary. Let $\varphi : \mathbb{Z}^n \to M$ be a group homomorphism onto a finite group $M$. Then the kernel of $\varphi$ is a lattice $\Lambda$, and $\Delta(\Lambda) = \#M$.

Proof. By the standard isomorphism theorem, we have $\mathbb{Z}^n / \ker\varphi \cong M$, hence $\Lambda = \ker\varphi$ is a subgroup of the lattice $\mathbb{Z}^n$ of finite index $\#M$. The claim follows from Prop. 16.3 and $\Delta(\mathbb{Z}^n) = 1$.

Now we are ready to state and prove Minkowski's Theorem.

16.5. Theorem (Minkowski).
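Let $\Lambda \subset \mathbb{R}^n$ be a lattice, and let $S \subset \mathbb{R}^n$ be a symmetric (i.e., $S = -S$) and convex subset such that $\operatorname{vol}(S) > 2^n \Delta(\Lambda)$. Then $S$ contains a nonzero lattice point from $\Lambda$.

Proof. In a first step, we show that $X = \frac{1}{2} S$ has to intersect one of its translates under nonzero elements of $\Lambda$. Let $F$ be a fundamental parallelotope for $\Lambda$, and for $\lambda \in \Lambda$, set $X_\lambda = F \cap (X + \lambda)$. By the fundamental property of $F$, we get that
\[ X = \bigsqcup_{\lambda \in \Lambda} (X_\lambda - \lambda) \]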
(i.e., $X$ is a disjoint union of translates of the $X_\lambda$). Hence
\[ \sum_{\lambda} \operatorname{vol}(X_\lambda) = \operatorname{vol}(X) = 2^{-n} \operatorname{vol}(S) > \Delta(\Lambda) = \operatorname{vol}(F), \]
and so the sets $X_\lambda$ cannot all be disjoint (because they then would not fit into $F$). So there are $\lambda \neq \mu$ such that $X_\lambda \cap X_\mu \neq \emptyset$. Shifting by $-\mu$, we see that $X \cap (X + \lambda - \mu) \neq \emptyset$. Let $x$ be a point in the intersection. Then $2x \in S$ and $2x - 2(\lambda - \mu) \in S$. Since $S$ is symmetric, we also have $2(\lambda - \mu) - 2x \in S$. Then, since $S$ is also convex, the midpoint of the line segment joining $2x$ and $2(\lambda - \mu) - 2x$ must also be in $S$. But this midpoint is $\lambda - \mu \in \Lambda \setminus \{0\}$, and the statement is proved.

Let us use this result to re-prove the essential results on sums of two and four squares.

16.6. Theorem. Let $p \equiv 1 \bmod 4$ be a prime. Then $p$ is a sum of two squares.

Proof. We need a lattice $\Lambda$ and a set $S$. Let $u$ be a square root of $-1$ mod $p$, and set
\[ \Lambda = \{(x, y) \in \mathbb{Z}^2 : x \equiv uy \bmod p\}. \]
Then $\Lambda$ is a lattice in $\mathbb{R}^2$, and $\Delta(\Lambda) = p$ (we can think of $\Lambda$ as the kernel of the composition
\[ \mathbb{Z}^2 \to \mathbb{F}_p^2 \to \mathbb{F}_p^2 / \langle (\bar{u}, 1) \rangle, \]
which is a surjective group homomorphism onto a group of order $p$). For the set $S$, we take the open disk
\[ S = \{(\xi, \eta) \in \mathbb{R}^2 : \xi^2 + \eta^2 < 2p\}. \]
Then $\operatorname{vol}(S) = \pi \cdot 2p = 2\pi p > 4p = 2^2 \Delta(\Lambda)$, and so by Thm. 16.5, there is some nonzero $(x, y) \in \Lambda \cap S$. Now for each $(x, y) \in \Lambda$, we have that
\[ x^2 + y^2 \equiv (uy)^2 + y^2 = y^2 (1 + u^2) \equiv 0 \bmod p. \]
So $p$ divides $x^2 + y^2$; on the other hand, $0 < x^2 + y^2 < 2p$ by the definition of $S$. So we must have $x^2 + y^2 = p$.

Now let us consider the case of four squares. From Lemma 15.7, we know that for every odd prime $p$, there are integers $u$ and $v$ such that $p$ divides $1 + u^2 + v^2$.

16.7. Theorem. Let $p$ be an odd prime. Then $p$ is a sum of four squares.

Proof. We need again a lattice $\Lambda$ and a set $S$. For $S$, we should obviously take a suitable open ball:
\[ S = \{(\xi_1, \xi_2, \xi_3, \xi_4) \in \mathbb{R}^4 : \xi_1^2 + \xi_2^2 + \xi_3^2 + \xi_4^2 < 2p\}. \]
What is the volume of $S$? Here it is useful to know the general formula for the volume of the $n$-dimensional unit ball; it is
\[ \operatorname{vol}(B_n) = \frac{\pi^{n/2}}{(n/2)!} \]
(where for odd $n$, the factorial satisfies the usual recurrence $(x + 1)! = x!\,(x + 1)$, and one has $(-1/2)!$
$= \sqrt{\pi}$). For $n = 4$, we get $\pi^2/2$ for the volume of the unit ball, hence $\operatorname{vol}(S) = \pi^2 (2p)^2 / 2 = 2\pi^2 p^2$.
From this, we can already see that the lattice should have covolume $p^2$. This means that we need a 2-dimensional subspace of $\mathbb{F}_p^4$ on which $x_1^2 + x_2^2 + x_3^2 + x_4^2$ vanishes. One such subspace is given by
\[ V = \langle (1, \bar{u}, \bar{v}, 0),\ (0, -\bar{v}, \bar{u}, 1) \rangle : \]
if $(\bar{a}, \bar{a}\bar{u} - \bar{b}\bar{v}, \bar{a}\bar{v} + \bar{b}\bar{u}, \bar{b})$ is a general element of $V$, then
\[ \bar{a}^2 + (\bar{a}\bar{u} - \bar{b}\bar{v})^2 + (\bar{a}\bar{v} + \bar{b}\bar{u})^2 + \bar{b}^2 = \bar{a}^2 (1 + \bar{u}^2 + \bar{v}^2) + \bar{b}^2 (\bar{v}^2 + \bar{u}^2 + 1) + 2\bar{a}\bar{b}(-\bar{u}\bar{v} + \bar{v}\bar{u}) = 0. \]
If $\Lambda$ is the kernel of $\mathbb{Z}^4 \to \mathbb{F}_p^4 \to \mathbb{F}_p^4 / V$, then for $(x_1, x_2, x_3, x_4) \in \Lambda$, we have $(\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4) \in V$, hence $p$ divides $x_1^2 + x_2^2 + x_3^2 + x_4^2$. For the covolume, we have $\Delta(\Lambda) = \#(\mathbb{F}_p^4 / V) = p^2$. Since $\operatorname{vol}(S) = 2\pi^2 p^2 > 16 p^2 = 2^4 \Delta(\Lambda)$, the proof can be concluded in the same way as before.

17. Ternary Quadratic Forms

In the preceding sections, we have seen some quadratic forms.

17.1. Definition. An $n$-ary quadratic form is a homogeneous polynomial of degree 2 in $n$ variables (here, the coefficients will always be integers, but one can consider quadratic forms over any ring). For $n = 2$, we have binary quadratic forms; they have the general form
\[ Q(x, y) = a x^2 + b xy + c y^2. \]
For $n = 3$, we have ternary quadratic forms
\[ Q(x, y, z) = a x^2 + b y^2 + c z^2 + d xy + e yz + f zx, \]
and so on.

So far, we have been asking about representations of numbers by a quadratic form $Q$, i.e., whether it is possible to find a given integer as the value of $Q$ at some tuple of integers.

Another question one can ask is whether a given quadratic form has a nontrivial zero, i.e., whether there exist (in the case of ternary forms, say) integers $x, y, z$, not all zero, such that $Q(x, y, z) = 0$. This is what we will look into now.

For binary forms, this question is not very interesting; it boils down to deciding whether or not the form is the product of two linear forms with integral coefficients, which is the case if and only if the discriminant $b^2 - 4ac$ of the form is a square. For ternary forms, however, this is an interesting problem.
Note that we can always assume that solutions are primitive (i.e., $\gcd(x, y, z) = 1$): common divisors can always be divided out.

17.2. Definition. Let $Q$ be a quadratic form in $n$ variables; then it can be given by a symmetric matrix $M_Q$ whose off-diagonal entries can be half-integers, such that $Q(x) = x^\top M_Q\, x$. Then $\det Q = \det(M_Q)$ is called the determinant of $Q$, and
\[ \operatorname{disc} Q = (-1)^{n-1}\, 2^{2\lfloor n/2 \rfloor} \det Q \]
is called the discriminant of $Q$; the discriminant is always an integer. (The reason for the power of 2 appearing in the definition of $\operatorname{disc}(Q)$ is that the discriminant then also makes sense in characteristic 2.) For example,
\[ \operatorname{disc}(a x^2 + b xy + c y^2) = b^2 - 4ac \]
and
\[ \operatorname{disc}(a x^2 + b y^2 + c z^2 + d xy + e yz + f zx) = 4abc + def - ae^2 - bf^2 - cd^2. \]
A quadratic form $Q$ is non-degenerate if $\operatorname{disc} Q \neq 0$; otherwise it is called degenerate or singular. In this latter case, there is a linear transformation of the variables that results in a quadratic form involving fewer variables (choose an element in the kernel of $M_Q$ as one of the new basis vectors ...).

17.3. Some Geometry. Ternary quadratic forms correspond to conic sections in the plane. If we are looking for solutions to $Q(x, y, z) = 0$ in real numbers such that $z \neq 0$ (say), we can divide by $z^2$ and set $\xi = x/z$, $\eta = y/z$ to obtain $Q(\xi, \eta, 1) = 0$, the equation of a conic section in $\mathbb{R}^2$. (If we want to include the solutions with $z = 0$, we have to consider the conic section in the projective plane.)

In this setting, nontrivial primitive integral solutions to $Q(x, y, z) = 0$ correspond to rational points (points with rational coordinates) on the conic. This correspondence is two-to-one: to the point $(x/z, y/z)$ (in lowest terms) there correspond the two solutions $(x, y, z)$ and $(-x, -y, -z)$.

For example, if we take $Q(x, y, z) = x^2 + y^2 - z^2$, then it corresponds to the unit circle in the $xy$ plane, and the solutions (in this case, Pythagorean Triples) correspond to the rational points on the unit circle (there are no solutions with $z = 0$).

In fact, it is easy to describe them all: fix one point, say $(-1, 0)$, and draw a line with rational slope $t = u/v$ through it. It will intersect the circle in another point, which will again have rational coordinates. Conversely, if we take some rational point on the circle, the line connecting it to $(-1, 0)$ will have rational slope. We see that the rational points are parametrized by the rational slopes (including $\infty = 1/0$ for the vertical tangent at $(-1, 0)$; this line gives $(-1, 0)$ itself).

The same kind of argument can be used quite generally.

17.4. Theorem. Let $Q(x, y, z)$ be a non-degenerate ternary quadratic form that has a primitive integral solution $(x_0, y_0, z_0)$.
Then there are binary quadratic forms $R_x$, $R_y$ and $R_z$ such that, up to scaling, all integral solutions of $Q(x, y, z) = 0$ are given by
\[ (R_x(u, v), R_y(u, v), R_z(u, v)) \]
with integers $u, v$.

Proof. We first assume that $Q = y^2 - xz$. Then we can clearly take
\[ R_x(u, v) = u^2, \quad R_y(u, v) = uv, \quad R_z(u, v) = v^2. \]
(Dividing by $z^2$, we have $(y/z)^2 = x/z$; put $y/z = u/v$ and clear denominators.)

Now assume that $(x_0, y_0, z_0) = (1, 0, 0)$. Then
\[ Q(x, y, z) = b y^2 + c z^2 + d xy + e yz + f zx. \]
If we set
\[ x = bX + eY + cZ, \quad y = -dX - fY, \quad z = -dY - fZ, \]
then $Q(x, y, z) = -\operatorname{disc}(Q)\,(Y^2 - XZ)$, as is easily checked. By the first case (note that $\operatorname{disc}(Q) \neq 0$), this means that
\[ R_x(u, v) = b u^2 + e uv + c v^2, \quad R_y(u, v) = -d u^2 - f uv, \quad R_z(u, v) = -d uv - f v^2 \]
do what we want.
Finally, we consider the general case. By Prop. 18.6 in the Introductory Algebra notes (Fall 2005), there is a matrix $T \in \mathrm{GL}_3(\mathbb{Z})$ such that $(x_0\ y_0\ z_0) = (1\ 0\ 0)\,T$. Write
\[ (x\ y\ z) = (x'\ y'\ z')\,T \]
and set $Q'(x', y', z') = Q(x, y, z)$; then $Q'(1, 0, 0) = Q(x_0, y_0, z_0) = 0$. By the previous case, we have binary quadratic forms $R'_x, R'_y, R'_z$ that parametrize the solutions of $Q'$. Then
\[ (R_x\ R_y\ R_z) = (R'_x\ R'_y\ R'_z)\,T \]
are the binary quadratic forms we want for $Q$.

17.5. Example. For $Q(x, y, z) = x^2 + y^2 - z^2$ and the initial solution $(-1, 0, 1)$, we can choose
\[ T = \begin{pmatrix} -1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
and obtain $x = -x'$, $y = y'$, $z = x' + z'$, so
\[ Q'(x', y', z') = Q(-x', y', x' + z') = (y')^2 - (z')^2 - 2x'z'. \]
The quadratic forms parametrizing the solutions of $Q'$ are
\[ R'_x(u, v) = u^2 - v^2, \quad R'_y(u, v) = 2uv, \quad R'_z(u, v) = 2v^2. \]
For our original form $Q$, we then get
\[ R_x(u, v) = -R'_x(u, v) = v^2 - u^2, \quad R_y(u, v) = R'_y(u, v) = 2uv, \quad R_z(u, v) = R'_x(u, v) + R'_z(u, v) = u^2 + v^2. \]
This is exactly the well-known parametrization of the Pythagorean Triples.

We see that we can easily find all solutions if we know just one. So there are two questions that remain: to decide whether a solution exists, and, if so, to find one.

18. Legendre's Theorem

We can always diagonalize a non-degenerate quadratic form by a suitable linear substitution of the variables (and perhaps scaling, to keep the coefficients integral). Basically, this comes down to repeatedly completing the square. So, for theoretical purposes at least, we can assume that our ternary quadratic form is diagonal:
\[ Q(x, y, z) = a x^2 + b y^2 + c z^2. \]
In practice, it might be a very bad idea to do this, as the coefficients $a, b, c$ may be much larger than the coefficients of the original form!

Let us be a bit more formal.
18.1. Definition. Let $Q$, $Q'$ be two ternary quadratic forms. We say that $Q$ and $Q'$ are equivalent if
\[ Q'(x, y, z) = \lambda\, Q(a_{11} x + a_{12} y + a_{13} z,\ a_{21} x + a_{22} y + a_{23} z,\ a_{31} x + a_{32} y + a_{33} z) \]
with $\lambda \in \mathbb{Q}^\times$ and a matrix
\[ T = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \in \mathrm{GL}_3(\mathbb{Q}). \]

The above then means that every non-degenerate ternary quadratic form is equivalent to a diagonal one. It is easily seen that $Q$ has nontrivial integral (or equivalently, rational) solutions if and only if $Q'$ does.

If we want to decide whether $Q = ax^2 + by^2 + cz^2$ admits a solution, we can simplify the problem somewhat. We can, of course, assume that $\gcd(a, b, c) = 1$. If $a$ (say) is divisible by a square $d^2$, then we can as well move $d^2$ into the $x^2$ term and thus obtain an equivalent form with smaller coefficients. Proceeding in this way, we can assume that $a$, $b$ and $c$ are squarefree. Also, if two of the coefficients, say $b$ and $c$, have a common prime divisor $p$, then $p$ must divide $x$. We replace $x$ by $px$ and then divide the form by $p$, making the coefficients smaller. In this way, we can also assume that $a$, $b$ and $c$ are coprime in pairs. We can summarize these assumptions by saying that $abc$ is squarefree.

18.2. Necessary Conditions for Solubility. We can easily write down a number of conditions that are necessary for the existence of a solution:
(1) Not all of $a$, $b$, and $c$ have the same sign.
(2) If $abc$ is odd, then $a$, $b$ and $c$ are not equal mod 4.
(3) If $a$ is even (say), then $b + c \equiv 0$ or $a + b + c \equiv 0 \bmod 8$.
(4) If $p \mid a$ is odd, then $-bc$ is a quadratic residue mod $p$.
(5) If $p \mid b$ is odd, then $-ca$ is a quadratic residue mod $p$.
(6) If $p \mid c$ is odd, then $-ab$ is a quadratic residue mod $p$.

For odd primes $p$ such that $p \nmid abc$, we do not obtain any restrictions in this way: there are always nontrivial solutions mod $p$ (compare Lemma 15.7; the proof is more or less the same).

Note that in order to check the conditions, we have to factor the coefficients $a$, $b$ and $c$.
It can be shown that this cannot be avoided: if one can find solutions to (diagonal) ternary quadratic forms, then one can also factor integers, hence solving ternary quadratic forms is at least as hard as factoring integers.

The surprising fact is that these necessary conditions are already sufficient!

18.3. Theorem (Legendre). Let $Q(x, y, z) = a x^2 + b y^2 + c z^2$ with $abc$ squarefree satisfy the conditions in 18.2. Then there exists a nontrivial solution in integers.

Proof. We will prove this using Minkowski's Theorem 16.5. Let $D = |abc|$. Our first claim is that there is a lattice $\Lambda \subset \mathbb{Z}^3$ such that for all $(x, y, z) \in \Lambda$, $2D$ divides $Q(x, y, z)$, and such that the covolume is $\Delta(\Lambda) = 2D$. In order to find such a $\Lambda$, we construct lattices $\Lambda_p$ for all odd $p \mid D$ such that $p \mid Q(x, y, z)$ when $(x, y, z) \in \Lambda_p$ and such that $\Delta(\Lambda_p) = p$. We will also construct a lattice $\Lambda_2$ such that 2 or 4 divides $Q(x, y, z)$ for all $(x, y, z) \in \Lambda_2$ (according to whether $abc$ is odd or even) and such that $\Delta(\Lambda_2) = 2$ or 4. Then $\Lambda = \bigcap_{p \mid 2D} \Lambda_p$ will do what we want.
Now let $p$ be an odd prime divisor of $a$ (similarly for $b$ or $c$). By assumption, there exists some $u \in \mathbb{Z}$ such that $p$ divides $bu^2 + c$. Let
$$\Lambda_p = \{(x, y, z) \in \mathbb{Z}^3 : y \equiv uz \bmod p\}.$$
It is easily checked that $\Lambda_p$ does what we want.

Now assume that $abc$ is odd. Then we let
$$\Lambda_2 = \{(x, y, z) \in \mathbb{Z}^3 : x + y + z \equiv 0 \bmod 2\}.$$
If $a$ (say) is even and $b + c \equiv 0 \bmod 4$, we let
$$\Lambda_2 = \{(x, y, z) \in \mathbb{Z}^3 : x \equiv y + z \equiv 0 \bmod 2\};$$
if $b + c \equiv 2 \bmod 4$, we let
$$\Lambda_2 = \{(x, y, z) \in \mathbb{Z}^3 : x \equiv y \equiv z \bmod 2\}.$$
It is again easily checked that $\Lambda_2$ has the required properties in each case.

Now assume that the sign of $c$ differs from that of $a$ and $b$. Then we take for $S$ the elliptic cylinder
$$S = \{(\xi, \eta, \zeta) \in \mathbb{R}^3 : |a|\,\xi^2 + |b|\,\eta^2 < 2D \text{ and } |c|\,\zeta^2 < 2D\}.$$
We find that (with $D = |abc|$)
$$\operatorname{vol}(S) = \pi \cdot \frac{2D}{\sqrt{|ab|}} \cdot 2\sqrt{\frac{2D}{|c|}} = 4\sqrt{2}\,\pi D \sqrt{\frac{D}{|abc|}} = 4\sqrt{2}\,\pi D > 16D = 8\,\Delta(\Lambda).$$
Hence by Thm. 16.5, there is a nonzero element $(x, y, z)$ in $\Lambda \cap S$. Since it is in $\Lambda$, $Q(x, y, z)$ is a multiple of $2D$. Since $Q(x, y, z) = \pm\bigl((|a|\,x^2 + |b|\,y^2) - |c|\,z^2\bigr)$ and both terms in the difference are nonnegative and $< 2D$, we find that $|Q(x, y, z)| < 2D$. Together, these imply that $Q(x, y, z) = 0$.

Note that the ellipsoid given by $|a|\,\xi^2 + |b|\,\eta^2 + |c|\,\zeta^2 < 2D$ would be too small for the proof to work.

Note also that we did not need to assume that solutions mod 4 or mod 8 exist. This is a general feature: one can always leave out one place in the conditions: either the conditions mod powers of 2, or some odd prime, or the infinite place, which here gives rise to the condition on the signs. The reason behind this is essentially quadratic reciprocity, which leads to the fact that the number of places where the conditions fail is always even. In the above proof, one could use the mod 4/mod 8 conditions to come up with a lattice of covolume $4D$ giving divisibility by $4D$; then the ellipsoid would be sufficiently large, and we need not require the sign condition on the coefficients!

There is also a proof by descent (in fact, that was how Legendre originally proved his theorem).

18.4. Corollary.
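If $a x^2 + b y^2 + c z^2 = 0$ has a nontrivial solution in integers, then it has one such that
$$\max\{|a|\,x^2,\ |b|\,y^2,\ |c|\,z^2\} \le 4\pi^{-2/3}\,|abc| < 1.865\,|abc|,$$
or equivalently,
$$|x| \le 2\pi^{-1/3}\sqrt{|bc|}, \quad |y| \le 2\pi^{-1/3}\sqrt{|ca|}, \quad |z| \le 2\pi^{-1/3}\sqrt{|ab|}.$$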
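The bounds in the corollary turn the search for a solution into a finite problem. The following is a minimal sketch of such an exhaustive search, assuming (as in 18.3) that $abc$ is squarefree; the function name `small_solution` is ours and not part of the notes, and the search is only meant for small coefficients.

```python
import math
from itertools import product

def small_solution(a, b, c):
    """Search the box from Corollary 18.4 for a nontrivial integer solution
    of a*x^2 + b*y^2 + c*z^2 = 0; return (x, y, z) or None."""
    k = 4 * math.pi ** (-2 / 3)      # the constant 4*pi^(-2/3), about 1.8646
    # |x| <= 2*pi^(-1/3)*sqrt(|bc|) is the same as x^2 <= k*|bc|, and so on.
    bx = math.isqrt(int(k * abs(b * c)))
    by = math.isqrt(int(k * abs(c * a)))
    bz = math.isqrt(int(k * abs(a * b)))
    # Only squares occur in the equation, so nonnegative x, y, z suffice.
    for x, y, z in product(range(bx + 1), range(by + 1), range(bz + 1)):
        if (x, y, z) != (0, 0, 0) and a * x * x + b * y * y + c * z * z == 0:
            return (x, y, z)
    return None
```

For example, `small_solution(1, 1, -1)` returns `(0, 1, 1)`, while `small_solution(1, 1, -3)` returns `None`, in line with condition (2) of 18.2 failing for $x^2 + y^2 - 3z^2$.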
Proof. With $2\,|abc|$ instead of $4\pi^{-2/3}\,|abc|$, this follows from the preceding proof. Now note that this proof will still work if in the definition of $S$, we replace $2D$ by $\alpha D$ with $\alpha > 4\pi^{-2/3}$. Since $S$ contains only finitely many lattice points, there is one solution such that $\max\{|a|\,x^2, |b|\,y^2, |c|\,z^2\} < \alpha\,|abc|$ for all $\alpha > 4\pi^{-2/3}$, which implies the claim.

In fact, more is true.

18.5. Theorem (Holzer). If $a x^2 + b y^2 + c z^2 = 0$ (with $abc$ squarefree) has a nontrivial solution in integers, then it has one such that
$$\max\{|a|\,x^2,\ |b|\,y^2,\ |c|\,z^2\} \le |abc|,$$
or equivalently,
$$|x| \le \sqrt{|bc|}, \quad |y| \le \sqrt{|ca|}, \quad |z| \le \sqrt{|ab|}.$$
To get this (when $a, b > 0$ and $c < 0$, say), one assumes that a given solution has $|z| > \sqrt{ab}$ and constructs a new one from this with smaller $|z|$. So the solution with smallest $|z|$ must have $|z| \le \sqrt{ab}$; the bounds on $x$ and $y$ then follow.

19. p-adic Numbers

19.1. Motivation. In many circumstances, one wants to consider statements for all powers of some prime number $p$. For example, if a polynomial equation has (nontrivial) integral solutions, it necessarily has (nontrivial) solutions modulo all powers of $p$. We also considered (nontrivial) solutions in real numbers. Now $\mathbb{R}$ is a field, but $\mathbb{Z}/p^n\mathbb{Z}$ is only a ring (finite, which is nice) and not even an integral domain when $n \ge 2$. Therefore it is desirable to work instead in a structure that is an integral domain or a field and at the same time captures statements about all powers of $p$ simultaneously. This can be done by passing to the limit in a suitable way and leads to the ring $\mathbb{Z}_p$ of p-adic integers and the field $\mathbb{Q}_p$ of p-adic numbers. Our statement about nontrivial solutions mod $p^n$ for all $n$ can then simply be expressed by saying that there is a (nontrivial) solution in $\mathbb{Z}_p$ (or in $\mathbb{Q}_p$).

Consider, for example, the equation $x^2 + 7 = 0$ modulo powers of 2. Solutions are given in the following table.
$$\begin{array}{ll}
\bmod\ 2^1: & x \equiv 1 \\
\bmod\ 2^2: & x \equiv 1, 3 \\
\bmod\ 2^3: & x \equiv 1, 3, 5, 7 \\
\bmod\ 2^4: & x \equiv 3, 5, 11, 13 \\
\bmod\ 2^5: & x \equiv 5, 11, 21, 27
\end{array}$$
It is not hard to see that for $n \ge 3$, there are always 4 solutions mod $2^n$. 
If $\mathbb{Z}/2^n\mathbb{Z}$ were a field, this would not be possible: in a field, a quadratic equation has at most two solutions. However, two of the four are sort of spurious: they do not lift to solutions mod $2^{n+1}$. Now if we pass to the limit and only consider solutions that can be lifted indefinitely, then we find two solutions, as expected.
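The table and the remark about spurious solutions can be reproduced by brute force. The sketch below is only an illustration; the helper names `roots_mod` and `liftable` are ours.

```python
def roots_mod(n):
    """All x in [0, 2^n) with x^2 + 7 = 0 mod 2^n."""
    m = 2 ** n
    return [x for x in range(m) if (x * x + 7) % m == 0]

def liftable(n):
    """Those roots mod 2^n that are reductions of roots mod 2^(n+1)."""
    m = 2 ** n
    return sorted({x % m for x in roots_mod(n + 1)})
```

Here `roots_mod(4)` gives `[3, 5, 11, 13]`, but `liftable(4)` gives only `[5, 11]`: the roots 3 and 13 mod 16 are the spurious ones.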
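19.2. Definition. The ring $\mathbb{Z}_p$ of p-adic integers is
$$\mathbb{Z}_p = \Bigl\{(a_n) \in \prod_{n=1}^{\infty} \mathbb{Z}/p^n\mathbb{Z} : a_n \equiv a_{n+1} \bmod p^n \text{ for all } n \ge 1\Bigr\}.$$
There is a canonical inclusion $\mathbb{Z} \hookrightarrow \mathbb{Z}_p$, given by $a \mapsto (\bar a, \bar a, \bar a, \dots)$.

Now we need some structural information on the ring $\mathbb{Z}_p$.

19.3. Theorem. $\mathbb{Z}_p$ is an integral domain. It only has one maximal ideal, $p\mathbb{Z}_p$, and all non-zero ideals have the form $p^n\mathbb{Z}_p$ for some $n \ge 0$. (In particular, $\mathbb{Z}_p$ is a PID and therefore a UFD.) Its unit (or multiplicative) group is $\mathbb{Z}_p^\times = \mathbb{Z}_p \setminus p\mathbb{Z}_p$.

Proof.
(a) $p\mathbb{Z}_p$ is a maximal ideal. We show that $\mathbb{Z}_p/p\mathbb{Z}_p \cong \mathbb{Z}/p\mathbb{Z}$; since the latter is a field, the claim follows. Consider the map
$$\mathbb{Z}_p/p\mathbb{Z}_p \ni (a_1, a_2, \dots) + p\mathbb{Z}_p \longmapsto a_1 \in \mathbb{Z}/p\mathbb{Z}.$$
It is a well-defined ring homomorphism and obviously surjective. The homomorphism $\mathbb{Z} \to \mathbb{Z}_p$ induces a homomorphism $\mathbb{Z}/p\mathbb{Z} \to \mathbb{Z}_p/p\mathbb{Z}_p$, which is inverse to the map above, hence we have an isomorphism.
(b) We have of course that $\mathbb{Z}_p^\times \subset \mathbb{Z}_p \setminus p\mathbb{Z}_p$ (an element in a maximal ideal cannot be a unit). Let us show that we actually have equality. So take $u \in \mathbb{Z}_p \setminus p\mathbb{Z}_p$. If $u = (u_1, u_2, \dots)$, each $u_n$ is invertible in $\mathbb{Z}/p^n\mathbb{Z}$, so there are unique $v_n$ such that $u_n v_n = 1$; then $v = (v_1, v_2, \dots) \in \mathbb{Z}_p$ and $uv = 1$.
(c) We now see easily that $p\mathbb{Z}_p$ is the only maximal ideal. For assume that $\mathfrak{m}$ is another maximal ideal. Then $\mathfrak{m} \setminus p\mathbb{Z}_p \neq \emptyset$, and by (b), this means that $\mathfrak{m}$ contains a unit, hence $\mathfrak{m} = \mathbb{Z}_p$, a contradiction.
(d) We have $\bigcap_{n \ge 1} p^n\mathbb{Z}_p = \{0\}$. For $a = (a_1, a_2, \dots) \in p^n\mathbb{Z}_p$ implies $a_j = 0$ for $j \le n$.
(e) If $a \in \mathbb{Z}_p \setminus \{0\}$, then there is some $n \ge 0$ and some $u \in \mathbb{Z}_p^\times$ such that $a = p^n u$: By (d), there is some $n$ such that $a \in p^n\mathbb{Z}_p \setminus p^{n+1}\mathbb{Z}_p$. Then $a = p^n u$ where $u \in \mathbb{Z}_p \setminus p\mathbb{Z}_p = \mathbb{Z}_p^\times$.
(f) Let $I \subset \mathbb{Z}_p$ be a non-zero ideal. Then, by (d) again, there is some $n$ such that $I \subset p^n\mathbb{Z}_p$, but $I \not\subset p^{n+1}\mathbb{Z}_p$. So there is some $a \in I$ such that $a = p^n u$ with $u \in \mathbb{Z}_p^\times$. Since $u$ is invertible, $p^n = a u^{-1} \in I$ as well, and we find $p^n\mathbb{Z}_p \subset I$, hence $I = p^n\mathbb{Z}_p$.
(g) $\mathbb{Z}_p$ is an integral domain. Suppose $ab = 0$ with $a = (a_1, a_2, \dots)$, $b = (b_1, b_2, \dots)$. Assume $a \neq 0$; then $a = p^N u$ with some $N \ge 0$, $u \in \mathbb{Z}_p^\times$. Then $ab = 0$ implies $p^N b = 0$. Now this says that $p^N b_{n+N} = 0$ in $\mathbb{Z}/p^{N+n}\mathbb{Z}$, so $b_n \equiv b_{N+n} \equiv 0 \bmod p^n$, hence $b_n = 0$, for all $n$. 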
Part (e) in the proof motivates the following definition.

19.4. Definition. For $a = (a_1, a_2, \dots) \in \mathbb{Z}_p$ define the p-adic valuation
$$v_p(a) = \max\bigl(\{0\} \cup \{n \ge 1 : a_n = 0\}\bigr) \in \{0, 1, \dots, \infty\}.$$
Then $a = p^{v_p(a)} u$ with $u \in \mathbb{Z}_p^\times$, if $a \neq 0$ (and $v_p(0) = \infty$), and the valuation is compatible with the p-adic valuation on $\mathbb{Z}$. Define the p-adic absolute value by $|0|_p = 0$, $|a|_p = p^{-v_p(a)}$ if $a \neq 0$.
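For an ordinary integer $a$, the component $a_n$ is just $a \bmod p^n$, so the definition can be evaluated directly. The following sketch does this for integers only; the names `vp` and `abs_p` are ours, and `math.inf` stands in for the value $\infty$.

```python
import math
from fractions import Fraction

def vp(a, p):
    """p-adic valuation of the integer a: the largest n with a = 0 in Z/p^n Z."""
    if a == 0:
        return math.inf
    n = 0
    while a % p ** (n + 1) == 0:
        n += 1
    return n

def abs_p(a, p):
    """p-adic absolute value |a|_p = p^(-v_p(a)), with |0|_p = 0."""
    return Fraction(0) if a == 0 else Fraction(1, p ** vp(a, p))
```

For instance, `vp(18, 3)` is `2` and `abs_p(18, 3)` is `Fraction(1, 9)`; numbers highly divisible by $p$ are p-adically small.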
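19.5. Definition. The field $\mathbb{Q}_p$ of p-adic numbers is the field of fractions of $\mathbb{Z}_p$. We have that $\mathbb{Q}_p = \mathbb{Z}_p[1/p]$, and we can extend the p-adic valuation and absolute value to $\mathbb{Q}_p$: $v_p(a/b) = v_p(a) - v_p(b)$ and $|a/b|_p = |a|_p/|b|_p$; then for all $a \in \mathbb{Q}_p^\times$,
$$a = p^{v_p(a)}\, u$$
with some $u \in \mathbb{Z}_p^\times$.

19.6. Lemma.
(1) $|ab|_p = |a|_p\,|b|_p$.
(2) $|a + b|_p \le \max\{|a|_p, |b|_p\} \le |a|_p + |b|_p$.

Proof. Easy.

In particular, $|\cdot|_p$ defines a metric on $\mathbb{Z}_p$ and $\mathbb{Q}_p$: $d(a, b) = |a - b|_p$. It is a fact that with this metric, $\mathbb{Z}_p$ is a compact metric space, and $\mathbb{Z}$ is dense in $\mathbb{Z}_p$. Also, $\mathbb{Q}_p$ can be identified with the completion of $\mathbb{Q}$ with respect to the p-adic absolute value (in the same way as $\mathbb{R}$ is the completion of $\mathbb{Q}$ with respect to the usual absolute value $|\cdot| = |\cdot|_\infty$).

19.7. Remark. Define $|x|_\infty = |x|$ for $x \in \mathbb{R}$. Then for all $a \in \mathbb{Q}^\times$,
$$\prod_{v = p, \infty} |a|_v = 1,$$
where $v$ runs through all primes $p$ and $\infty$. This is easy to see. Despite its apparent triviality, this Product Formula (and its generalization to algebraic number fields) plays an important role in number theory.

19.8. Lemma.
(1) Every series $\sum_{n=0}^{\infty} c_n p^n$ with $c_n \in \mathbb{Z}_p$ converges in $\mathbb{Z}_p$.
(2) Every $a \in \mathbb{Z}_p$ can be written uniquely in the form $a = \sum_{n=0}^{\infty} c_n p^n$ with $c_n \in \{0, 1, \dots, p-1\}$.

Proof. Exercise.

As an example, we have in $\mathbb{Z}_3$
$$-2 = 1 + 2 \cdot 3 + 2 \cdot 3^2 + 2 \cdot 3^3 + \dots.$$

19.9. Proposition. Let $F \in \mathbb{Z}[X_1, \dots, X_k]$.
(1) $\forall n \ge 1\ \exists (x_1, \dots, x_k) \in \mathbb{Z}^k : p^n \mid F(x_1, \dots, x_k)$
$\iff \exists (x_1, \dots, x_k) \in \mathbb{Z}_p^k : F(x_1, \dots, x_k) = 0$.
(2) If $F$ is homogeneous, we have
$\forall n \ge 1\ \exists (x_1, \dots, x_k) \in \mathbb{Z}^k \setminus (p\mathbb{Z})^k : p^n \mid F(x_1, \dots, x_k)$
$\iff \exists (x_1, \dots, x_k) \in \mathbb{Z}_p^k \setminus (p\mathbb{Z}_p)^k : F(x_1, \dots, x_k) = 0$
$\iff \exists (x_1, \dots, x_k) \in \mathbb{Q}_p^k \setminus \{0\} : F(x_1, \dots, x_k) = 0$.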
Proof. To prove the nontrivial direction ($\Rightarrow$), consider the rooted tree with nodes $(n, (\bar x_1, \dots, \bar x_k))$ (at distance $n$ from the root $(0, (0, \dots, 0))$) for solutions modulo $p^n$, where nodes at levels $n$ and $n + 1$ are connected if the solution at the upper level reduces to the solution at the lower level mod $p^n$ (compare the motivating example at the beginning of the section, where $p = 2$, $k = 1$, and $F = X_1^2 + 7$). Then use König's Lemma (see below) that says that an infinite, finitely branched rooted tree has an infinite path starting at the root. This path corresponds to a $k$-tuple of p-adic integers.

In order to complete this proof, we need to prove König's Lemma.

19.10. Theorem (König's Lemma). Let $T$ be an infinite, but finitely branched, rooted tree. Then $T$ has an infinite branch (starting at the root).

Proof. We construct an infinite branch inductively. Let $T_1, \dots, T_m$ be the finitely many subtrees connected to the root of $T$. Since $T$ is infinite, (at least) one of the $T_j$ must be infinite. Now the first step of the branch we construct leads to the root of $T_j$, and we continue from there. Since $T_j$ is again infinite, this construction will never come to an end, thus leading to an infinite branch in $T$.

Note that the proof needs the Axiom of Choice, unless there is some additional structure that we can use in order to pick one of the infinite subtrees. In our application, we can represent the nodes by tuples of integers between 0 and $p^n$ and then pick the smallest one with respect to lexicographical ordering. So we can do without the Axiom of Choice here.

19.11. Corollary. $\mathbb{Z}_p$ is compact (and hence complete) in the topology induced by the metric $d(x, y) = |x - y|_p$.

Proof. Since $\mathbb{Z}_p$ is a metric space, we can start with an open covering consisting of open balls $B_x(r) = \{y \in \mathbb{Z}_p : |x - y|_p < r\}$. Let $T_0$ be the rooted tree whose nodes at level $n$ correspond to the elements of $\mathbb{Z}/p^n\mathbb{Z}$, with node $a$ at level $n+1$ connected to node $b$ at level $n$ if and only if $a$ reduces to $b$ mod $p^n$. 
Note that an open ball $B_x(p^{-n})$ corresponds to the subtree of $T_0$ whose root is the node $x \bmod p^{n+1}$. Let $T$ be the tree obtained from $T_0$ by removing the subtrees (except their roots) corresponding to the balls in the given open covering. An infinite branch in $T$ would correspond to an element of $\mathbb{Z}_p$ that is not in any of the open balls; since the balls form a covering, such an infinite branch does not exist. By König's Lemma, $T$ must then be finite, and the finitely many leaves of $T$ correspond to a finite subcovering of the given covering.

The following result is one of the most important ones in the theory of p-adic numbers.

19.12. Lemma (Hensel's Lemma). If $f \in \mathbb{Z}[x]$ (or $\mathbb{Z}_p[x]$) and $f$ has a simple zero $a$ mod $p$, then $f$ has a unique (simple) zero $\alpha \in \mathbb{Z}_p$ such that $\alpha \equiv a \bmod p$.

Proof. Strangely enough, the idea of this proof comes from Newton's method for approximating roots of polynomials. In the present context, closeness is measured by the p-adic absolute value.
First note that if $\alpha + p\mathbb{Z}_p = a \in \mathbb{F}_p$ (for $\alpha \in \mathbb{Z}_p$), then $f'(\alpha)$ reduces to $f'(a) \neq 0$ in $\mathbb{F}_p$; therefore $f'(\alpha)$ is invertible in $\mathbb{Z}_p$, and $v_p(f'(\alpha)) = 0$. Now let $\alpha_0 \in \mathbb{Z}_p$ be any element such that its image in $\mathbb{F}_p$ is $a$, and define recursively
$$\alpha_{n+1} = \alpha_n - f'(\alpha_n)^{-1} f(\alpha_n).$$
I claim that $(\alpha_n)$ converges in $\mathbb{Z}_p$. Note that
$$f(y) - f(x) = (y - x) f'(x) + (y - x)^2 g(x, y)$$
with a polynomial $g \in \mathbb{Z}_p[x, y]$, and so
$$f(\alpha_{n+1}) = f(\alpha_n) + (\alpha_{n+1} - \alpha_n) f'(\alpha_n) + (\alpha_{n+1} - \alpha_n)^2 g(\alpha_n, \alpha_{n+1}) = f'(\alpha_n)^{-2} f(\alpha_n)^2 g(\alpha_n, \alpha_{n+1}).$$
This shows that $v_p(f(\alpha_{n+1})) \ge 2 v_p(f(\alpha_n))$, and since $v_p(f(\alpha_0)) \ge 1$, we have $v_p(f(\alpha_n)) \ge 2^n$. This implies that $v_p(\alpha_{n+1} - \alpha_n) \ge 2^n$, and so (by the ultrametric triangle inequality), the sequence $(\alpha_n)$ is a Cauchy sequence. Since $\mathbb{Z}_p$ is complete, $(\alpha_n)$ converges; let $\alpha$ be the limit. Note that $\bar\alpha = a$, since $v_p(\alpha - \alpha_0) \ge 1$. Also, polynomials are continuous (in the p-adic topology), so, passing to the limit in the recursion above, we obtain
$$\alpha = \alpha - f'(\alpha)^{-1} f(\alpha) \implies f(\alpha) = 0.$$
To show uniqueness, assume that $\alpha$ and $\alpha'$ are two distinct zeros of $f$ both reducing to $a$ mod $p$. Then $1 \le n = v_p(\alpha' - \alpha) < \infty$. But we have
$$0 = f(\alpha') - f(\alpha) = (\alpha' - \alpha) f'(\alpha) + (\alpha' - \alpha)^2 g(\alpha, \alpha')$$
and so
$$f'(\alpha) = -(\alpha' - \alpha)\, g(\alpha, \alpha').$$
But $v_p(f'(\alpha)) = 0$, whereas the valuation of the right hand side is at least $n > 0$, a contradiction.

Here is an easy consequence.

19.13. Lemma. Let $p$ be an odd prime and $a \in \mathbb{Z}_p$ such that $p \nmid a$. Then $a$ is a square in $\mathbb{Z}_p$ if and only if $a$ is a quadratic residue mod $p$. If $a \in \mathbb{Z}_2$ is odd, then $a$ is a square in $\mathbb{Z}_2$ if and only if $a \equiv 1 \bmod 8$.

Proof. Necessity is clear in both cases. For $p$ odd, we consider $f(x) = x^2 - a$. If $a$ is a quadratic residue mod $p$, then there is some $s \in \mathbb{F}_p^\times$ such that $f(s) = 0$; also $f'(s) = 2s \neq 0$. By Hensel's Lemma 19.12, sufficiency follows. For $p = 2$, we consider $f(x) = 2x^2 + x - A$, where $a = 8A + 1$. Obviously, $2 \mid f(A)$ and $2 \nmid f'(A) = 4A + 1$, hence again by Hensel's Lemma 19.12, $f$ has a root $\alpha \in \mathbb{Z}_2$. But then we also have $(4\alpha + 1)^2 - a = 8 f(\alpha) = 0$.

For example, this shows that $-7$ is a square in $\mathbb{Z}_2$.
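The Newton iteration from the proof of Hensel's Lemma can be carried out modulo $p^k$ with ordinary integers. The following is a sketch under our own conventions (coefficients listed from the top degree down, function names `poly_eval` and `hensel_lift` are ours); it uses the three-argument `pow` with exponent `-1` for modular inverses, which needs Python 3.8 or later.

```python
def poly_eval(coeffs, x, m):
    """Evaluate a polynomial (coefficients listed from the top degree down)
    at x modulo m, using Horner's rule."""
    r = 0
    for c in coeffs:
        r = (r * x + c) % m
    return r

def hensel_lift(coeffs, a, p, k):
    """Lift a simple zero a mod p of f to a zero mod p^k by Newton's iteration."""
    deg = len(coeffs) - 1
    deriv = [c * (deg - i) for i, c in enumerate(coeffs[:-1])]
    if poly_eval(coeffs, a, p) != 0 or poly_eval(deriv, a, p) == 0:
        raise ValueError("a is not a simple zero of f mod p")
    target = p ** k
    x, m = a % p, p
    while m < target:
        m = min(m * m, target)     # the valuation of f(x) at least doubles per step
        fx = poly_eval(coeffs, x, m)
        dfx = poly_eval(deriv, x, m)
        # alpha_{n+1} = alpha_n - f'(alpha_n)^(-1) * f(alpha_n), taken mod m
        x = (x - fx * pow(dfx, -1, m)) % m
    return x
```

With $f(x) = 2x^2 + x + 1$ (the case $a = -7$, $A = -1$ of the proof of Lemma 19.13) and the zero $1$ mod 2, `alpha = hensel_lift([2, 1, 1], 1, 2, 10)` gives an approximation such that $(4\alpha + 1)^2 \equiv -7 \bmod 2^{13}$.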
19.14. Lemma. Let $a \in \mathbb{Q}_p^\times$ and write $a = p^n u$ with $u \in \mathbb{Z}_p^\times$ (and $n = v_p(a)$). Then $a$ is a square in $\mathbb{Q}_p$ if and only if $n$ is even and $u$ is a square in $\mathbb{Z}_p$.

Proof. Sufficiency is clear. If $a = b^2$, then $n = v_p(a) = 2 v_p(b)$ must be even, and $u = (b/p^{n/2})^2 \in \mathbb{Z}_p^\times$ is a square in $\mathbb{Q}_p$. But we have $v_p(b/p^{n/2}) = 0$, so $u$ is the square of an element in $\mathbb{Z}_p^\times$.

We can deduce that for a ternary quadratic form $ax^2 + by^2 + cz^2$ with $abc$ squarefree, the necessary conditions in 18.2 imply (and therefore are equivalent to the statement) that there are nontrivial solutions in $\mathbb{R}$ and in $\mathbb{Q}_p$ for all primes $p$. This is clear for $\mathbb{R}$. For an odd prime $p$, the conditions give us a solution mod $p$ such that $p \nmid \gcd(x, y, z)$, which then lifts to a solution in $\mathbb{Z}_p$. For $p = 2$, the conditions allow us to find a solution mod 8, which then lifts to $\mathbb{Z}_2$.

19.15. Theorem. Let $Q(x, y, z)$ be a non-degenerate ternary quadratic form. Then $Q(x, y, z) = 0$ has a primitive integral solution if and only if it has nontrivial solutions in real numbers and in p-adic numbers for all primes $p$.

Proof. There is a diagonal ternary quadratic form $Q' = ax^2 + by^2 + cz^2$ with $abc$ squarefree that is equivalent to $Q$. It is clear that $Q'$ has nontrivial solutions in $\mathbb{Q}$, $\mathbb{R}$ or $\mathbb{Q}_p$ if and only if $Q$ does. So $Q'$ satisfies the conditions in 18.2. By Legendre's Theorem 18.3, $Q'$ has a primitive integral solution, hence a nontrivial solution in $\mathbb{Q}$. Therefore, $Q$ also has a nontrivial solution in $\mathbb{Q}$, which can be scaled to give a primitive integral solution.

This result is called the Hasse or Local-Global Principle for ternary quadratic forms. It states that the existence of local solutions (in $\mathbb{R}$, $\mathbb{Q}_p$) implies the existence of global solutions (in $\mathbb{Q}$). In fact, this is valid for quadratic forms in general, but the proof is nontrivial for four or more variables. Note that this implies that a quadratic form in five or more variables has a primitive integral solution if and only if it is indefinite (i.e., has nontrivial real solutions): there are always p-adic solutions for all $p$ in this case (Exercise!). 
Note also that the Hasse Principle does not hold in general. A famous counterexample (due to Selmer) is given by
$$3x^3 + 4y^3 + 5z^3 = 0,$$
which has nontrivial solutions in $\mathbb{R}$ and all $\mathbb{Q}_p$ (Exercise), but not in $\mathbb{Q}$ (hard).

20. The Hilbert Norm Residue Symbol

There is a connection between the various necessary local conditions for the existence of a solution for a non-degenerate ternary quadratic form. Since every non-degenerate ternary quadratic form is equivalent to one of the form $a x^2 + b y^2 - z^2$ (first diagonalize, then scale the form and/or the variables appropriately), it suffices to consider such forms.
20.1. Definition (Hilbert Norm Residue Symbol). Let $a, b \in \mathbb{Z} \setminus \{0\}$. For a prime $p$, set
$$\left(\frac{a, b}{p}\right) = 1$$
if $a x^2 + b y^2 = z^2$ has a non-trivial solution in $\mathbb{Q}_p$. Otherwise, we set $\left(\frac{a, b}{p}\right) = -1$. Similarly, $\left(\frac{a, b}{\infty}\right) = 1$ if there is a non-trivial real solution, and $-1$ otherwise.

Theorem 19.15 then says that $a x^2 + b y^2 = z^2$ has a non-trivial solution in $\mathbb{Q}$ if and only if $\left(\frac{a, b}{v}\right) = 1$ for all places $v = p, \infty$.

20.2. Theorem. Let $a, b \in \mathbb{Z} \setminus \{0\}$. Then
$$\left(\frac{a, b}{p}\right) = 1$$
for almost all primes $p$, and we have the Product Formula
$$\prod_{v = p, \infty} \left(\frac{a, b}{v}\right) = 1.$$
(Here $v$ runs through all primes and $\infty$.)

This means that the number of places $v$ such that the local condition fails is always even.

There are always solutions in $\mathbb{Q}_p$ if $p$ is odd and does not divide $ab$. By definition, this implies that only finitely many of the symbols can be $-1$. The product formula then is equivalent to the Law of Quadratic Reciprocity and its supplements.

20.3. Useful Fact. Here is a simple but useful fact: for all nonzero $a, b, c \in \mathbb{Z}$ and all places $v$, we have
$$\left(\frac{ab, ac}{v}\right) = \left(\frac{ab, -bc}{v}\right).$$
This is because the forms $ab\,x^2 + ac\,y^2 - z^2$ and $ab\,x^2 - bc\,y^2 - z^2$ are equivalent (multiply the form by $-ab$, then scale $x$ and $y$ to remove the squares in the coefficients, then exchange $x$ and $z$).

In the following, we will assume that $a$ and $b$ are squarefree. This is no loss of generality, since we can achieve this by scaling the variables.
20.4. Values of the Hilbert Norm Residue Symbol, I. Let $p$ be an odd prime. Assume $u$ and $v$ are not divisible by $p$. Then we have
$$\left(\frac{u, v}{p}\right) = 1, \quad \left(\frac{u, pv}{p}\right) = \left(\frac{u}{p}\right), \quad \left(\frac{pu, v}{p}\right) = \left(\frac{v}{p}\right), \quad \left(\frac{pu, pv}{p}\right) = \left(\frac{-uv}{p}\right).$$
By an easy generalization of Lemma 15.7, we get the first equality. The next two are equivalent, the symbol being obviously symmetric. We note that any nontrivial solution of $u x^2 + pv\, y^2 \equiv z^2 \bmod p^2$ must have $x$ and $z$ not divisible by $p$. But then a solution mod $p$ forces $u$ to be a quadratic residue mod $p$. On the other hand, if $u$ is a quadratic residue mod $p$, then there is a solution $(x, y, z) = (1, 0, z)$ in $\mathbb{Z}_p$ by Lemma 19.13. For the last equality, we make use of 20.3, with $a = p$, $b = u$, $c = v$; we are then back in the previous case.

20.5. Values of the Hilbert Norm Residue Symbol, II. For $p = 2$, the situation is the most complicated. Let $u$ and $v$ be odd. Then
$$\left(\frac{u, v}{2}\right) = (-1)^{\frac{u-1}{2}\frac{v-1}{2}}, \qquad \left(\frac{u, 2v}{2}\right) = (-1)^{\frac{u^2-1}{8}} (-1)^{\frac{u-1}{2}\frac{v-1}{2}},$$
$$\left(\frac{2u, v}{2}\right) = (-1)^{\frac{v^2-1}{8}} (-1)^{\frac{u-1}{2}\frac{v-1}{2}}, \qquad \left(\frac{2u, 2v}{2}\right) = (-1)^{\frac{u^2v^2-1}{8}} (-1)^{\frac{u-1}{2}\frac{v-1}{2}}.$$
This can be verified with the help of Lemma 19.13, in a similar spirit as before.

20.6. Values of the Hilbert Norm Residue Symbol, III. Finally, we have to deal with $v = \infty$. Here the situation is simple:
$$\left(\frac{a, b}{\infty}\right) = -1 \iff a < 0 \text{ and } b < 0.$$

20.7. Proof of the Product Formula. We want to prove that
$$\prod_{v = p, \infty} \left(\frac{a, b}{v}\right) = 1.$$
We first observe that the symbols are bimultiplicative:
$$\left(\frac{aa', b}{v}\right) = \left(\frac{a, b}{v}\right)\left(\frac{a', b}{v}\right) \quad\text{and}\quad \left(\frac{a, bb'}{v}\right) = \left(\frac{a, b}{v}\right)\left(\frac{a, b'}{v}\right).$$
This follows from the values given above (but there is really a deeper reason for that). It therefore suffices to prove the product formula in each of the following cases (where $p$ and $q$ are distinct odd primes):
$$(a, b) = (-1, -1),\ (-1, 2),\ (-1, p),\ (2, 2),\ (2, p),\ (p, p),\ (p, q).$$
Of these, the cases $(2, 2)$ and $(p, p)$ can be reduced to $(-1, 2)$ and $(-1, p)$, respectively, using 20.3 again. The remaining cases are shown in the following table. (All symbols not shown here are $+1$.) 
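$$\begin{array}{c|cccc}
(a, b) & \left(\frac{a, b}{\infty}\right) & \left(\frac{a, b}{2}\right) & \left(\frac{a, b}{p}\right) & \left(\frac{a, b}{q}\right) \\ \hline
(-1, -1) & -1 & -1 & +1 & +1 \\
(-1, 2) & +1 & +1 & +1 & +1 \\
(-1, p) & +1 & (-1)^{\frac{p-1}{2}} & \left(\frac{-1}{p}\right) & +1 \\
(2, p) & +1 & (-1)^{\frac{p^2-1}{8}} & \left(\frac{2}{p}\right) & +1 \\
(p, q) & +1 & (-1)^{\frac{p-1}{2}\frac{q-1}{2}} & \left(\frac{q}{p}\right) & \left(\frac{p}{q}\right)
\end{array}$$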
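The values in 20.4 to 20.6 can be combined into a single closed formula, which makes it possible to spot-check the Product Formula numerically. The sketch below assumes the standard closed form (not stated in this shape in the notes): writing $a = p^\alpha u$ and $b = p^\beta w$ with $p \nmid uw$, the symbol is $(-1)^{\alpha\beta(p-1)/2} \left(\frac{u}{p}\right)^\beta \left(\frac{w}{p}\right)^\alpha$ for odd $p$, and $(-1)^{\frac{u-1}{2}\frac{w-1}{2} + \alpha\frac{w^2-1}{8} + \beta\frac{u^2-1}{8}}$ for $p = 2$. All function names are ours, and the place $\infty$ is encoded as `0`.

```python
def split(n, p):
    """Write n = p**e * u with p not dividing u; return (e, u)."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e, n

def legendre(u, p):
    """Legendre symbol (u/p) for an odd prime p not dividing u (Euler's criterion)."""
    return 1 if pow(u % p, (p - 1) // 2, p) == 1 else -1

def hilbert(a, b, v):
    """Hilbert symbol of nonzero integers a, b at the place v (v = 0 means infinity)."""
    if v == 0:
        return -1 if (a < 0 and b < 0) else 1
    al, u = split(a, v)
    be, w = split(b, v)
    if v == 2:
        eps = lambda t: ((t - 1) // 2) % 2
        om = lambda t: ((t * t - 1) // 8) % 2
        return (-1) ** ((eps(u) * eps(w) + al * om(w) + be * om(u)) % 2)
    sign = (-1) ** ((al * be * ((v - 1) // 2)) % 2)
    return sign * legendre(u, v) ** be * legendre(w, v) ** al

def prime_divisors(n):
    """Set of primes dividing the nonzero integer n (trial division)."""
    n, ps, p = abs(n), set(), 2
    while p * p <= n:
        if n % p == 0:
            ps.add(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.add(n)
    return ps

def hilbert_product(a, b):
    """Product of the symbols over infinity, 2 and all primes dividing ab."""
    result = 1
    for v in {0, 2} | prime_divisors(a * b):
        result *= hilbert(a, b, v)
    return result
```

Only the places $\infty$, 2 and the primes dividing $ab$ need to be multiplied, since all other symbols are $+1$ by the first equality in 20.4.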
From this, we see that the product formula for the Hilbert Norm Residue Symbol is equivalent to the Law of Quadratic Reciprocity, together with its two supplements. In some sense, the Product Formula is a nicer way of stating these facts, because it is just one simple statement instead of three different ones.

21. Pell's Equation and Continued Fractions

21.1. The Equation. Let $d > 0$ be a nonsquare integer. The equation
$$x^2 - d\,y^2 = 1 \quad\text{or}\quad x^2 - d\,y^2 = \pm 1,$$
to be solved in integers $x$ and $y$, is known as Pell's Equation. The name goes back to Euler, who for some reason mistakenly assumed that Pell had contributed to the theory of its solution. In fact, it was Euler who did most of that (but already Fermat had studied it), so "Euler's Equation" might be more appropriate...

Exercise: Find all solutions when $d \le 0$ or $d$ is a square!

Let us look at some examples. For $d = 2$, we have the following solutions (of $x^2 - dy^2 = 1$) with $x, y \ge 0$.
$$\begin{array}{c|ccccccc}
x & 1 & 3 & 17 & 99 & 577 & 3363 & 19601 \\ \hline
y & 0 & 2 & 12 & 70 & 408 & 2378 & 13860
\end{array}$$
For $d = 3$, we have
$$\begin{array}{c|ccccccc}
x & 1 & 2 & 7 & 26 & 97 & 362 & 1351 \\ \hline
y & 0 & 1 & 4 & 15 & 56 & 209 & 780
\end{array}$$
And for $d = 409$, the first two solutions are
$$\begin{array}{c|cc}
x & 1 & 25052977273092427986049 \\ \hline
y & 0 & 1238789998647218582160
\end{array}$$
We observe that there are nontrivial solutions, and also that they grow quite fast (the number of digits seems to grow linearly, so the numbers grow exponentially). The sequence of solutions seems to go on, so it appears that there are infinitely many solutions. However, the first nontrivial solution may be rather large.

21.2. Some structure. When we studied sums of two squares, we made use of the fact that $x^2 + y^2 = (x + iy)(x - iy) = |x + iy|^2$. Likewise, we can study the ring
$$R_d = \mathbb{Z}[\sqrt{d}] = \{a + b\sqrt{d} : a, b \in \mathbb{Z}\} \subset \mathbb{R}$$
and observe that $x^2 - d\,y^2 = (x + y\sqrt{d})(x - y\sqrt{d})$. 
More generally, if $R$ is a ring that as an additive group is a finitely generated free $\mathbb{Z}$-module (and similarly for a ring that is a finite-dimensional $F$-algebra for some field $F$) and $\alpha \in R$, then left multiplication by $\alpha$,
$$m_\alpha : R \ni x \longmapsto \alpha x \in R,$$
defines an endomorphism of $R$ as a $\mathbb{Z}$-module; its determinant is called the norm of $\alpha$, $N(\alpha) = \det(m_\alpha)$. Since $m_{\alpha\beta} = m_\alpha \circ m_\beta$, it follows that
$$N(\alpha\beta) = \det(m_{\alpha\beta}) = \det(m_\alpha \circ m_\beta) = \det(m_\alpha)\det(m_\beta) = N(\alpha)N(\beta),$$
i.e., the norm is multiplicative.
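For the ring $R_d = \mathbb{Z}[\sqrt{d}]$, multiplicativity of the norm can be checked by hand or by machine. In the sketch below, an element $x + y\sqrt{d}$ is represented by the pair `(x, y)`, and the norm is computed as the determinant of the multiplication matrix with respect to the basis $1, \sqrt{d}$; the function names `mul` and `norm` are ours.

```python
def mul(u, v, d):
    """Product of x + y*sqrt(d) and s + t*sqrt(d), as coefficient pairs."""
    (x, y), (s, t) = u, v
    return (x * s + d * y * t, x * t + y * s)

def norm(u, d):
    """det of multiplication by x + y*sqrt(d) on the basis 1, sqrt(d).

    The columns of the matrix are the images of 1 and sqrt(d):
    (x, y) and (d*y, x), so the determinant is x*x - d*y*y.
    """
    x, y = u
    return x * x - (d * y) * y
```

Starting from $(3, 2)$ for $d = 2$, repeated multiplication by $(3, 2)$ yields $(17, 12)$ and $(99, 70)$, which are the next entries of the table for $d = 2$ in 21.1; all of them have norm 1.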
For $R_d$, we obtain, representing $m_\alpha$ by the matrix $M_\alpha$ with respect to the $\mathbb{Z}$-basis $1, \sqrt{d}$:
$$N(x + y\sqrt{d}) = \begin{vmatrix} x & dy \\ y & x \end{vmatrix} = x^2 - d\,y^2.$$
Since (by the formula for the inverse of a $2 \times 2$ matrix) $M_{x+y\sqrt{d}}^{-1} = N(x + y\sqrt{d})^{-1} M_{x-y\sqrt{d}}$, we see that $N(x + y\sqrt{d}) = \pm 1$ if and only if $x + y\sqrt{d} \in R_d^\times$ is a unit. This means that what we are interested in is essentially the unit group of the ring $R_d$. In particular, we see that the solution sets
$$S_d = \{(x, y) \in \mathbb{Z}^2 : x^2 - dy^2 = 1\} \quad\text{and}\quad T_d = \{(x, y) \in \mathbb{Z}^2 : x^2 - dy^2 = \pm 1\}$$
have a natural structure as abelian groups under the multiplication
$$(x, y) \cdot (x', y') = (xx' + dyy',\ xy' + yx').$$
Also, $S_d$ is a subgroup of $T_d$ of index at most 2. Let
$$\varphi : S_d \longrightarrow \mathbb{R}^\times, \quad (x, y) \longmapsto x + y\sqrt{d}.$$
Then $1/\varphi(x, y) = x - y\sqrt{d}$ and therefore
$$x = \frac{\varphi(x, y) + 1/\varphi(x, y)}{2}, \quad y = \frac{\varphi(x, y) - 1/\varphi(x, y)}{2\sqrt{d}}.$$
This shows that $\varphi$ is injective and that $\varphi(x, y) > 0$ if and only if $x > 0$, and $\varphi(x, y) > 1$ if and only if $x, y > 0$. Since $(-1, 0) \in S_d$, the homomorphism $S_d \to \{\pm 1\}$, $(x, y) \mapsto \operatorname{sign}(x + y\sqrt{d})$ is onto, and the subgroup
$$S_d^+ = \{(x, y) \in S_d : x > 0\}$$
is of index 2 in $S_d$.

21.3. Lemma. Assume that $S_d^+$ is nontrivial and let $(x_1, y_1)$ be the solution with minimal $x_1$ such that $x_1, y_1 > 0$. Then $(x_1, y_1)$ generates $S_d^+$.

Proof. Note that $\alpha = \varphi(x_1, y_1) > 1$ and that there is no $(x, y) \in S_d^+$ such that $1 < \varphi(x, y) < \alpha$ (since $0 < x < x'$, $0 \le y, y'$ implies $\varphi(x, y) < \varphi(x', y')$). Now let $(x, y) \in S_d^+$ be arbitrary and set $\beta = \varphi(x, y)$. Since $\alpha > 1$, there is $n \in \mathbb{Z}$ such that $\alpha^n \le \beta < \alpha^{n+1}$. This implies that
$$1 \le \alpha^{-n}\beta = \varphi\bigl((x, y) \cdot (x_1, y_1)^{-n}\bigr) < \alpha$$
and so $(x, y) \cdot (x_1, y_1)^{-n} = (1, 0)$, hence $(x, y) = (x_1, y_1)^n$.

The idea of this proof is that a discrete subgroup of $(\mathbb{R}_{>0}, \cdot)$ (take the logarithm to get into the additive group of $\mathbb{R}$) is either trivial or infinite cyclic. We see that if there are nontrivial solutions, then $S_d \cong \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}$ as an abstract abelian group. It remains to show that there are nontrivial solutions.

21.4. Diophantine approximation. 
Note that a solution to $x^2 - d\,y^2 = 1$ (with $x, y > 0$) will provide a very good rational approximation of $\sqrt{d}$:
$$\left|\sqrt{d} - \frac{x}{y}\right| = \frac{|x^2 - dy^2|}{y^2\,(\sqrt{d} + x/y)} < \frac{1}{2\sqrt{d}\,y^2}.$$
So the question is whether such good approximations always exist. That we can get at least close is shown by the following result.
Lemma. Let $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ be an irrational real number. Then there are infinitely many rational numbers $p/q$ such that
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^2}.$$

Proof. We make use of the box principle. Let $\langle x \rangle = x - \lfloor x \rfloor$ denote the fractional part of $x \in \mathbb{R}$. Then of the $n + 1$ numbers $0, \langle\alpha\rangle, \langle 2\alpha\rangle, \dots, \langle n\alpha\rangle$ in the half-open interval $[0, 1)$, there must be two that fall into the same subinterval $[k/n, (k+1)/n)$ for some $0 \le k < n$. So there are $0 \le l < m \le n$ such that
$$\frac{1}{n} > |\langle m\alpha\rangle - \langle l\alpha\rangle| = \bigl|(m - l)\alpha - (\lfloor m\alpha\rfloor - \lfloor l\alpha\rfloor)\bigr|.$$
Setting $p = \lfloor m\alpha\rfloor - \lfloor l\alpha\rfloor$ and $q = m - l$, we find
$$0 < \left|\alpha - \frac{p}{q}\right| < \frac{1}{nq} \le \frac{1}{q^2}.$$
(Note that $\alpha$ is irrational, so not equal to $p/q$.) By taking $n$ larger and larger, we find a sequence of fractions $p/q$ such that $|\alpha - p/q| < 1/q^2$ and $q \to \infty$ (since $|\alpha - p/q| \to 0$).

21.5. Remarks.
(1) The property that the set
$$\left\{\frac{p}{q} \in \mathbb{Q} : \left|\alpha - \frac{p}{q}\right| < \frac{1}{q^2}\right\}$$
is infinite in fact characterizes the irrational numbers among the real numbers $\alpha$. Indeed, if $\alpha = r/s$, then $|\alpha - p/q|$ is either zero or else at least $1/(qs)$, so $0 < |\alpha - p/q| < 1/q^2$ implies $q < s$.
(2) If $\alpha \in \mathbb{R}$ is algebraic (i.e., $\alpha$ is a root of a monic polynomial with rational coefficients), then $\alpha$ cannot be approximated much better by rational numbers than in the result above. More precisely, for every $\varepsilon > 0$, there are only finitely many fractions $p/q$ such that
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^{2+\varepsilon}}.$$
This is known as Roth's Theorem (or also the Thue-Siegel-Roth Theorem, since Thue and Siegel obtained similar, but weaker results earlier) and is quite a deep result. It implies for example that an equation $F(x, y) = m$, where $F \in \mathbb{Z}[x, y]$ is homogeneous and irreducible of degree $\ge 3$ and $0 \neq m \in \mathbb{Z}$, can have only finitely many integral solutions. (Such equations are called Thue Equations; Thue used his result mentioned above to prove this statement.)
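The box principle argument in the proof of the lemma above is constructive. The sketch below carries it out for $\alpha = \sqrt{d}$ only (our restriction, chosen so that everything can be done with exact integers): $\lfloor k\sqrt{d} \rfloor$ is `isqrt(k*k*d)`, and the index of the subinterval containing $\langle k\alpha \rangle$ is $\lfloor nk\sqrt{d} \rfloor - n\lfloor k\sqrt{d} \rfloor$. The function name `dirichlet_sqrt` is ours.

```python
from math import isqrt

def dirichlet_sqrt(d, n):
    """Box principle for alpha = sqrt(d), d a positive nonsquare: return (p, q)
    with 1 <= q <= n and |q*alpha - p| < 1/n, using exact integer arithmetic."""
    seen = {}                                     # box index -> multiplier k
    for k in range(n + 1):
        fl = isqrt(k * k * d)                     # floor(k * alpha)
        box = isqrt(n * n * k * k * d) - n * fl   # floor(n * <k * alpha>)
        if box in seen:                           # two multiples share a box
            l = seen[box]
            return fl - isqrt(l * l * d), k - l
        seen[box] = k
    raise AssertionError("unreachable: n + 1 numbers in n boxes must collide")
```

For example, `dirichlet_sqrt(2, 10)` returns `(7, 5)`, and indeed $|\sqrt{2} - 7/5| \approx 0.0142 < 1/50$.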
21.6. Application to Pell's Equation. In order to show that nontrivial solutions to Pell's Equation exist, we use the result we just proved as a starting point and apply the box principle two more times.

First let us show the following. There are infinitely many pairs $(x, y) \in \mathbb{Z}^2$ such that $|x^2 - dy^2| < 2\sqrt{d} + 1$. Indeed, by Lemma 21.4 there are infinitely many $(x, y)$ such that $|x/y - \sqrt{d}| < 1/y^2$ (note that $\sqrt{d}$ is irrational, since $d$ is not a square). Then
$$|x^2 - d\,y^2| = y^2 \left|\frac{x}{y} - \sqrt{d}\right| \left(\frac{x}{y} + \sqrt{d}\right) < 2\sqrt{d} + \left|\frac{x}{y} - \sqrt{d}\right| < 2\sqrt{d} + \frac{1}{y^2} \le 2\sqrt{d} + 1.$$
Now there are only finitely many integers $m$ such that $|m| < 2\sqrt{d} + 1$. Therefore there must be one $m$ such that there are infinitely many pairs $(x, y)$ such that $x^2 - dy^2 = m$ (note the use of the box principle).

Now the idea is to "divide" two suitably chosen solutions to $x^2 - dy^2 = m$ to obtain a solution of $x^2 - dy^2 = 1$. To this end, we use the box principle another time to conclude that there must be two pairs $(x, y)$ and $(u, v)$ with $0 < x < u$, $0 < y$, $0 < v$, $x^2 - dy^2 = u^2 - dv^2 = m$ and $x \equiv u$, $y \equiv v \bmod m$. Then
$$(xu - d\,yv)^2 - d\,(uy - xv)^2 = m^2$$
and
$$xu - d\,yv \equiv x^2 - d\,y^2 = m \equiv 0 \bmod m, \qquad uy - xv \equiv xy - xy = 0 \bmod m,$$
so
$$\left(\frac{xu - d\,yv}{m},\ \frac{uy - xv}{m}\right) \in S_d^+$$
is a nontrivial solution (otherwise $u/v = x/y$, which implies $(u, v) = (x, y)$, since $y, v > 0$ and $x \perp y$, $u \perp v$).

We have proved:

21.7. Theorem. Pell's Equation has nontrivial solutions. The solution set $S_d$ has a natural structure as an abelian group, and as such it is isomorphic to $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}$.

The generator $(x_1, y_1)$ of $S_d^+$ with $x_1, y_1 > 0$ is called the fundamental solution of the equation. All other solutions then are of the form
$$\pm(x_n, y_n) \quad\text{with}\quad x_n + y_n\sqrt{d} = (x_1 + y_1\sqrt{d})^n, \quad n \in \mathbb{Z}.$$
Note that for $n$ large,
$$x_n \approx \frac{(x_1 + y_1\sqrt{d})^n}{2}, \quad y_n \approx \frac{(x_1 + y_1\sqrt{d})^n}{2\sqrt{d}},$$
which explains the observed linear growth in the number of digits in the solutions.

21.8. Continued fractions. 
The question remains how to actually find the fundamental solution to a given Pell Equation, or more generally, how to find the good rational approximations to irrational numbers that we are promised in Lemma 21.4. The answer is provided by continued fractions.

Let $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ be an irrational real number again. We set $\alpha_0 = \alpha$ and then define recursively for $n \ge 0$
$$a_n = \lfloor \alpha_n \rfloor, \quad \alpha_{n+1} = \frac{1}{\alpha_n - a_n}.$$
Note that all $\alpha_n$ are irrational, so we can never have $\alpha_n = a_n$. Note also that $\alpha_n > 1$ and therefore $a_n \ge 1$ as soon as $n \ge 1$.
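For $\alpha = \sqrt{d}$ the recursion can be run without any floating point arithmetic. The sketch below uses the standard integer recurrence for quadratic irrationals, which is a swapped-in technique and not spelled out in these notes: every $\alpha_n$ has the form $(\sqrt{d} + P)/Q$ with integers $P$ and $Q > 0$, and $\lfloor \alpha_n \rfloor = \lfloor (\lfloor\sqrt{d}\rfloor + P)/Q \rfloor$. The function name `cf_sqrt` is ours.

```python
from math import isqrt

def cf_sqrt(d, count):
    """First `count` partial quotients a_0, a_1, ... of sqrt(d), d a positive nonsquare."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    P, Q, out = 0, 1, []
    for _ in range(count):
        a = (a0 + P) // Q      # a_n = floor(alpha_n), where alpha_n = (sqrt(d) + P) / Q
        out.append(a)
        P = a * Q - P          # alpha_{n+1} = 1 / (alpha_n - a_n) = (sqrt(d) + P') / Q'
        Q = (d - P * P) // Q   # this division is exact
    return out
```

For instance, `cf_sqrt(2, 4)` gives `[1, 2, 2, 2]`.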
We then have
$$\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_{n-1} + \cfrac{1}{\alpha_n}}}}}$$
and we denote the nested fraction on the right hand side by
$$[a_0; a_1, a_2, \dots, a_{n-1}, \alpha_n]$$
(where $a_0, a_1, \dots, a_{n-1}, \alpha_n$ can be arbitrary numbers). Note that we have the recurrence
$$[a_0] = a_0, \quad [a_0; a_1, \dots, a_{n-2}, a_{n-1}, x] = [a_0; a_1, \dots, a_{n-2}, a_{n-1} + 1/x] \quad (n \ge 1).$$
We call the formal expression
$$[a_0; a_1, a_2, a_3, \dots]$$
the continued fraction expansion of $\alpha$.

Given integers $a_0, a_1, a_2, \dots$ (with $a_1, a_2, \dots \ge 1$), it is clear that $[a_0; a_1, \dots, a_n]$ is a rational number. Let us find a way to compute its numerator and denominator efficiently.

21.9. Lemma. Set $p_{-2} = 0$, $q_{-2} = 1$, $p_{-1} = 1$, $q_{-1} = 0$ and define recursively
$$p_{n+1} = a_{n+1}\,p_n + p_{n-1}, \quad q_{n+1} = a_{n+1}\,q_n + q_{n-1}.$$
Then we have the following.
(1) $p_{n+1} q_n - p_n q_{n+1} = (-1)^n$ for all $n \ge -2$. In particular, $p_n \perp q_n$.
(2) $[a_0; a_1, \dots, a_n] = p_n/q_n$ for all $n \ge 0$.
(3) If $[a_0; a_1, a_2, \dots]$ is the continued fraction expansion of $\alpha$, then for $n \ge 0$,
$$\left|\alpha - \frac{p_n}{q_n}\right| < \frac{1}{q_n q_{n+1}} \le \frac{1}{q_n^2}.$$
Also, $\operatorname{sign}(\alpha - p_n/q_n) = (-1)^n$.
(4) Under the assumptions of (3), we have
$$\frac{p_0}{q_0} < \frac{p_2}{q_2} < \frac{p_4}{q_4} < \dots < \alpha < \dots < \frac{p_5}{q_5} < \frac{p_3}{q_3} < \frac{p_1}{q_1}.$$
The fractions $p_n/q_n$ are called the convergents of the continued fraction expansion $[a_0; a_1, a_2, \dots]$.

Proof.
(1) This is an easy induction.
(2) We claim that more generally, for $n \ge -1$ we have
$$[a_0; a_1, \dots, a_n, x] = \frac{p_n x + p_{n-1}}{q_n x + q_{n-1}}.$$
This is clear for $n = -1$: $x = (p_{-1} x + p_{-2})/(q_{-1} x + q_{-2})$. Assuming it is OK for $n$, we find
$$[a_0; a_1, \dots, a_n, a_{n+1}, x] = [a_0; a_1, \dots, a_n, a_{n+1} + 1/x] = \frac{p_n\bigl(a_{n+1} + \frac{1}{x}\bigr) + p_{n-1}}{q_n\bigl(a_{n+1} + \frac{1}{x}\bigr) + q_{n-1}} = \frac{(a_{n+1} p_n + p_{n-1})\,x + p_n}{(a_{n+1} q_n + q_{n-1})\,x + q_n} = \frac{p_{n+1} x + p_n}{q_{n+1} x + q_n}.$$
Specializing $x$ to $a_{n+1}$, the statement of the lemma follows.
(3) We have $\alpha = [a_0; a_1, \dots, a_n, \alpha_{n+1}]$. Using the claim established above, we obtain
$$\alpha - \frac{p_n}{q_n} = \frac{\alpha_{n+1} p_n + p_{n-1}}{\alpha_{n+1} q_n + q_{n-1}} - \frac{p_n}{q_n} = \frac{(-1)^n}{q_n\,(\alpha_{n+1} q_n + q_{n-1})}.$$
This proves the assertion about the sign. Also, $\alpha_{n+1} > a_{n+1}$, hence $\alpha_{n+1} q_n + q_{n-1} > a_{n+1} q_n + q_{n-1} = q_{n+1}$, so
$$\left|\alpha - \frac{p_n}{q_n}\right| < \frac{1}{q_n q_{n+1}}.$$
(4) Note that
$$\frac{p_{n+2}}{q_{n+2}} - \frac{p_n}{q_n} = \frac{a_{n+2}\,(p_{n+1} q_n - p_n q_{n+1})}{q_n q_{n+2}} = \frac{a_{n+2}\,(-1)^n}{q_n q_{n+2}}$$
is positive for $n$ even and negative for $n$ odd.

21.10. Corollary. Let $a_0, a_1, a_2, \dots \in \mathbb{Z}$ with $a_1, a_2, \dots \ge 1$. Then the sequence $(p_n/q_n)_n$, where $p_n/q_n = [a_0; a_1, a_2, \dots, a_n]$, converges to a limit $\alpha$, and $[a_0; a_1, a_2, \dots]$ is the continued fraction expansion of $\alpha$.

Proof. By the above, we have $|p_{n+1}/q_{n+1} - p_n/q_n| = 1/(q_n q_{n+1})$. Also, $q_n \ge n - 1$ for $n \ge 2$, hence $\sum_{n \ge 0} 1/(q_n q_{n+1})$ converges. This implies that the sequence is Cauchy and therefore has a limit $\alpha$. Since $a_0 \le p_n/q_n < a_0 + 1$ for $n \ge 3$, we must have $a_0 = \lfloor\alpha\rfloor$. Then
$$\alpha_1 = \frac{1}{\alpha - a_0} = \lim_{n \to \infty} [a_1; a_2, \dots, a_n].$$
Continuing, we find successively that the $a_n$ are the numbers making up the continued fraction expansion of $\alpha$.

In order to conclude that we can use the continued fraction expansion of $\sqrt{d}$ in order to find the fundamental solution to a Pell Equation, we need to know that any sufficiently good approximation appears as a convergent.
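The recurrence of Lemma 21.9 is easy to run by machine. The sketch below computes the convergents from a given list of partial quotients; the function name `convergents` is ours, and the sample input $[5; 1, 1, 3, 5, 3, 1, 1]$ is the beginning of the expansion of $\sqrt{31}$.

```python
def convergents(quotients):
    """Pairs (p_n, q_n), n = 0, 1, ..., for the partial quotients a_0, a_1, ..."""
    p_prev, q_prev = 0, 1      # p_{-2}, q_{-2}
    p, q = 1, 0                # p_{-1}, q_{-1}
    out = []
    for a in quotients:
        # p_{n+1} = a_{n+1} * p_n + p_{n-1}, and likewise for q
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append((p, q))
    return out

cs = convergents([5, 1, 1, 3, 5, 3, 1, 1])
# Property (1) of the lemma: p_{n+1} q_n - p_n q_{n+1} = (-1)^n.
signs = [cs[n + 1][0] * cs[n][1] - cs[n][0] * cs[n + 1][1] for n in range(len(cs) - 1)]
```

Here `cs` ends with `(1520, 273)`, and `signs` alternates `1, -1, 1, ...` as property (1) predicts.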
21.11. Lemma. Let $\alpha \in \mathbb{R} \setminus \mathbb{Q}$, and let $p/q \in \mathbb{Q}$ such that
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{2q^2}.$$
Then $p/q$ is a convergent of the continued fraction expansion of $\alpha$.

Proof. We first consider the case $q = 1$. Then $p/q = p$ is the integer closest to $\alpha$, so $p = a_0 = p_0/q_0$ (and we are done) or $p = a_0 + 1$. In the latter case, $a_0 + 1/2 < \alpha$, and we find $1 < \alpha_1 < 2$, so $a_1 = 1$ and $p_1/q_1 = a_0 + 1 = p$, so we are done again.

For $q \ge 2$, we claim a slightly stronger result, namely that $|\alpha - p/q| < 1/(q(2q - 1))$ implies that $p/q = p_n/q_n$ for some $n$. We can assume that $p/q$ is in lowest terms. We will prove this by induction on $q$.

First observe that we can assume that $0 < \alpha < 1$ (since shifting everything by $a_0$ does not affect the assumption nor the conclusion). Then we must have $0 < p/q < 1$ as well: first we cannot have equality on either end since $q > 1$. Second, assume that $p/q < 0$. Then
$$\frac{1}{q(2q - 1)} > \left|\alpha - \frac{p}{q}\right| > \left|\frac{p}{q}\right| \ge \frac{1}{q},$$
which contradicts $q \ge 2$. Similarly if $p/q > 1$. So we have $0 < p < q$. We now observe that
$$\left|\frac{1}{\alpha} - \frac{q}{p}\right| = \frac{q}{p\alpha}\left|\alpha - \frac{p}{q}\right| < \frac{1}{p\,\alpha\,(2q - 1)}.$$
Also,
$$\alpha\,(2q - 1) > \left(\frac{p}{q} - \frac{1}{q(2q - 1)}\right)(2q - 1) = 2p - \frac{p + 1}{q} \ge 2p - 1,$$
so that
$$\left|\frac{1}{\alpha} - \frac{q}{p}\right| < \frac{1}{p\,(2p - 1)}.$$
If $p \ge 2$, we are therefore done by induction. If $p = 1$, then we have
$$\frac{2q - 2}{q(2q - 1)} = \frac{1}{q} - \frac{1}{q(2q - 1)} < \alpha < \frac{1}{q} + \frac{1}{q(2q - 1)} = \frac{2}{2q - 1}$$
and so
$$q - \frac{1}{2} < \frac{1}{\alpha} < q + \frac{q}{2(q - 1)} \le q + 1.$$
If $a_1 = q$, then $p_1/q_1 = 1/q$ and we are done. Otherwise, $a_1 = q - 1$ and $a_2 = 1$ and then $p_2/q_2 = 1/q$ and we are done again.

21.12. Theorem. Let $d \ge 1$ be a nonsquare integer. Then the fundamental solution to the Pell Equation $x^2 - d\,y^2 = 1$ is given by $(x_1, y_1) = (p_n, q_n)$, where $n \ge 0$ is minimal such that $p_n^2 - d\,q_n^2 = 1$. Here, $p_n/q_n$ are the convergents of the continued fraction expansion of $\sqrt{d}$.

Proof. We have seen that any nontrivial positive solution $(x, y)$ satisfies
$$\left|\sqrt{d} - \frac{x}{y}\right| < \frac{1}{2\sqrt{d}\,y^2} \le \frac{1}{2y^2}.$$
By the previous lemma, every such solution must be of the form $(x, y) = (p_n, q_n)$, where $p_n/q_n$ is a convergent of the continued fraction expansion of $\sqrt{d}$. Since all $a_n$ are positive (including $a_0$), the sequence of numerators $p_0, p_1, \dots$ is strictly
increasing. Therefore the fundamental solution (which is the smallest positive solution) must be the first one we encounter.

21.13. Example. Let us illustrate the result by an example. Take $d = 31$; then $5 < \sqrt{31} < 6$. We obtain the following table.
$$\begin{array}{c|c|c|c|c|c}
n & \alpha_n & a_n & p_n & q_n & p_n^2 - 31 q_n^2 \\ \hline
0 & \sqrt{31} & 5 & 5 & 1 & -6 \\
1 & \frac{1}{\sqrt{31}-5} = \frac{\sqrt{31}+5}{6} & 1 & 6 & 1 & 5 \\
2 & \frac{6}{\sqrt{31}-1} = \frac{\sqrt{31}+1}{5} & 1 & 11 & 2 & -3 \\
3 & \frac{5}{\sqrt{31}-4} = \frac{\sqrt{31}+4}{3} & 3 & 39 & 7 & 2 \\
4 & \frac{3}{\sqrt{31}-5} = \frac{\sqrt{31}+5}{2} & 5 & 206 & 37 & -3 \\
5 & \frac{2}{\sqrt{31}-5} = \frac{\sqrt{31}+5}{3} & 3 & 657 & 118 & 5 \\
6 & \frac{3}{\sqrt{31}-4} = \frac{\sqrt{31}+4}{5} & 1 & 863 & 155 & -6 \\
7 & \frac{5}{\sqrt{31}-1} = \frac{\sqrt{31}+1}{6} & 1 & 1520 & 273 & 1 \\
8 & \frac{6}{\sqrt{31}-5} = \sqrt{31}+5 & & & &
\end{array}$$
We find the fundamental solution $(1520, 273)$.

Note that if we had first found a solution to $x^2 - d\,y^2 = -1$, we can simply square it to find the fundamental solution to $x^2 - d\,y^2 = +1$. Note also that it can be shown (Exercise!) that $p_n^2 - d\,q_n^2 = \pm 1$ if and only if $\alpha_{n+1} = \sqrt{d} + a$ with an integer $a$ (which must be $\lfloor \sqrt{d} \rfloor$ unless $n = -1$).

22. Elliptic Curves

After we have looked at equations of degree 2 in some detail, let us now move up one step and consider equations of degree 3. We will restrict our attention to the case of two variables. It turns out that a very interesting class of such equations can be brought into the following form.
$$y^2 = x^3 + A x + B,$$
where $A$ and $B$ are rational numbers (which we can even assume to be integers, by scaling the variables $x$ and $y$ appropriately). We will see that we should require that $4A^3 + 27B^2 \neq 0$. In this case, the equation above is said to define an elliptic curve $E$.

As this is a Number Theory course, we will be interested in the rational solutions of the equation (also called rational points on $E$). However, it makes sense to study solutions (or points) with coordinates in any field. For example, we can easily visualize the set of real points as a curve in the $xy$-plane.
[Figure: the real points of $y^2 = x^3 - x + 1$ (left) and of $y^2 = x^3 - x$ (right) in the $xy$-plane.]

It turns out that it is advantageous to "complete" (or "close up", or "compactify") the curve by not considering it in the usual affine plane, but by considering it in the projective plane, which is obtained by adding a "point at infinity" to the affine plane for each direction (represented by a family of parallel lines). For a formal definition, see below. What it comes down to is the following. We replace $x$ and $y$ in the equation by $x/z$ and $y/z$, respectively, and then multiply by an appropriate power of $z$ to obtain a homogeneous polynomial. For our elliptic curve, we obtain
$$y^2 = x^3 + A x + B \;\longrightarrow\; \Bigl(\frac{y}{z}\Bigr)^2 = \Bigl(\frac{x}{z}\Bigr)^3 + A\,\frac{x}{z} + B \;\longrightarrow\; y^2 z = x^3 + A x z^2 + B z^3.$$
The points are now given by triples $(\xi, \eta, \zeta)$, but we have to identify triples that are related by scaling with a constant factor (and $(0, 0, 0)$ is not allowed). We write $(\xi : \eta : \zeta)$ for the point given by $(\xi, \eta, \zeta)$; then we have $(\xi : \eta : \zeta) = (\lambda\xi : \lambda\eta : \lambda\zeta)$ for all $(\xi, \eta, \zeta) \neq (0, 0, 0)$ and $\lambda \neq 0$. We find the points in the affine $xy$-plane as $(\xi : \eta : 1)$; all points with $\zeta \neq 0$ come up in that way. The remaining points (with $\zeta = 0$) are the "points at infinity"; the point $(\xi : \eta : 0)$ corresponds to the direction of lines parallel to $\eta x = \xi y$.

The set of all these points with $\xi, \eta, \zeta$ rational is denoted $\mathbb{P}^2(\mathbb{Q})$, where $\mathbb{P}^2$ stands for the projective plane. We then set
$$E(\mathbb{Q}) = \{(\xi : \eta : \zeta) \in \mathbb{P}^2(\mathbb{Q}) : \eta^2\zeta = \xi^3 + A\xi\zeta^2 + B\zeta^3\};$$
this is the set of rational points on the elliptic curve $E$. Note that this set is always non-empty: it contains the point $O = (0 : 1 : 0)$, which is the only point at infinity on $E$. (To see this, put $z = 0$ in the homogeneous equation for $E$; we find $x^3 = 0$, so the points at infinity are all of the form $(0 : \eta : 0)$. Because of the identification modulo scaling, this is always the same point $(0 : 1 : 0)$.)
22.1. Group Structure. We now want to define a group structure on the points on $E$, and in particular on $E(\mathbb{Q})$. For this, we note that a line intersects the curve $E$ in exactly three points, counting multiplicity.

There are some remarks to make here. First of all, "counting multiplicity" means that a point on $E$ that has the line as its tangent has to be counted twice (or even three times, when the point is a point of inflection). Second, some of the points may have coordinates in a larger field (for example, a line may intersect $E(\mathbb{Q})$ only in one point, even counting multiplicity; the other two points will then have complex coordinates). Third, some (or all) of the intersection points may be at infinity. For example, the line at infinity $z = 0$ intersects $E$ just in the point $O = (0 : 1 : 0)$, which is a point of inflection with tangent $z = 0$, and so must be counted three times. (To see that $O$ is an inflection point, consider the affine plane on which $y \neq 0$. In terms of the curve equation, this comes down to setting $y$ equal to 1: $z = x^3 + A x z^2 + B z^3$; $O$ then has coordinates $(x, z) = (0, 0)$, and from the equation, we see that the curve looks very much like $z = x^3$ near the origin.)

But note that when two of the intersection points are rational (or real), then so must be the third: its $x$-coordinate (say) is a root of a cubic polynomial with rational (real) coefficients, and the other two roots are rational (real) numbers.

We now define addition on $E$ by saying that $O$ is the origin (zero element) of the group and that three points $P_1, P_2, P_3$ add up to zero if and only if they are the three points of intersection of $E$ with a line.

Assuming this really defines a group law, how do we find the negative $-P$ of a point $P$, and how do we find the sum $P + Q$ of two points? For the negative, note that $O + P + (-P) = O$, so $O$, $P$, $-P$ must be the three points of intersection of $E$ with a line.
This line is determined by the two points $O$ and $P$ (in the affine picture, it is the vertical line through $P$), and $-P$ is the third point of intersection. In terms of coordinates, if $P = (\xi, \eta)$, then $-P = (\xi, -\eta)$. In particular, $P = -P$, and so $2P = O$, if and only if $\eta = 0$ (or $P = O$).

To find the sum, note that $P + Q + (-(P + Q)) = O$, so $-(P + Q)$ must be the third point of intersection of $E$ with the line through $P$ and $Q$. (This line is the tangent line to $E$ at $P$ when $P = Q$.) We then find $P + Q$ as the negative of this point.

It is pretty obvious that the operation defined in this way satisfies all axioms of an abelian group except associativity. The proof of associativity is a bit involved, so we skip it here.

22.2. Example. Let us consider a specific curve,
$$E : y^2 = x^3 - x + 1.$$
(See the left-hand picture at the beginning of this section.) There are some obvious points:
$$\pm P = (1, \pm 1), \quad \pm Q = (0, \pm 1), \quad \pm R = (-1, \pm 1).$$
Let us do some computations with them. It is clear that $P$, $Q$, $R$ are the points of intersection of $E$ with the line $y = 1$, so we have $P + Q + R = O$.
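The chord-and-tangent recipe of 22.1 can be sketched in a few lines of exact rational arithmetic. This is only an illustration under stated assumptions: `O`, `neg` and `add` are names chosen here, points are affine pairs with `None` standing for $O$, and the tangent slope $(3x^2 + A)/(2y)$ is obtained by implicit differentiation of $y^2 = x^3 + Ax + B$.

```python
from fractions import Fraction

O = None  # the point at infinity (0 : 1 : 0)

def neg(P):
    """Negative of a point: reflect in the x-axis."""
    return O if P is O else (P[0], -P[1])

def add(P, Q, A):
    """P + Q on y^2 = x^3 + A x + B (B does not enter the formulas)."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return O                                     # vertical line: third point is O
    if P == Q:
        lam = Fraction(3 * x1 * x1 + A) / (2 * y1)   # slope of the tangent at P
    else:
        lam = Fraction(y2 - y1) / (x2 - x1)          # slope of the chord through P, Q
    x3 = lam * lam - x1 - x2       # the three x-coordinates sum to lam^2
    y3 = lam * (x3 - x1) + y1      # third intersection point is (x3, y3)
    return (x3, -y3)               # its negative is P + Q

# The points of Example 22.2 on y^2 = x^3 - x + 1, so A = -1.
P, Q, R = (1, 1), (0, 1), (-1, 1)
print(add(P, Q, -1) == neg(R))          # True: P + Q = -R
print(add(add(P, Q, -1), R, -1) is O)   # True: P + Q + R = O
```

Using `Fraction` keeps every coordinate exact, which matters because sums of integral points need not be integral.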
Let us find $P - R = P + (-R)$. For this, we first have to find the line through $P = (1, 1)$ and $-R = (-1, -1)$; this line is $y = x$. We use the equation of the line to eliminate $y$ from the equation of $E$:
$$x^2 = x^3 - x + 1 \iff x^3 - x^2 - x + 1 = (x - 1)^2(x + 1) = 0.$$
We see that the three points of intersection have $x$-coordinates $1$ (counted twice) and $-1$, so (using $y = x$) the points are $P$ (twice) and $-R$. This means that the third point of intersection is $P$, and $P - R = -P$ and therefore $R = 2P$. Since $P + Q + R = O$, we also find that $Q = -3P$. So all the points we listed at the beginning are multiples of $P$.

Let us find some more multiples. We can compute $4P$ by doubling $2P$. So we have to find the tangent line to $E$ at $2P = R = (-1, 1)$. What is its slope? We can find it by implicit differentiation:
$$y^2 = x^3 - x + 1 \implies 2y\,dy = (3x^2 - 1)\,dx \implies \frac{dy}{dx} = \frac{3x^2 - 1}{2y}.$$
So the slope of the tangent at $2P$ is $(3 - 1)/2 = 1$, and the line has equation $y = x + 2$. We find
$$(x + 2)^2 = x^3 - x + 1 \iff x^3 - x^2 - 5x - 3 = (x + 1)^2(x - 3) = 0,$$
so the third point of intersection is $-4P = (3, 5)$, and $4P = (3, -5)$.

Note that when the line has slope $\lambda$, so its equation is $y = \lambda x + \mu$ for some $\mu$, the cubic polynomial in $x$ we get is of the form $x^3 - \lambda^2 x^2 + {}$lower order terms. So $\lambda^2$ is the sum of the three roots, and we can find the $x$-coordinate $\xi_3$ of $P + Q$, where $P = (\xi_1, \eta_1)$, $Q = (\xi_2, \eta_2)$, as $\xi_3 = \lambda^2 - \xi_1 - \xi_2$, and then the $y$-coordinate is $\eta_3 = -(\lambda\xi_3 + \mu)$.

Let us compute $6P$ in order to see that we can also get non-integral points. The slope of the tangent at $3P = -Q = (0, -1)$ is $-1/(-2) = 1/2$, so the line is $y = \tfrac{1}{2}x - 1$. Hence the $x$-coordinate of $6P$ is $(\tfrac{1}{2})^2 - 0 - 0 = \tfrac{1}{4}$, and the $y$-coordinate is $-(\tfrac{1}{2} \cdot \tfrac{1}{4} - 1) = \tfrac{7}{8}$.

For this curve $E$, it can be shown that the group $E(\mathbb{Q})$ is infinite cyclic and generated by $P$.

Let us now put these things on a more formal basis. In the following, $K$ will be an arbitrary field (sometimes assumed not to have characteristic 2 or 3).

22.3. Definition.
(1) The affine plane over $K$ is the set $\mathbb{A}^2(K) = K^2 = \{(\xi, \eta) : \xi, \eta \in K\}$.
(2) The projective plane over $K$ is the set of equivalence classes $\mathbb{P}^2(K) = (K^3 \setminus \{(0, 0, 0)\})/{\sim}$, where the equivalence relation $\sim$ is given by $(\xi, \eta, \zeta) \sim (\lambda\xi, \lambda\eta, \lambda\zeta)$ for $\lambda \in K^\times$.
We write $(\xi : \eta : \zeta)$ for the equivalence class of $(\xi, \eta, \zeta)$ (and call it a $K$-rational point on the projective plane). Note that we have an injection $\mathbb{A}^2(K) \to \mathbb{P}^2(K)$, given by $(\xi, \eta) \mapsto (\xi : \eta : 1)$; its image is the set of points $(\xi : \eta : \zeta)$ with $\zeta \neq 0$. (On that set, the inverse map is $(\xi : \eta : \zeta) \mapsto (\xi/\zeta, \eta/\zeta)$.) Points with $\zeta = 0$ are called points at infinity.

22.4. Definition.
(1) An affine (plane algebraic) curve over $K$ is given by a nonzero polynomial $F \in K[x, y]$.
(2) A projective (plane algebraic) curve $C$ over $K$ is given by a nonzero homogeneous polynomial $F \in K[x, y, z]$. If $F$ has degree $d$, then the curve is said to be of degree $d$ as well. A curve of degree 1 is called a line. We write $C : F(x, y, z) = 0$. The set
$$C(K) = \{(\xi : \eta : \zeta) \in \mathbb{P}^2(K) : F(\xi, \eta, \zeta) = 0\}$$
is called the set of $K$-rational points on $C$.
(3) If $C : F(x, y) = 0$ is an affine curve, and the (total) degree of $F$ is $d$, then
$$\tilde{C} : \tilde{F}(x, y, z) = 0, \quad\text{where}\quad \tilde{F}(x, y, z) = z^d F\Bigl(\frac{x}{z}, \frac{y}{z}\Bigr),$$
is a projective curve of degree $d$, called the projective closure of $C$. (Note that $F(x, y) = \tilde{F}(x, y, 1)$.)

We will only consider projective curves in the following, but it is often convenient to give an affine equation. The curve under consideration will then be the projective closure of this affine curve.

22.5. Definition.
(1) Let $C : F(x, y, z) = 0$ be a projective curve over $K$, $P \in C(K)$. The point $P = (\xi : \eta : \zeta)$ is said to be a singular point of $C$ if
$$\frac{\partial F}{\partial x}(\xi, \eta, \zeta) = \frac{\partial F}{\partial y}(\xi, \eta, \zeta) = \frac{\partial F}{\partial z}(\xi, \eta, \zeta) = 0.$$
(2) The curve $C$ is said to be singular if there is a field $K' \supset K$ (which can be taken to be an algebraic extension of $K$) and a singular point $P \in C(K')$. Otherwise, $C$ is said to be regular or smooth.

The motivation behind this definition is that if not all the partial derivatives vanish, then there is a well-defined tangent line to the curve at the point $P$, given by
$$\frac{\partial F}{\partial x}(\xi, \eta, \zeta)\,x + \frac{\partial F}{\partial y}(\xi, \eta, \zeta)\,y + \frac{\partial F}{\partial z}(\xi, \eta, \zeta)\,z = 0.$$
Such a point is "nice", whereas a point without a tangent line is "bad".

Now we can define what an elliptic curve is.
22.6. Definition. An elliptic curve over $K$ is a smooth projective curve over $K$ given by an equation of the form
$$E : y^2 z + a_1 xyz + a_3 y z^2 = x^3 + a_2 x^2 z + a_4 x z^2 + a_6 z^3$$
(with $a_1, a_2, a_3, a_4, a_6 \in K$). When the characteristic of $K$ is not 2 or 3, then by a suitable linear change of variables, we can put this into the form
$$E : y^2 z = x^3 + A x z^2 + B z^3.$$
(First complete the square on the left (need to divide by 2), then the cube on the right (need to divide by 3).) We will usually just write the affine form of these equations:
$$y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6, \qquad y^2 = x^3 + A x + B.$$

A perhaps better definition is "a smooth projective curve of degree 3 with a specified $K$-rational point on it". It turns out that there is always a (in general nonlinear) transformation that puts this more general cubic equation into one of the form above, such that the specified point is the point at infinity $O = (0 : 1 : 0)$.

For the definition to be useful, we need a way to find out whether a given equation defines a smooth curve.

22.7. Proposition. Assume that $K$ is not of characteristic 2 or 3. A curve
$$C : y^2 z = x^3 + A x z^2 + B z^3$$
is smooth if and only if $4A^3 + 27B^2 \neq 0$.

Proof. First assume that $P = (\xi : \eta : \zeta)$ is a singular point. Then $\zeta \neq 0$ (since the only point at infinity on $C$ is $(0 : 1 : 0)$, which is not singular), so we can take $\zeta = 1$. Let $F(x, y, z) = -y^2 z + x^3 + A x z^2 + B z^3$. Then the conditions are
$$\frac{\partial F}{\partial x}(\xi, \eta, 1) = 3\xi^2 + A = 0, \qquad \frac{\partial F}{\partial y}(\xi, \eta, 1) = -2\eta = 0, \qquad \frac{\partial F}{\partial z}(\xi, \eta, 1) = -\eta^2 + 2A\xi + 3B = 0.$$
Since $2 \neq 0$ in $K$, this implies $\eta = 0$ and then $2A\xi + 3B = 3\xi^2 + A = 0$. Multiplying the second equation by $4A^2$ and plugging in the first, we find that $4A^3 + 27B^2 = 0$.

Now assume that $4A^3 + 27B^2 = 0$. If $A \neq 0$, one checks that $P = (-3B : 0 : 2A)$ is a singular point on $C$. If $A = 0$, then $B = 0$ as well (since $3 \neq 0$ in $K$), and $P = (0 : 0 : 1)$ is singular on $C : y^2 z = x^3$.
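The criterion and the explicit singular point from the proof are easy to check numerically. The sketch below works over the rationals with integer inputs; the function names are chosen here for illustration, and the sample curve $y^2 = x^3 - 3x + 2 = (x-1)^2(x+2)$ is an assumption of this example, not taken from the notes.

```python
def partials(A, B, x, y, z):
    """Partial derivatives of F = -y^2 z + x^3 + A x z^2 + B z^3 at (x, y, z)."""
    Fx = 3 * x * x + A * z * z
    Fy = -2 * y * z
    Fz = -y * y + 2 * A * x * z + 3 * B * z * z
    return Fx, Fy, Fz

def singular_point(A, B):
    """Return a singular point (x, y, z) of y^2 z = x^3 + A x z^2 + B z^3,
    or None if 4A^3 + 27B^2 != 0 (the smooth case). Characteristic 0 only."""
    if 4 * A**3 + 27 * B**2 != 0:
        return None
    return (-3 * B, 0, 2 * A) if A != 0 else (0, 0, 1)

print(singular_point(-1, 1))        # None: y^2 = x^3 - x + 1 is smooth
S = singular_point(-3, 2)           # (-6, 0, -6), i.e. the point (1 : 0 : 1)
print(S, partials(-3, 2, *S))       # all three partial derivatives vanish at S
```

The singular point found for the sample curve is the double root $x = 1$ of the cubic, as expected.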
In order to define the group structure on an elliptic curve, we need a statement about the intersection of a line and a curve.
22.8. Proposition. Let $L : ax + by + cz = 0$ be a line and $C : F(x, y, z) = 0$ a projective curve of degree $d$ such that $L \not\subset C$ (i.e., $F$ is not divisible by $ax + by + cz$). Then for every sufficiently large field $K' \supset K$ (we can take $K'$ to be an algebraic closure of $K$), the intersection $L(K') \cap C(K')$ has exactly $d$ points, counting multiplicity.

Proof. Here is a sketch of a proof. Not all of $a, b, c$ are zero, so let us assume without loss of generality that $c \neq 0$. Then we can solve the equation of $L$ for $z$:
$$z = -\frac{a}{c}x - \frac{b}{c}y,$$
and plug this into the equation of $C$:
$$F_1(x, y) = F\Bigl(x, y, -\frac{a}{c}x - \frac{b}{c}y\Bigr) = 0,$$
where $F_1$ is a homogeneous polynomial of degree $d$ in two variables. (Note that $F_1 \neq 0$ since there are points on $L$ on which $F$ does not vanish.) Therefore $F_1$ splits into linear factors over some finite field extension $K'$ of $K$:
$$F_1(x, y) = \gamma \prod_{i=1}^{k} (\alpha_i x - \beta_i y)^{e_i}$$
with $\alpha_i, \beta_i, \gamma \in K'$, $\alpha_i\beta_j \neq \alpha_j\beta_i$ for $i \neq j$, $e_i \ge 1$, and $\sum_{i=1}^{k} e_i = d$. The points in $L \cap C$ are then
$$P_i = (c\beta_i : c\alpha_i : -a\beta_i - b\alpha_i) \in L(K') \cap C(K'),$$
and the statement is true if we count $P_i$ with multiplicity $e_i$.

22.9. Remark. If, in the situation above, $d - 1$ of the intersection points are defined over $K$ (i.e., are in $\mathbb{P}^2(K)$), then so is the last, $d$th, one. The reason for this is that (assuming for simplicity that $y$ does not divide $F_1$) $F_1(x, 1) \in K[x]$ is a polynomial of degree $d$ such that $d - 1$ of its roots are in $K$, and therefore the last root must be in $K$ as well (the sum of the roots is minus a quotient of coefficients of $F_1(x, 1)$ and therefore in $K$).

Using this remark, we can define the group law (as we did earlier).

22.10. Theorem. Let $E$ be an elliptic curve over $K$. Then the set $E(K)$ of $K$-rational points on $E$ has the structure of an abelian group with zero element $O = (0 : 1 : 0)$ and addition $P + Q = (P * Q) * O$ (and negation $-P = P * O$), where $P * Q$ denotes the third point of intersection of the line through $P$ and $Q$ (the tangent line to $E$ at $P$ when $P = Q$) with $E$.
We will not give a full proof here, but we note that all axioms other than associativity are easily verified. To show associativity, it is enough to show that $(P + Q) * R = P * (Q + R)$ for all triples of points $P, Q, R \in E(K)$. If the points are in sufficiently general position, one can make use of the following fact.
22.11. Theorem. Let $L_1, L_2, L_3$ and $L_1', L_2', L_3'$ be six lines in sufficiently general position. Let $P_{ij}$ be the point of intersection of $L_i$ and $L_j'$. Then every cubic curve that passes through $P_{ij}$ for all $i, j$ except $i = j = 3$ also passes through $P_{33}$.

Proof. Here is a sketch. To pass through a given point $P$ imposes a linear condition on the coefficients of a general homogeneous cubic polynomial in three variables. The "sufficiently general position" ensures that the eight conditions we obtain are linearly independent (this is essentially the definition of "sufficiently general position"). Since a general homogeneous polynomial of degree 3 has ten coefficients, the space of cubics passing through the eight points has dimension two. Now (writing $L_1$ etc. for the linear form defining $L_1$ etc.) $L_1 L_2 L_3$ and $L_1' L_2' L_3'$ are two such cubics, and they are linearly independent. So any $F$ giving a curve passing through the eight points is a linear combination of these two cubics, and therefore $F$ vanishes also on $P_{33}$.

Now we take $L_1 = L(P, Q)$, $L_2 = L(O, Q * R)$, $L_3 = L(P + Q, R)$ and $L_1' = L(Q, R)$, $L_2' = L(O, P * Q)$, $L_3' = L(P, Q + R)$, where $L(X, Y)$ denotes the line through $X$ and $Y$. Then by the theorem above, we find that $E$ passes through $P_{33}$, which is therefore the third point of intersection of $E$ with $L_3$ and with $L_3'$. This means that $(P + Q) * R = P * (Q + R)$, as was to be shown.

Note that this proof does not work as described if some of the nine points coincide, and one either has to consider a whole lot of special cases separately, or use some sort of continuity argument to get rid of them.

22.12. Definition. Let $E$ be an elliptic curve given by a short Weierstrass Equation $y^2 = x^3 + A x + B$. Then the number $\Delta(E) = -16(4A^3 + 27B^2)$ is called the discriminant of $E$.

22.13. Remark. The number $-4A^3 - 27B^2$ is the discriminant of the polynomial $f(x) = x^3 + Ax + B$, which is defined to be
$$\operatorname{disc}(f) = (\alpha - \beta)^2(\beta - \gamma)^2(\gamma - \alpha)^2,$$
where $\alpha, \beta, \gamma$ are the three roots of $f$.
Therefore the discriminant vanishes if and only if $f$ has multiple roots.

The strange factor 16 makes things work in characteristic 2: one can define $\Delta(E)$ for long Weierstrass Equations
$$E : y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6$$
by formally transforming it into a short one (using a substitution $x \mapsto x + \alpha$, $y \mapsto y + \beta x + \gamma$) and then taking $\Delta$ of the result. With the factor 16, $\Delta(E)$ is a polynomial in the coefficients $a_1, \dots, a_6$, with integral coefficients, and vanishes if and only if there is a singularity, also when the characteristic is 2 or 3.

As an example, consider $y^2 - y = x^3 - x$. What would be the discriminant? We complete the square on the left and obtain $(y - 1/2)^2 = x^3 - x + 1/4$. Replacing $y - 1/2$ by a new $y$, we have a short Weierstrass Equation, and its discriminant is
$$-16\bigl(4 \cdot (-1)^3 + 27 \cdot (1/4)^2\bigr) = 37.$$
Since this is not divisible by 2, the original equation defines an elliptic curve over any field of characteristic 2 (in fact, any field of characteristic different from 37).
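The value 37 can be double-checked in two ways. The sketch below first evaluates the short-form definition with exact fractions, and then evaluates the discriminant of the long equation directly. The second function uses the standard expression of $\Delta$ through the auxiliary quantities $b_2, b_4, b_6, b_8$; that formula is not derived in these notes and is quoted here as an assumption of the example.

```python
from fractions import Fraction

def short_discriminant(A, B):
    """Discriminant -16 (4A^3 + 27B^2) of y^2 = x^3 + A x + B."""
    return -16 * (4 * A**3 + 27 * B**2)

def long_discriminant(a1, a2, a3, a4, a6):
    """Discriminant of y^2 + a1 xy + a3 y = x^3 + a2 x^2 + a4 x + a6,
    via the standard b-invariants (formula quoted, not derived here)."""
    b2 = a1 * a1 + 4 * a2
    b4 = 2 * a4 + a1 * a3
    b6 = a3 * a3 + 4 * a6
    b8 = a1 * a1 * a6 + 4 * a2 * a6 - a1 * a3 * a4 + a2 * a3 * a3 - a4 * a4
    return -b2 * b2 * b8 - 8 * b4**3 - 27 * b6 * b6 + 9 * b2 * b4 * b6

# y^2 - y = x^3 - x becomes y^2 = x^3 - x + 1/4 after completing the square.
print(short_discriminant(Fraction(-1), Fraction(1, 4)))   # 37
print(long_discriminant(0, 0, -1, -1, 0))                 # 37
```

That both routes give the same integer illustrates the claim that $\Delta$ is an integral polynomial in $a_1, \dots, a_6$, even though the intermediate short equation has the coefficient $1/4$.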
22.14. The rational torsion group. In an abelian group, the subset of elements of finite order forms a subgroup. Let $E$ be an elliptic curve; then a point $P$ on $E$ is called a torsion point if $nP = O$ for some $n \ge 1$. The smallest such $n$ is then the order of $P$ (as usual for group elements). When $G$ is an abelian group, we write
$$G[n] = \{g \in G : ng = 0\}$$
for the $n$-torsion subgroup of elements killed by $n$, and
$$G_{\mathrm{tors}} = \bigcup_{n \ge 1} G[n]$$
for the torsion subgroup of $G$.

What can the torsion subgroup of $E(\mathbb{Q})$ look like? It is a fact that as groups, $E(\mathbb{C}) \cong (\mathbb{R}/\mathbb{Z})^2$ and $E(\mathbb{R}) \cong \mathbb{R}/\mathbb{Z}$ or $\mathbb{Z}/2\mathbb{Z} \times \mathbb{R}/\mathbb{Z}$ (the first when $x^3 + Ax + B$ has only one real root, the second when it has three). This implies that any finite group of rational torsion points must be of the form $\mathbb{Z}/n\mathbb{Z}$ or $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2n\mathbb{Z}$. The following result tells us that $E(\mathbb{Q})_{\mathrm{tors}}$ is finite and therefore must be of one of these types.

22.15. Theorem (Nagell-Lutz). Let $E$ be given by $y^2 = x^3 + A x + B$, where $A$ and $B$ are integers. Then for every torsion point $P = (\xi, \eta) \in E(\mathbb{Q}) \setminus \{O\}$:
(1) $\xi$ and $\eta$ are integers.
(2) $\eta = 0$ or $\eta^2$ divides $4A^3 + 27B^2$.

In particular, $E(\mathbb{Q})_{\mathrm{tors}}$ is finite: there are only finitely many possible $y$-coordinates, and for each $y$-coordinate, there are at most three corresponding $x$-coordinates.

Furthermore, this result provides us with an algorithm to find all the torsion points: first, we find all integral points $(x, y)$ on $E$ such that $y^2$ divides $4A^3 + 27B^2$. Then for each such point $P$, we compute its multiples $nP$, $n = 2, 3, \dots$, until we either obtain $O$ (then the point is a torsion point, and we will also have found its order), or we obtain a point that does not belong to the set of points we have determined in the first step (then $P$ must have infinite order).

Proof. (Sketch) We begin with the easy part, and this is that the first statement implies the second. Assume $P$ is a torsion point such that $2P \neq O$ (then $P = (\xi, \eta)$ with $\eta \neq 0$). We have to show that $\eta^2$ divides $4A^3 + 27B^2$.
Now observe that $2P$ is also a torsion point (and $\neq O$), so by the first statement, its $x$-coordinate $\xi'$ is integral. It is easy to compute
$$\xi' = \frac{\xi^4 - 2A\xi^2 - 8B\xi + A^2}{4(\xi^3 + A\xi + B)}.$$
Consider the following trivially verified equality:
$$(3x^2 + 4A)(x^4 - 2Ax^2 - 8Bx + A^2) - (3x^3 - 5Ax - 27B)(x^3 + Ax + B) = 4A^3 + 27B^2.$$
We plug in $\xi$ for $x$ and divide by $\eta^2 = \xi^3 + A\xi + B \neq 0$ to get
$$\frac{4A^3 + 27B^2}{\eta^2} = 4(3\xi^2 + 4A)\xi' - (3\xi^3 - 5A\xi - 27B) \in \mathbb{Z}.$$

It is quite a bit harder to prove the first part. First observe the following. A point $P = (\xi, \eta) \in E(\mathbb{Q}) \setminus \{O\}$ has the form
$$P = \Bigl(\frac{r}{t^2}, \frac{s}{t^3}\Bigr), \quad\text{where } r, s, t \in \mathbb{Z} \text{ with } \gcd(r, t) = \gcd(s, t) = 1.$$
Proof. We look at one prime $p$ at a time and have to show that
$$v_p(\xi) < 0 \iff v_p(\eta) < 0 \implies v_p(\xi) = -2\nu,\ v_p(\eta) = -3\nu \text{ with } \nu > 0.$$
Now if $v_p(\xi) < 0$, then $2v_p(\eta) = v_p(\xi^3 + A\xi + B) = 3v_p(\xi) < 0$, so $v_p(\eta) < 0$ and $2v_p(\eta) = 3v_p(\xi)$, which implies the last statement. If $v_p(\xi) \ge 0$, then $v_p(\eta) \ge 0$, so $v_p(\eta) < 0$ implies $v_p(\xi) < 0$. This completes the proof.

The basic idea is now this. Suppose that $P = (\xi, \eta)$ is not integral. Then $\xi = r/t^2$, $\eta = s/t^3$, and $t > 1$, so there is some prime $p$ dividing $t$. Let $\nu = v_p(t)$. Then I claim that $pP = (r'/t'^2, s'/t'^3)$ with $v_p(t') = \nu + 1$. This implies that the denominators (of the $x$-coordinates, say) of the points $P, pP, p^2P, \dots$ are all distinct, hence the points are all distinct, hence $P$ cannot have finite order (because then it would only have finitely many distinct multiples).

The idea for proving the claim is that in the $p$-adic metric, a large $\nu$ in the above means that $\xi$ and $\eta$ are large, and hence $P$ is close to $O$. Then the claim says that multiplying $P$ by $p$ moves it closer to $O$.

For a point $P = (\xi, \eta) \in E(\mathbb{Q})$ such that $2P \neq O$, we define $w(P) = \xi/\eta$. If $v_p(\xi) = -2\nu < 0$ as above, then $v_p(w(P)) = \nu$. It is now possible (and not very hard) to show that if $P_1, P_2 \in E(\mathbb{Q})$ are points with $p$ dividing the denominators of their $x$-coordinates and such that $v_p(w(P_j)) \ge \nu$ ($j = 1, 2$), then
$$w(P_1) + w(P_2) \equiv w(P_1 + P_2) \bmod p^{5\nu}.$$
By induction, this implies, for $v_p(w(P)) = \nu > 0$,
$$w(mP) \equiv m\,w(P) \bmod p^{5\nu},$$
and from this we get
$$v_p(w(p^n P)) = v_p(w(P)) + n = \nu + n,$$
as claimed above.

22.16. Examples. Consider the curve $E : y^2 = x^3 + 1$. We find $4A^3 + 27B^2 = 27$. Hence all the torsion points $P \neq O$ have $y(P) \in \{0, \pm 1, \pm 3\}$. For $y = 0$, we find $x = -1$; this gives a point $(-1, 0)$ of order 2. For $y = \pm 1$, we find $x = 0$, and $(0, \pm 1)$ is a triple intersection point of $E$ with the line $y = \pm 1$. This implies that these points have order 3. The sum $(-1, 0) + (0, 1)$ must then have order 6, and indeed, for $y = \pm 3$, we find $x = 2$ and the points $(2, \pm 3)$ of order 6.
Note that in general, we have to test whether the integral points we find really are torsion points. For example, on $y^2 = x^3 + 3x$, there is the integral point $P = (1, 2)$, the square of whose $y$-coordinate divides $4A^3 + 27B^2 = 108$. We find $2P = (1/4, -7/8)$, therefore $P$ cannot be a torsion point (and this proves that there are infinitely many rational points on that curve). Let us see what the torsion subgroup is in this example. The other possibilities for the $y$-coordinate are $0, \pm 1, \pm 3, \pm 6$. We find the point $(0, 0)$ of order 2 and the integral points $(3, \pm 6)$; the latter are not torsion points either, since $2 \cdot (3, \pm 6)$ has $x$-coordinate $1/4$. So the torsion subgroup is $\{O, (0, 0)\}$.

Having proved that for each individual elliptic curve, the rational torsion subgroup is finite, one can ask whether the order of this group is uniformly bounded (the alternative would be that the group can be arbitrarily large). The answer to this question is yes.
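The torsion-finding algorithm that follows from the Nagell-Lutz Theorem is short enough to sketch in full. The code below is an illustration under stated assumptions: the names `add` and `torsion_points` are chosen here, the curve is assumed smooth with integer $A$, $B$, integer roots of the cubic are found by brute force within the Cauchy root bound (adequate only for small coefficients), and addition uses the usual chord-and-tangent formulas.

```python
from fractions import Fraction
from math import isqrt

O = None  # the point at infinity

def add(P, Q, A):
    """P + Q on y^2 = x^3 + A x + B by the chord-and-tangent rule."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return O
    if P == Q:
        lam = Fraction(3 * x1 * x1 + A) / (2 * y1)
    else:
        lam = Fraction(y2 - y1) / (x2 - x1)
    x3 = lam * lam - x1 - x2
    return (x3, -(lam * (x3 - x1) + y1))

def torsion_points(A, B):
    """Map each rational torsion point != O of y^2 = x^3 + A x + B to its order."""
    D = abs(4 * A**3 + 27 * B**2)
    # Step 1: integral points with y = 0 or y^2 dividing D.
    ys = [0] + [y for y in range(1, isqrt(D) + 1) if D % (y * y) == 0]
    cands = set()
    for y in ys:
        c = B - y * y
        bound = 1 + max(abs(A), abs(c))     # Cauchy bound on roots of x^3 + A x + c
        for x in range(-bound, bound + 1):
            if x**3 + A * x + c == 0:
                cands.update({(x, y), (x, -y)})
    # Step 2: follow the multiples of each candidate until they reach O
    # or leave the candidate set (then the point has infinite order).
    torsion = {}
    for P in cands:
        Q, n = P, 1
        while Q is not O and Q in cands:
            Q, n = add(Q, P, A), n + 1
        if Q is O:
            torsion[P] = n
    return torsion

print(sorted(torsion_points(0, 1).items()))   # y^2 = x^3 + 1
print(torsion_points(3, 0))                   # y^2 = x^3 + 3x
```

For $y^2 = x^3 + 1$ this reports five points of orders 2, 3, 3, 6, 6, so together with $O$ the torsion group has order 6. For $y^2 = x^3 + 3x$ only $(0, 0)$ survives the second step.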
22.17. Theorem (Mazur). If $E$ is an elliptic curve over $\mathbb{Q}$, then the torsion subgroup $E(\mathbb{Q})_{\mathrm{tors}}$ is one of the following 15 groups:
$$\mathbb{Z}/n\mathbb{Z} \text{ for } n = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12 \quad\text{or}\quad \mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2n\mathbb{Z} \text{ for } n = 1, 2, 3, 4.$$
All of these groups occur (even in a one-parameter family of elliptic curves).

The proof of this theorem is far beyond what we can do in this course.

22.18. Reduction mod $p$. In general, if we have a Weierstrass equation with integral coefficients, it makes sense to ask whether we obtain an elliptic curve over the finite field $\mathbb{F}_p$ if we reduce the coefficients mod $p$. Since the discriminant is a polynomial with integral coefficients in the coefficients of the Weierstrass equation, we obtain its value for the mod $p$ reduced equation by reducing the discriminant mod $p$. This implies that we get an elliptic curve $E_p$ over $\mathbb{F}_p$ if and only if the discriminant is not divisible by $p$. In this case, we say that $p$ is a prime of good reduction for our elliptic curve.

In this case, we have the group $E(\mathbb{Q})$ on the one side and the finite group $E_p(\mathbb{F}_p)$ on the other side. Is there any relation between the two? Ideally, we would like to take a point in $E(\mathbb{Q})$ and reduce its coordinates mod $p$ to get a point in $E_p(\mathbb{F}_p)$. There may be a problem, however, when $p$ divides the denominator of the coordinates. To see how we can avoid this problem, let us consider reduction mod $p$ of points in the projective plane.

22.19. Lemma. There is a canonical map $\rho_p : \mathbb{P}^2(\mathbb{Q}) \to \mathbb{P}^2(\mathbb{F}_p)$, which is defined as follows. Let $P = (\xi : \eta : \zeta) \in \mathbb{P}^2(\mathbb{Q})$. Then we can scale the coordinates to get $P = (\xi' : \eta' : \zeta')$ with coprime integers $\xi', \eta', \zeta'$. Then $\rho_p(P) = (\bar{\xi}' : \bar{\eta}' : \bar{\zeta}')$.

It is easy to check that the map is well-defined: the only ambiguity in the triple $(\xi', \eta', \zeta')$ is a simultaneous sign change, which leads to the same point $\rho_p(P)$. Also, since $\xi', \eta', \zeta'$ are coprime, at least one of $\bar{\xi}'$, $\bar{\eta}'$ and $\bar{\zeta}'$ will be non-zero.

We see that the use of projective coordinates eliminates problems with denominators; this is another advantage of the projective plane over the affine plane.
We can now restrict $\rho_p$ to the points on our elliptic curve.

22.20. Proposition. Let $E$ be an elliptic curve over $\mathbb{Q}$, given by an equation with integral coefficients, and let $p$ be a prime of good reduction for $E$ (i.e., such that $p \nmid \Delta_E$). Then $\rho_p : E(\mathbb{Q}) \to E_p(\mathbb{F}_p)$ is a group homomorphism.

It is clear that this is well-defined as a map. In order to show that it is a group homomorphism, one needs to check that the points of intersection of $E$ with a line are mapped to the points of intersection of $E_p$ with a line, with multiplicities behaving as they should. This is not hard.

Let $P = (r/t^2, s/t^3)$ be in the kernel of $\rho_p$. To see what this means, we first have to write $P$ as a point with projective coordinates that are coprime integers. We get $P = (rt : s : t^3)$. This reduces to $\bar{O} = (\bar{0} : \bar{1} : \bar{0})$ if and only if $\bar{t} = 0$, i.e., if and only if $p$ divides $t$. We see that $\ker \rho_p$ is the subgroup of $E(\mathbb{Q})$ consisting of $O$ and the points whose coordinates have denominators divisible by $p$.
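Both ingredients, the reduction map $\rho_p$ and the finite group $E_p(\mathbb{F}_p)$, can be made concrete. The sketch below is illustrative only: `reduce_point` and `count_points` are names chosen here, the result of `reduce_point` is one representative triple (not normalized up to scaling in $\mathbb{F}_p$), and the point count is a brute-force enumeration suitable only for very small $p$. It needs Python 3.9 or later for the multi-argument `gcd` and `lcm`.

```python
from fractions import Fraction
from math import gcd, lcm

def reduce_point(P, p):
    """rho_p applied to a projective point P = (x, y, z) with rational coordinates."""
    coords = [Fraction(c) for c in P]
    m = lcm(*(c.denominator for c in coords))   # clear all denominators
    ints = [int(c * m) for c in coords]
    g = gcd(*ints)                              # make the integer coordinates coprime
    return tuple((v // g) % p for v in ints)

def count_points(p, a1=0, a2=0, a3=0, a4=0, a6=0):
    """#E_p(F_p) for y^2 + a1 xy + a3 y = x^3 + a2 x^2 + a4 x + a6, by enumeration."""
    affine = sum(
        1
        for x in range(p)
        for y in range(p)
        if (y * y + a1 * x * y + a3 * y - (x**3 + a2 * x * x + a4 * x + a6)) % p == 0
    )
    return affine + 1                           # plus the point at infinity O

# The point (1/4, -7/8) on y^2 = x^3 + 3x has projective coordinates (2 : -7 : 8).
P = (Fraction(1, 4), Fraction(-7, 8), 1)
print(reduce_point(P, 2))   # (0, 1, 0): 2 divides the denominators, P reduces to O
print(reduce_point(P, 5))   # (2, 3, 3): an affine point mod 5

# The curve y^2 - y = x^3 - x over F_2 and F_3.
print(count_points(2, a3=-1, a4=-1), count_points(3, a3=-1, a4=-1))   # 5 7
```

The first output shows the kernel description at work: with $t = 2$ the point is $(rt : s : t^3) = (2 : -7 : 8)$, which reduces to $\bar{O}$ exactly at $p = 2$.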
22.21. Corollary. If $E : y^2 = x^3 + Ax + B$ with $A, B \in \mathbb{Z}$, and $p \nmid \Delta_E$, then the homomorphism $\rho_p$ is injective on $E(\mathbb{Q})_{\mathrm{tors}}$.

Proof. By the Nagell-Lutz Theorem 22.15, every point $P \in E(\mathbb{Q})_{\mathrm{tors}} \setminus \{O\}$ has integral coordinates, hence $P \notin \ker \rho_p$. So $\ker \rho_p \cap E(\mathbb{Q})_{\mathrm{tors}} = \{O\}$.

This can be used to bound the size of the torsion subgroup: $\rho_p$ identifies it with a subgroup of $E_p(\mathbb{F}_p)$; therefore $\#E(\mathbb{Q})_{\mathrm{tors}}$ must divide $\#E_p(\mathbb{F}_p)$.

22.22. Example. Consider $E : y^2 - y = x^3 - x$. We saw earlier that $\Delta_E = 37$. Counting points, we find $\#E_2(\mathbb{F}_2) = 5$ and $\#E_3(\mathbb{F}_3) = 7$. So the order of $E(\mathbb{Q})_{\mathrm{tors}}$ must divide both 5 and 7, therefore $E(\mathbb{Q})_{\mathrm{tors}} = \{O\}$.

Integral Points. We know that there can be infinitely many rational points on an elliptic curve. What about integral points?

22.23. Example. Consider $E : y^2 = x^3 + 17$. We find the following integral points:
$$(-2, \pm 3),\ (-1, \pm 4),\ (2, \pm 5),\ (4, \pm 9),\ (8, \pm 23),\ (43, \pm 282),\ (52, \pm 375),\ (5234, \pm 378661).$$

The example shows that there can be quite a number of integral points, and it is clear that this number is not bounded: take any elliptic curve with infinitely many rational points; then by scaling the variables suitably, we can make as many of them integral as we like. So the question remains whether any given elliptic curve can have infinitely many integral points.

22.24. Theorem (Siegel). Let $E$ be an elliptic curve over $\mathbb{Q}$, given by an equation with integral coefficients. Then $E$ has only finitely many integral points.

This is not easy to prove. Siegel's proof is not effective: it does not provide a bound for the coordinates of the integral points. It also does not lead to an algorithm that determines all integral points. Note that the difficulty is not to find the points, but to know when we have found all of them. What Siegel proves is roughly that when there is a very large integral point, then there can be no integral point that is still very much larger. This leads to a contradiction when we assume that there are infinitely many integral points.
But it does not tell us that the list we have produced is complete (unless we really find a very large point).

Later, Baker found an effective bound (using his bounds on linear forms in logarithms of algebraic numbers); however, this bound is really huge (something like doubly exponential in the coefficients). So this is a big theoretical improvement, but it does not lead to a practical algorithm.

However, if we know generators of the group $E(\mathbb{Q})$, then there is a method that, given a bound, produces a new bound that usually is smaller. Iterating this, one arrives at a bound that is manageable for most reasonable example cases. So in practice, we usually can find the set of integral points. For example, it can be shown that the points we listed above in Example 22.23 are all the integral points on $y^2 = x^3 + 17$.

The Group of Rational Points. The last result I would like to discuss in this section is the fundamental structure theorem for the group of rational points on an elliptic curve.
22.25. Theorem (Mordell-Weil). If $E$ is an elliptic curve over $\mathbb{Q}$, then the group $E(\mathbb{Q})$ is a finitely generated abelian group.

The result as stated was proved by Mordell in a paper from 1922. Weil later generalized it to higher-dimensional abelian varieties (one-dimensional abelian varieties are the same as elliptic curves) and arbitrary number fields (i.e., finite field extensions of $\mathbb{Q}$) instead of $\mathbb{Q}$.

By the general result on the structure of finitely generated abelian groups, this implies that as abelian groups,
$$E(\mathbb{Q}) \cong E(\mathbb{Q})_{\mathrm{tors}} \times \mathbb{Z}^r$$
for some $r \ge 0$, which is called the (Mordell-Weil) rank of $E(\mathbb{Q})$. It is an open problem (but widely believed to be true) whether $r$ can be arbitrarily large. The latest record example I know of had $r \ge 24$, but nobody so far was able to produce a sequence of elliptic curves whose ranks tend to infinity.

The next question is if and how we can actually determine $r$ for a given elliptic curve. This is also an open problem: there is no method for which one could prove that it determines the rank in all cases. There are, however, methods that work in practice for more or less all examples (with reasonably-sized coefficients). If one could prove another standard conjecture ("the Shafarevich-Tate group of $E$ is finite"), then (at least in principle) these methods would find the rank eventually.

Now let me briefly indicate how the theorem is proved. The first step is to prove the following result (which clearly must hold if $E(\mathbb{Q})$ is finitely generated).

22.26. Theorem (Weak Mordell-Weil Theorem). The group $E(\mathbb{Q})/2E(\mathbb{Q})$ is finite.

Proof. (Sketch) I will give the idea of the proof in a special case, namely that all the points of order 2 on $E$ are rational. Then $E$ has an equation of the form
$$E : y^2 = (x - a)(x - b)(x - c)$$
with integers $a, b, c$, and the three points of order 2 are $(a, 0)$, $(b, 0)$ and $(c, 0)$. Note that $a, b, c$ are pairwise distinct. The idea of the proof is to embed $E(\mathbb{Q})/2E(\mathbb{Q})$ into a group which can be shown to be finite.
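To do this, we define a map
$$\varphi : E(\mathbb{Q}) \longrightarrow \mathbb{Q}^\times/(\mathbb{Q}^\times)^2 \times \mathbb{Q}^\times/(\mathbb{Q}^\times)^2 \times \mathbb{Q}^\times/(\mathbb{Q}^\times)^2$$
$$\begin{aligned}
O &\longmapsto (1, 1, 1) \\
(\xi, \eta) &\longmapsto (\xi - a,\ \xi - b,\ \xi - c) \quad\text{if } \xi \neq a, b, c \\
(a, 0) &\longmapsto ((a - b)(a - c),\ a - b,\ a - c) \\
(b, 0) &\longmapsto (b - a,\ (b - a)(b - c),\ b - c) \\
(c, 0) &\longmapsto (c - a,\ c - b,\ (c - a)(c - b))
\end{aligned}$$
(the values given are representatives of the classes mod $(\mathbb{Q}^\times)^2$). Now I make a number of claims.
(1) If $\varphi(P) = (\alpha, \beta, \gamma)$, then $\alpha\beta\gamma$ is a square (i.e., $\alpha\beta\gamma = 1$ in $\mathbb{Q}^\times/(\mathbb{Q}^\times)^2$).
(2) $\varphi$ is a group homomorphism.
(3) $\ker \varphi = 2E(\mathbb{Q})$.
(4) $\varphi(E(\mathbb{Q})) \subset H \times H \times H$, where $H$ is the subgroup of $\mathbb{Q}^\times/(\mathbb{Q}^\times)^2$ that is generated by the classes of $-1$, $2$, and the prime numbers dividing $(b - a)(c - b)(a - c)$.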
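To get a feeling for the map $\varphi$, one can compute it on a few points, representing each class in $\mathbb{Q}^\times/(\mathbb{Q}^\times)^2$ by its squarefree integer representative. The sketch below is an illustration only: the names `squarefree_class` and `phi` are chosen here, trial division is used (fine for small numbers), and the two sample curves $y^2 = x^3 - x$ and $y^2 = x^3 - 25x$ with the point $(-4, 6)$ are assumptions of this example.

```python
from fractions import Fraction

def squarefree_class(q):
    """Squarefree integer representing the class of a nonzero rational q mod squares."""
    q = Fraction(q)
    n = q.numerator * q.denominator      # equals q * denominator^2: same class
    sign, n = (-1 if n < 0 else 1), abs(n)
    result, d = 1, 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        if e % 2:
            result *= d                  # keep primes occurring to an odd power
        d += 1
    return sign * result * n             # what is left of n is 1 or a prime

def phi(P, a, b, c):
    """phi(P) for P on y^2 = (x-a)(x-b)(x-c); P is None for the point O."""
    if P is None:
        return (1, 1, 1)
    x, roots, vals = Fraction(P[0]), (a, b, c), []
    for r in roots:
        if x == r:                       # 2-torsion point: use the modified entry
            s, t = [u for u in roots if u != r]
            vals.append((r - s) * (r - t))
        else:
            vals.append(x - r)
    return tuple(squarefree_class(v) for v in vals)

# The three points of order 2 on y^2 = x^3 - x = (x+1) x (x-1).
for P in [(-1, 0), (0, 0), (1, 0)]:
    print(P, phi(P, -1, 0, 1))

# A point of infinite order on y^2 = x^3 - 25x = (x+5) x (x-5).
print(phi((-4, 6), -5, 0, 5))            # (1, -1, -1)
```

In each case the product of the three components is a square, in line with claim (1), and the values at the points of order 2 multiply componentwise as claim (2) predicts for $(-1, 0) + (0, 0) = (1, 0)$.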
Note that $H$ is finite and hence so is $H \times H \times H$. Statements (2) and (3) then show that $\varphi$ induces an injection of $E(\mathbb{Q})/2E(\mathbb{Q})$ into the finite group $H \times H \times H$, and the statement of the theorem follows (in the special case considered).

The proofs of these claims are not very hard, but too lengthy to do them here. In the general case (with non-rational 2-torsion points), one still uses the same idea, but one has to work with the number fields obtained by adjoining roots of $x^3 + Ax + B$ to $\mathbb{Q}$. This requires some knowledge of Algebraic Number Theory (which studies such fields).

This proof generalizes to the case that $E$ is an elliptic curve over any number field.

22.27. Remark.
(1) Since by the first claim in the proof above, the product of the three components of $\varphi(P)$ is always 1 (in $\mathbb{Q}^\times/(\mathbb{Q}^\times)^2$), the first two components determine the last one uniquely. This implies that we even obtain an injection of $E(\mathbb{Q})/2E(\mathbb{Q})$ into $H \times H$.
(2) If $r$ is the rank of $E(\mathbb{Q})$, then $E(\mathbb{Q}) \cong T \times \mathbb{Z}^r$ with a finite group $T = E(\mathbb{Q})_{\mathrm{tors}}$. Then
$$E(\mathbb{Q})/2E(\mathbb{Q}) \cong T/2T \times (\mathbb{Z}/2\mathbb{Z})^r \cong T[2] \times (\mathbb{Z}/2\mathbb{Z})^r,$$
where $T[2] = E(\mathbb{Q})[2] = \{P \in E(\mathbb{Q}) : 2P = O\}$ is the 2-torsion subgroup of $T$ (and of $E(\mathbb{Q})$). Hence
$$\#(E(\mathbb{Q})/2E(\mathbb{Q})) = \#T[2] \cdot 2^r = 2^{r+2}$$
(the last equality is only valid in the special case considered in the proof). This then implies that
$$r \le 2 + 2\,\#\{p : p \text{ odd},\ p \mid (a - b)(b - c)(c - a)\} = 2 + 2s$$
(note that $\#H = 2^{2+s}$ and $2^{2+r} \le (\#H)^2$).
Exercise. If $T$ is a finite abelian group and $p$ is a prime number, then $\#(T/pT) = \#T[p]$ (which implies that there is a (non-canonical) isomorphism between $T/pT$ and $T[p]$).
(3) By using "local" information (signs or $p$-adic considerations for the "bad" primes), it is possible to restrict the image of $\varphi$ further. For example, if $a < b < c$, then the first component of $\varphi(P)$ is always positive, which means that the image of $\varphi$ is contained in a subgroup of index 2 of $H \times H$. This improves the bound given above to
$$r \le 1 + 2s.$$

22.28. Example. Consider
$$E : y^2 = x^3 - x = (x + 1)x(x - 1).$$
The only prime dividing one of the differences of the roots of the right hand side is 2.
Therefore, $H$ in the proof above is generated by (the classes of) $-1$ and $2$, and $s$ above is zero. From Remark 22.27 above, we see that the image of $\phi$ is contained in

$$\{(\alpha, \beta, \gamma) \in \langle -1, 2 \rangle^3 : \alpha\beta\gamma = 1,\ \alpha > 0\},$$

a group of order 8, and the rank of $E(\mathbb{Q})$ is bounded by 1.

Now we consider to what extent 2 can show up in $\alpha$, $\beta$ and $\gamma$. One can check (this is part of the argument in the proof of claim (4) in the proof of the weak Mordell-Weil Theorem 22.26) that points that have even denominators in their $x$-coordinate will map to elements that do not involve 2 (the 2-adic valuations are even). Otherwise, $x(P)$ is either even (even numerator) or odd (odd numerator). In the first case, $x(P) \pm 1$ are both odd, hence $v_2(\alpha) = v_2(\gamma) = 0$, which implies that $v_2(\beta)$ is even (by claim (1)), hence $\beta$ does not involve 2. In the second case, $v_2(\beta) = v_2(x(P)) = 0$. We see that $\beta$ can only be $\pm 1$, hence

$$\phi(E(\mathbb{Q})) \subset \{(\alpha, \beta, \alpha\beta) : \alpha \in \langle 2 \rangle,\ \beta \in \langle -1 \rangle\}.$$

This group has order 4. But we know

$$\phi((-1, 0)) = (2, -1, -2), \qquad \phi((0, 0)) = (1, -1, -1), \qquad \phi((1, 0)) = (2, 1, 2).$$

This means that we have equality above, and the rank is zero, so $E(\mathbb{Q})$ is finite. (Since $\#\phi(E(\mathbb{Q})) \ge \#E(\mathbb{Q})[2]$, we know that we must have equality even without computing images of points!)

In order to determine $E(\mathbb{Q})$ completely, we note that all possibly existing additional points are torsion points and therefore must be integral, with the square of the $y$-coordinate dividing $4A^3 + 27B^2 = -4$, see the Nagell-Lutz Theorem 22.15. So $y^2 \in \{1, 4\}$ for such a point. But neither $x^3 - x - 1$ nor $x^3 - x - 4$ has an integral root. Hence we can conclude that

$$E(\mathbb{Q}) = \{O, (-1, 0), (0, 0), (1, 0)\}.$$

To conclude the proof of the Mordell-Weil Theorem, we need to establish a way of measuring the "size" of a point in $E(\mathbb{Q})$. Then we can apply the following result.

22.29. Lemma. Suppose that $G$ is an abelian group and that $h : G \to \mathbb{R}_{+}$ is a map satisfying the following assumptions.

(1) $G/2G$ is finite.
(2) For each $Q \in G$, there is some $C_Q \ge 0$ such that $h(P + Q) \le 2h(P) + C_Q$ for all $P \in G$.
(3) There is a constant $C \ge 0$ such that $h(2P) \ge 4h(P) - C$ for all $P \in G$.
(4) For every $B > 0$, the set $\{P \in G : h(P) \le B\}$ is finite.

Then $G$ is finitely generated.

Proof. By the first assumption, we can pick a finite set $S \subset G$ of coset representatives of $2G$. Let $D = \max\{C_{-Q} : Q \in S\}$. We show that $G$ is in fact generated by $S$, together with all elements $P$ such that $h(P) \le (C + D)/2$. Call this set $T$. By the last assumption, this generating set $T$ is finite, which proves the lemma.

Assume the claim is false. Then there is some $P \in G$ such that $P$ is not in the subgroup generated by $T$.
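By the last assumption, we can assume that $h(P)$ is minimal among all such elements. Trivially, we must have $h(P) > (C + D)/2$. Now, there is some $Q \in S$ such that $P - Q = 2P' \in 2G$. Then we have

$$h(P') \le \frac{h(2P') + C}{4} \le \frac{2h(P) + C_{-Q} + C}{4} \le \frac{2h(P) + C + D}{4} < h(P).$$

Here the first step is assumption (3), the second is assumption (2) applied to $2P' = P + (-Q)$, and the last step uses $h(P) > (C + D)/2$. By our choice of $P$, $P'$ is in the subgroup generated by $T$, but then $P = Q + 2P'$ must be in there as well, a contradiction.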
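The descent in this proof can be made concrete in a toy case. The following minimal sketch assumes the group $G = \mathbb{Z}$ with $h(n) = n^2$; this example is an illustration chosen here, not taken from the notes. In that case $S = \{0, 1\}$, $h(P + Q) \le 2h(P) + 2Q^2$ and $h(2P) = 4h(P)$, so $C = 0$, $D = 2$, and the finite generating set is $T = \{-1, 0, 1\}$. The hypothetical helper `descend` repeats the step $P \mapsto P' = (P - Q)/2$ until it lands in $T$.

```python
def height(p):
    """Toy height on G = Z (an assumption of this sketch): h(p) = p^2."""
    return p * p


def descend(p, reps=(0, 1), bound=1):
    """Return the chain P, P', P'', ... produced by the descent step.

    reps  : coset representatives S of G/2G (here: 0 and 1)
    bound : (C + D) / 2, which equals 1 for this toy height
    """
    chain = [p]
    while height(p) > bound:
        # pick Q in S with P - Q in 2G, then replace P by P' = (P - Q) / 2
        q = next(q for q in reps if (p - q) % 2 == 0)
        p = (p - q) // 2
        chain.append(p)
    return chain


if __name__ == "__main__":
    print(descend(37))
    print(descend(-37))
```

Each step strictly lowers the height, so the chain stops after finitely many steps, and reading it backwards writes $P$ in terms of elements of $T$ via $P = Q + 2P'$.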
22.30. The height. We have proved so far that $G = E(\mathbb{Q})$ satisfies the first assumption (at least when all the 2-torsion points are rational). So it remains to define a suitable map $h$ in order to finish the proof of the Mordell-Weil Theorem.

Let $P = (r/t^2, s/t^3) \in E(\mathbb{Q})$, where $\gcd(r, t) = \gcd(s, t) = 1$. Then we define

$$h(P) = \log \max\{|r|, t^2\}$$

and $h(O) = 0$. (This is called the logarithmic naive height on $E(\mathbb{Q})$.)

We have to show that this map $h$ satisfies the assumptions in Lemma 22.29. The last one is the easiest one to see: when $h(P)$ is bounded, then numerator and denominator of $x(P)$ are bounded (by $e^{h(P)}$), so there are only finitely many possible $x$-coordinates. But for each $x$-coordinate, there are at most two points in $E(\mathbb{Q})$, hence any set of points of bounded height is finite.

22.31. Lemma. There is a constant $C$ such that for all $P, Q \in E(\mathbb{Q})$,

$$\bigl| h(P + Q) + h(P - Q) - 2h(P) - 2h(Q) \bigr| \le C.$$

We skip the proof. Since $h(P - Q) \ge 0$, this provides us with the middle two assumptions in Lemma 22.29: the upper bound gives $h(P + Q) \le 2h(P) + (2h(Q) + C)$, and taking $Q = P$ in the lower bound gives $h(2P) \ge 4h(P) - C$, because $h(O) = 0$. Therefore, all the assumptions of Lemma 22.29 are satisfied. Hence $E(\mathbb{Q})$ is finitely generated.

22.32. Discussion. How can we try to determine $E(\mathbb{Q})$? The first step is to bound the order of $E(\mathbb{Q})/2E(\mathbb{Q})$ as tightly as possible, by embedding it into a finite group like $H \times H$ in the proof of Thm. 22.26 and making use of all the restrictions we can derive from considerations at "bad" primes, compare Example 22.28. Then we hope to find sufficiently many points in $E(\mathbb{Q})$ to show that our bound is actually tight. If we are successful, then we know the rank of $E(\mathbb{Q})$. What is more, we also know generators of a subgroup of finite odd index (by taking a set of points whose images under $\phi$ generate the image of $\phi$). We can then use refinements of the height estimates used in the lemma above to find actual generators of the free part of $E(\mathbb{Q})$. The torsion subgroup can be dealt with using the Nagell-Lutz result 22.15.

If we are not successful in establishing that the bound on $E(\mathbb{Q})/2E(\mathbb{Q})$ is tight, then this can be for two reasons.
It is possible that the bound is tight, but that we just have not found some of the preimages (usually this happens because these points are simply too large to be found by search). But it is also possible that the bound is not tight.

To make progress in this situation, one can write down equations whose solutions parametrize the points in the preimage under $\phi$ of a given element of the group bounding the image of $\phi$. Then one can search for solutions of these equations; the advantage here is that these solutions are (usually) smaller than the points in $E(\mathbb{Q})$ they give rise to, hence they are found more easily. This often helps resolve the first case (when the points are too large). It does not help in the second case, when there are no solutions to be found for some of the equations. In this case, one can try to iterate the procedure that produced these equations (also known as "first descents") to produce so-called "second descents". Sometimes, this enables one to deduce that a first descent does not have a solution (because it does not produce any second descents). In other cases, one can use the second descents (the solutions of which are again smaller) to produce solutions to a first descent.
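The naive point search mentioned in this discussion can be sketched as follows. This is only a brute-force illustration, assuming a curve $y^2 = x^3 + Ax + B$ with integer $A, B$; the function names (`search_points`, `naive_height`, `rational_sqrt`) are hypothetical helpers introduced here. It runs through all $x = r/t^2$ with $|r| \le$ bound and $t^2 \le$ bound, i.e. all candidates with $h \le \log(\text{bound})$, and keeps those for which the right hand side is a rational square.

```python
from fractions import Fraction
from math import gcd, isqrt, log


def rational_sqrt(q):
    """Return the nonnegative square root of q in Q, or None if q is not a square."""
    if q < 0:
        return None
    num, den = q.numerator, q.denominator  # Fraction keeps these in lowest terms
    if isqrt(num) ** 2 == num and isqrt(den) ** 2 == den:
        return Fraction(isqrt(num), isqrt(den))
    return None


def naive_height(x):
    """Logarithmic naive height log max(|r|, t^2) of x = r / t^2 in lowest terms."""
    return log(max(abs(x.numerator), x.denominator))


def search_points(A, B, bound):
    """Affine rational points on y^2 = x^3 + A x + B with |r|, t^2 <= bound."""
    points = set()
    t = 1
    while t * t <= bound:
        for r in range(-bound, bound + 1):
            if gcd(r, t) != 1:
                continue  # skip representations that are not in lowest terms
            x = Fraction(r, t * t)
            y = rational_sqrt(x ** 3 + A * x + B)
            if y is not None:
                points.add((x, y))
                points.add((x, -y))
        t += 1
    return points


if __name__ == "__main__":
    for x, y in sorted(search_points(-1, 0, 50)):
        print(x, y, naive_height(x))
```

For the curve $y^2 = x^3 - x$ of Example 22.28 the search returns only the three points of order 2, in line with the result found there. Such a search gets expensive quickly as the bound grows, which is one reason the descent equations described above are useful.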
An alternative approach is to play a similar game with $E(\mathbb{Q})/3E(\mathbb{Q})$ (or $E(\mathbb{Q})/5E(\mathbb{Q})$ or ...). Working out methods for doing this or second (and third, ...) descents is the subject of current active research.

23. Primes in arithmetic progressions

Euclid showed that there are infinitely many primes. His argument can be refined to give more information, for instance:

23.1. Theorem. There are infinitely many primes congruent to 3 mod 4.

Proof. Suppose not, and so suppose that $p_1, \dots, p_k$ is the complete list of primes congruent to 3 mod 4. Form their product; if that is 1 mod 4, add 2. Otherwise add 4. The resulting number $n$ is congruent to 3 mod 4, and is not divisible by any $p_i$. Hence all prime factors of $n$ must be 1 mod 4, which implies $n \equiv 1 \bmod 4$. This contradiction completes the proof.

To show that there are infinitely many primes $\equiv 1 \bmod 4$, one can use that an odd prime $p$ is 1 mod 4 if and only if $-1$ is a quadratic residue mod $p$. So, taking $P = 4(p_1 p_2 \cdots p_k)^2 + 1$, where $p_1, \dots, p_k$ are now primes $\equiv 1 \bmod 4$, the number $P \ge 5$ is not divisible by any of the $p_j$, and all its prime divisors are 1 mod 4. By the same kind of argument as above, this implies that there are infinitely many primes $\equiv 1 \bmod 4$.

Exercise: Prove in a similar spirit that there are infinitely many primes $\equiv 1 \bmod 2^n$, for every $n \ge 1$.

In fact, every residue class contains infinitely many primes, except when there is an obvious reason why not.

23.2. Theorem (Dirichlet). Suppose $N$ and $a$ are coprime integers. Then there are infinitely many primes congruent to $a$ modulo $N$.

Dirichlet proved this in the 1830s or so, and the rest of this section discusses his proof. Dirichlet's approach grows out of an "exotic" proof that there are infinitely many primes given by Euler a century earlier. Consider the function

$$\zeta(s) := \sum_{n=1}^{\infty} \frac{1}{n^s}$$

defined on the set of complex numbers $s$ with $\operatorname{Re} s > 1$.
This is the Riemann zeta function, as it has been known since Riemann's memoir from 1860, in which Riemann outlined a strategy for proving the prime number theorem, which predicts how frequently prime numbers occur among positive integers, using complex analytic properties of $\zeta(s)$. That is a story for another section. Note that for $\operatorname{Re} s > 1$, the series defining $\zeta(s)$ converges absolutely.

Now, observe the formal identity

$$\prod_{p \text{ prime}} \Bigl(1 - \frac{1}{p^s}\Bigr)^{-1} = \prod_{p \text{ prime}} \Bigl(1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \dots\Bigr) = \sum_{n=1}^{\infty} \frac{1}{n^s},$$

which expresses the fact that each positive integer $n$ can be written as a product of primes in exactly one way. For each $s$ with $\operatorname{Re} s > 1$, all the sums and products converge absolutely, so one may change the order of the terms, and therefore both sides converge to the same value. (In fact, both sides define the same analytic function on $\operatorname{Re} s > 1$.)

Recall that $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges. Equivalently, $\lim_{s \to 1^+} \zeta(s) = +\infty$, since for real values of $s$, all the terms in $\sum_{n=1}^{\infty} \frac{1}{n^s}$ are positive. Hence,

$$\lim_{s \to 1^+} \prod_{p \text{ prime}} \Bigl(1 - \frac{1}{p^s}\Bigr)^{-1} = +\infty,$$

which implies there must be infinitely many primes. This was essentially Euler's proof.

Of course, one can also get some quantitative information out of it. One sees that $\prod_{p} \bigl(1 - \frac{1}{p}\bigr)^{-1}$ must diverge, which implies by a calculation that $\sum_{p} \frac{1}{p}$ must diverge (see below). This would not be the case if, for instance, the primes were distributed as thinly as the squares. In fact, since $\sum_{n=1}^{\infty} \frac{1}{n^{1+\varepsilon}}$ converges for any fixed real number $\varepsilon > 0$, we must have

$$\#\{p \text{ prime} : p < N\} > N^{1-\varepsilon}$$

for infinitely many different integers $N$. We will learn more about the distribution of primes in the next section.

The Strategy. Let us now discuss the strategy for the proof of the general statement in Thm. 23.2. The proof mentioned above that there are infinitely many primes can be phrased like this. Consider

$$\log \zeta(s) = \log \prod_{p} \Bigl(1 - \frac{1}{p^s}\Bigr)^{-1} = \sum_{p} \log \Bigl(1 - \frac{1}{p^s}\Bigr)^{-1} = \sum_{p} \Bigl(\frac{1}{p^s} + \frac{1}{2p^{2s}} + \dots\Bigr) = \sum_{p} \frac{1}{p^s} + f(s),$$

where $f(s)$ remains bounded as $s \to 1^+$: for real $s > 1$,

$$f(s) = \sum_{p} \sum_{k=2}^{\infty} \frac{1}{k\,p^{ks}} \le \frac{1}{2} \sum_{p} \frac{1}{p^2} \cdot \frac{1}{1 - 1/p} \le \sum_{p} \frac{1}{p^2} \le \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$

Now, as $s \to 1^+$, $\zeta(s) \to \infty$, and therefore $\sum_{p} p^{-s} \to \infty$ as well, showing that there are infinitely many primes (and even that $\sum_{p} 1/p$ diverges).

Now the idea is to show in a similar way that

$$\sum_{p \equiv a \bmod N} p^{-s} \to \infty \quad \text{as } s \to 1^+.$$

However, there is no simple way to set this up directly, using some modification of $\zeta(s)$, so we have to use a slight detour. We basically only can hope to prove something about functions like $\zeta(s)$ if they involve all the primes. We also want these functions to have a similar product structure (so that we can take logarithms easily). This leads to the following notion.

23.3. Definition. A Dirichlet character mod $N$ is a function $\chi : \mathbb{Z} \to \mathbb{C}$ with the following properties.

(1) $\chi(n)$ only depends on the residue class of $n$ mod $N$.
(2) $\chi(n) \neq 0$ if and only if $\gcd(n, N) = 1$.
(3) $\chi(mn) = \chi(m)\chi(n)$ for all $m, n \in \mathbb{Z}$.
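The three defining properties can be checked mechanically for a candidate function. The sketch below assumes the candidate is given as a table `values` of length $N$ holding the proposed value on each residue class, so that property (1) holds by construction; the name `is_dirichlet_character` and this table representation are choices made here for illustration, not notation from the notes.

```python
from math import gcd


def is_dirichlet_character(values, N):
    """Check properties (2) and (3) for the function n -> values[n % N].

    values[r] is the proposed value on the residue class r mod N, so the
    function depends only on n mod N (property (1)) by construction.
    Multiplicativity only needs to be tested on residues, because
    (m * n) % N is determined by m % N and n % N.
    """
    if len(values) != N:
        return False
    # (2) nonzero exactly on the residues coprime to N
    support_ok = all((values[r] != 0) == (gcd(r, N) == 1) for r in range(N))
    # (3) chi(mn) = chi(m) chi(n)
    mult_ok = all(
        values[(a * b) % N] == values[a] * values[b]
        for a in range(N)
        for b in range(N)
    )
    return support_ok and mult_ok


if __name__ == "__main__":
    print(is_dirichlet_character([0, 1, 0, 1], 4))    # value 1 on odd numbers
    print(is_dirichlet_character([0, 1, 0, -1], 4))   # sign depends on n mod 4
    print(is_dirichlet_character([0, 1, 0, 2], 4))    # not multiplicative
```

The comparison uses exact equality, which is fine for small integer or Gaussian-integer values; for characters with general roots of unity as values one would compare up to a tolerance instead.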
The L-series associated to $\chi$ is defined for $s > 1$ by

$$L(\chi, s) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s} = \prod_{p} \Bigl(1 - \frac{\chi(p)}{p^s}\Bigr)^{-1}.$$

(The second equality is seen in the same way as for $\zeta(s)$, using the multiplicativity of $\chi$. A product representation of this type is called an Euler product.)

23.4. Example. For $N = 4$, a Dirichlet character $\chi$ mod 4 is determined by its values at 1 and 3 (the values at even numbers must be zero). The multiplicativity forces $\chi(1) = 1$. Since $3^2 \equiv 1 \bmod 4$, we must have $\chi(3)^2 = \chi(1) = 1$, so $\chi(3) = \pm 1$. One checks easily that both choices give a Dirichlet character mod 4. Call them $\chi_0$ and $\chi_1$, respectively. Then

$$L(\chi_0, s) = 1 + \frac{1}{3^s} + \frac{1}{5^s} + \dots = \prod_{p \neq 2} \Bigl(1 - \frac{1}{p^s}\Bigr)^{-1} = \Bigl(1 - \frac{1}{2^s}\Bigr)\zeta(s)$$

and

$$L(\chi_1, s) = 1 - \frac{1}{3^s} + \frac{1}{5^s} - \frac{1}{7^s} + \dots.$$

Note that (by the alternating series criterion) $L(\chi_1, s)$ converges for real $s > 0$ (uniformly for $s \ge \delta > 0$), and $L(\chi_1, 1) = \arctan 1 = \pi/4 \neq 0$.

It is a fact (which we will prove soon) that for $a$ coprime to $N$,

$$\frac{1}{\varphi(N)} \sum_{\chi} \bar{\chi}(a)\chi(n) = \begin{cases} 1 & \text{if } a \equiv n \bmod N, \\ 0 & \text{else} \end{cases}$$

(where $\chi$ runs through all the Dirichlet characters mod $N$). This means that we can take suitable linear combinations of $\log L(\chi, s)$ in order to single out the residue class we are interested in. We have

$$\log L(\chi, s) = \sum_{p} \frac{\chi(p)}{p^s} + f_{\chi}(s)$$

as before, where again $f_{\chi}$ remains bounded as $s \to 1^+$. So

$$\frac{1}{\varphi(N)} \sum_{\chi} \bar{\chi}(a) \log L(\chi, s) = \sum_{p \equiv a \bmod N} \frac{1}{p^s} + f_a(s)$$

with $f_a$ bounded as $s \to 1^+$. In order to finish the proof, we have to show that the left hand side tends to infinity as $s \to 1^+$.

23.5. Example. For $N = 4$ again, we find

$$\tfrac{1}{2}\bigl(\log L(\chi_0, s) + \log L(\chi_1, s)\bigr) = \sum_{p \equiv 1 \bmod 4} \frac{1}{p^s} + \text{bounded},$$

$$\tfrac{1}{2}\bigl(\log L(\chi_0, s) - \log L(\chi_1, s)\bigr) = \sum_{p \equiv 3 \bmod 4} \frac{1}{p^s} + \text{bounded}.$$

We have seen that $L(\chi_1, s)$ is a continuous function for real $s > 0$ that does not vanish for $s \ge 1$. This implies that $\log L(\chi_1, s)$ stays bounded as $s \to 1^+$. Since $L(\chi_0, s)$ is essentially $\zeta(s)$, $\log L(\chi_0, s)$ tends to $+\infty$ as $s \to 1^+$. This proves again that there are infinitely many primes $\equiv 1 \bmod 4$ and infinitely many primes $\equiv 3 \bmod 4$.
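The statements about $L(\chi_1, s)$ in these examples are easy to check numerically. The following is a minimal sketch in floating point; the helper names `chi1`, `L_partial` and `euler_product` are introduced here for illustration. It compares a partial sum of the series at $s = 1$ with $\pi/4$, and at $s = 2$ it compares the series with a truncated Euler product.

```python
from math import isqrt, pi


def chi1(n):
    """The nontrivial Dirichlet character mod 4: 0, 1, 0, -1 on n mod 4."""
    return (0, 1, 0, -1)[n % 4]


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def L_partial(chi, s, terms):
    """Partial sum of the Dirichlet series sum_{n <= terms} chi(n) / n^s."""
    return sum(chi(n) / n ** s for n in range(1, terms + 1))


def euler_product(chi, s, prime_bound):
    """Truncated Euler product over the primes p <= prime_bound."""
    result = 1.0
    for p in primes_up_to(prime_bound):
        result /= 1.0 - chi(p) / p ** s
    return result


if __name__ == "__main__":
    print(L_partial(chi1, 1, 200_000), pi / 4)
    print(L_partial(chi1, 2, 100_000), euler_product(chi1, 2, 10_000))
```

At $s = 1$ the series is alternating, so the error of a partial sum is at most the first omitted term. The Euler product is only used at $s = 2$ here, where its absolute convergence is covered by the definition above.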
Like $\chi_0$ in the example above, there is always one special Dirichlet character mod $N$, the so-called trivial Dirichlet character $\chi_0$, which takes the value 1 on all numbers coprime with $N$ (and the value 0 otherwise). It is easy to see that its L-series is, up to a simple factor, $\zeta(s)$, and so $\log L(\chi_0, s) \to \infty$ as $s \to 1^+$. It remains to prove that the other summands cannot lead to any cancellation. This is done by showing that

(1) the series defining $L(\chi, s)$ converges for $s > 0$, hence the function is defined (and continuous) there, and
(2) $L(\chi, 1) \neq 0$.

These statements imply that $\log L(\chi, s) \to \log L(\chi, 1)$ stays bounded as $s \to 1^+$ for $\chi \neq \chi_0$, concluding the proof.

Now we have to fill in the various details.

Characters. Let us first deal with the Dirichlet characters. There is a general notion of characters, as follows.

23.6. Definition. Let $G$ be a group. A character of $G$ is a group homomorphism $\chi : G \to \mathbb{C}^{\times}$. Let $\hat{G}$ be the set of all characters of $G$; then $\hat{G}$ is an abelian group (the character group of $G$) under point-wise multiplication: $(\chi\psi)(g) = \chi(g)\psi(g)$.

23.7. Remark. There is a bijection between Dirichlet characters mod $N$ and the character group of $(\mathbb{Z}/N\mathbb{Z})^{\times}$, as follows. If $\chi$ is a Dirichlet character mod $N$, then $\psi(\bar{a}) := \chi(a)$ defines a character on $(\mathbb{Z}/N\mathbb{Z})^{\times}$ (well-defined since $\chi(a)$ only depends on $\bar{a}$, and for $\bar{a} \in (\mathbb{Z}/N\mathbb{Z})^{\times}$, $\chi(a) \neq 0$). If $\psi$ is a character of $(\mathbb{Z}/N\mathbb{Z})^{\times}$, then $\chi(a) = \psi(\bar{a})$ for $\gcd(a, N) = 1$, $\chi(a) = 0$ otherwise, defines a Dirichlet character mod $N$.

Now for finite abelian groups (like $(\mathbb{Z}/N\mathbb{Z})^{\times}$), the character group behaves particularly nicely.

23.8. Proposition. If $G$ is a finite abelian group, then its character group $\hat{G}$ is isomorphic to $G$.

Proof. Assume first that $G$ is cyclic of order $n$, and pick a generator $g$. Then for any character $\chi$, we must have $\chi(g^k) = \chi(g)^k$; therefore, $\chi$ is completely determined by the value $\chi(g)$. Now there is only one relation for $g \in G$, namely $g^n = 1$. Therefore, $\chi(g)^n = 1$ as well, and every choice of $\chi(g)$ satisfying this will define a character.
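This shows that

$$\mu_n \longrightarrow \hat{G}, \qquad \zeta \longmapsto (g^k \mapsto \zeta^k)$$

sets up an isomorphism of $\hat{G}$ with the group $\mu_n$ of $n$th roots of unity, which is a cyclic group of order $n$, hence isomorphic to $G$. (Note that the isomorphism $G \cong \hat{G}$ we construct depends on choices of generators of $G$ and of $\mu_n$: it is not canonical.)

Now we prove that $\widehat{G \times H} \cong \hat{G} \times \hat{H}$. The isomorphism is given as follows. Let $\chi \in \hat{G}$ and $\psi \in \hat{H}$; then $\varphi(g, h) = \chi(g)\psi(h)$ defines a character of $G \times H$. Conversely, if $\varphi$ is a character of $G \times H$, then $\varphi|_G$ and $\varphi|_H$ are characters of $G$ and $H$, respectively, and one easily checks that the two maps are inverses of each other. (Note that this isomorphism is canonical; it does not depend on choices.)

Finally, we use that every finite abelian group is a direct product of cyclic groups:

$$G = G_1 \times \dots \times G_k \cong \hat{G}_1 \times \dots \times \hat{G}_k \cong (G_1 \times \dots \times G_k)\hat{\ } = \hat{G}.$$

For example, this implies that there are exactly $\#(\mathbb{Z}/N\mathbb{Z})^{\times} = \varphi(N)$ distinct Dirichlet characters mod $N$.

Next we want to state and prove the orthogonality relations we need for the proof of Dirichlet's Theorem 23.2.

23.9. Lemma. Let $G$ be a finite abelian group and $1 \neq g \in G$. Then there is a character $\chi \in \hat{G}$ such that $\chi(g) \neq 1$.

Proof. First assume $G$ is cyclic. Then we can take any character $\chi$ that generates $\hat{G}$. (Compare the preceding proof.) In the general case, write $G = G_1 \times \dots \times G_k$ as a product of cyclic groups. Then $g = (g_1, \dots, g_k)$ with at least one $g_j \neq 1$. Take a character $\chi_j$ of $G_j$ such that $\chi_j(g_j) \neq 1$, and define $\chi(h_1, \dots, h_k) = \chi_j(h_j)$.

23.10. Proposition. Let $G$ be a finite abelian group. Then

(1) For all $\chi \in \hat{G}$,

$$\sum_{g \in G} \chi(g) = \begin{cases} \#G & \text{if } \chi = 1_{\hat{G}}, \\ 0 & \text{else,} \end{cases}$$

and therefore for all $\chi_1, \chi_2 \in \hat{G}$,

$$\frac{1}{\#G} \sum_{g \in G} \bar{\chi}_1(g)\chi_2(g) = \begin{cases} 1 & \text{if } \chi_1 = \chi_2, \\ 0 & \text{else.} \end{cases}$$

(2) For all $g \in G$,

$$\sum_{\chi \in \hat{G}} \chi(g) = \begin{cases} \#G & \text{if } g = 1_G, \\ 0 & \text{else,} \end{cases}$$

and therefore for all $g_1, g_2 \in G$,

$$\frac{1}{\#G} \sum_{\chi \in \hat{G}} \bar{\chi}(g_1)\chi(g_2) = \begin{cases} 1 & \text{if } g_1 = g_2, \\ 0 & \text{else.} \end{cases}$$

Proof. If $\chi = 1_{\hat{G}}$, then $\chi(g) = 1$ for all $g \in G$, and the first claim is trivial. Otherwise, there is an $h \in G$ such that $\chi(h) \neq 1$. Then

$$(1 - \chi(h)) \sum_{g} \chi(g) = \sum_{g} \chi(g) - \sum_{g} \chi(hg) = \sum_{g} \chi(g) - \sum_{g} \chi(g) = 0,$$

and so $\sum_{g} \chi(g) = 0$ as well (since $1 - \chi(h) \neq 0$). To get the assertion on $\chi_1$ and $\chi_2$, apply the result to $\chi = \bar{\chi}_1\chi_2 = \chi_1^{-1}\chi_2$.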
If $g = 1_G$, then $\chi(g) = 1$ for all $\chi \in \hat{G}$, and the second claim is trivial (using $\#\hat{G} = \#G$). Otherwise, by Lemma 23.9, there is a $\psi \in \hat{G}$ such that $\psi(g) \neq 1$. Then

$$(1 - \psi(g)) \sum_{\chi} \chi(g) = \sum_{\chi} \chi(g) - \sum_{\chi} (\psi\chi)(g) = \sum_{\chi} \chi(g) - \sum_{\chi} \chi(g) = 0,$$

and so $\sum_{\chi} \chi(g) = 0$. To get the assertion on $g_1$ and $g_2$, apply this to $g = g_1^{-1}g_2$ and note that $\chi(g_1^{-1}) = \chi(g_1)^{-1} = \bar{\chi}(g_1)$.

23.11. Corollary. Applying the preceding orthogonality relation to Dirichlet characters, we get for $\gcd(a, N) = 1$ that

$$\frac{1}{\varphi(N)} \sum_{\chi} \bar{\chi}(a)\chi(n) = \begin{cases} 1 & \text{if } a \equiv n \bmod N, \\ 0 & \text{else,} \end{cases}$$

where the sum is over the Dirichlet characters mod $N$.

This is one of the ingredients needed in the proof of Dirichlet's Theorem. The other essential part of the proof is to deal with the behavior of the functions $\log L(\chi, s)$ as $s \to 1^+$. Let us first consider the trivial character $\chi_0$ (it is the Dirichlet character corresponding to the trivial character $1$ of $(\mathbb{Z}/N\mathbb{Z})^{\times}$).

23.12. Lemma. Let $\chi_0$ be the trivial Dirichlet character mod $N$. Then for $s > 1$,

$$L(\chi_0, s) = \prod_{p \mid N} \Bigl(1 - \frac{1}{p^s}\Bigr) \zeta(s)$$

and therefore

$$\log L(\chi_0, s) - \log \zeta(s) = \sum_{p \mid N} \log(1 - p^{-s})$$

is bounded as $s \to 1^+$.

Proof. Compare the formulas

$$\zeta(s) = \prod_{p} (1 - p^{-s})^{-1} \quad \text{and} \quad L(\chi_0, s) = \prod_{p} (1 - \chi_0(p)p^{-s})^{-1} = \prod_{p \nmid N} (1 - p^{-s})^{-1}.$$

Before we can proceed, we need a basic result on the convergence properties of Dirichlet series $\sum_{n=1}^{\infty} a_n n^{-s}$.

23.13. Proposition.

(1) Let $(a_n)$, $(b_n)$ be two sequences of complex numbers, and let $A_n = \sum_{k=1}^{n} a_k$. Then

$$\sum_{n=1}^{N} a_n b_n = \sum_{n=1}^{N-1} A_n (b_n - b_{n+1}) + A_N b_N.$$
(2) Keeping the notations, assume that $(A_n)$ is bounded and that $(b_n)$ is a decreasing sequence of real numbers such that $b_n \to 0$. Then $\sum_{n=1}^{\infty} a_n b_n$ converges, and

$$\sum_{n=1}^{\infty} a_n b_n = \sum_{n=1}^{\infty} A_n (b_n - b_{n+1}).$$

(3) Let $f(s) = \sum_{n=1}^{\infty} c_n n^{-s}$. If the series converges for $s = s_0$, then it converges uniformly for all $s \ge s_0 + \delta$, for any $\delta > 0$, and defines an analytic function on $s > s_0$.

Proof.

(1) This is an easy exercise.

(2) Let $|A_n| \le A$ for all $n$. Then for $M < N$, we have by (1)

$$\sum_{n=M+1}^{N} a_n b_n = \sum_{n=M}^{N-1} A_n (b_n - b_{n+1}) + A_N b_N - A_M b_M.$$

Hence

$$\Bigl| \sum_{n=M+1}^{N} a_n b_n \Bigr| \le A \Bigl( \sum_{n=M}^{N-1} (b_n - b_{n+1}) + b_M + b_N \Bigr) = A (b_M - b_N + b_M + b_N) = 2A b_M \to 0$$

as $M \to \infty$, hence the sequence of partial sums is Cauchy. Since $A_N b_N \to 0$, we get the stated formula by taking limits in (1).

(3) Set $a_n = c_n n^{-s_0}$ and $b_n = n^{s_0 - s}$. Then the assumptions in (2) are satisfied for $s \ge s_0 + \delta$ (since $A_n$ converges to $f(s_0)$). From the proof of (2), we get the uniform bound

$$\Bigl| \sum_{n=M+1}^{N} c_n n^{-s} \Bigr| \le 2A M^{-\delta},$$

which shows uniform convergence. Now a uniformly convergent series of analytic functions converges to an analytic function (compare Introductory Complex Analysis), and since we can take $\delta > 0$ as small as we like, we get a function that is analytic on $s > s_0$.

The following result tells us more precisely what the behavior of $\zeta(s)$ is near $s = 1$.

23.14. Lemma. There is an analytic function $f(s)$ for $s > 0$ such that for $s > 1$,

$$\zeta(s) = \frac{1}{s - 1} + f(s).$$

We define $\zeta(s)$ for $0 < s < 1$ by this formula.

Proof. First note that

$$F(s) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^s} = 1 - \frac{1}{2^s} + \frac{1}{3^s} - \frac{1}{4^s} + \dots$$

defines an analytic function for $s > 0$: by part (2) in Prop. 23.13 (with $a_n = (-1)^{n-1}$ and $b_n = n^{-s}$), the series converges for $s > 0$, and by part (3) it gives an analytic function there. Now for $s > 1$,

$$\Bigl(1 - \frac{2}{2^s}\Bigr) \zeta(s) = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \dots - \frac{2}{2^s} - \frac{2}{4^s} - \dots = F(s).$$
Therefore,

$$\zeta(s) = \frac{F(s)}{1 - 2^{1-s}} = \frac{1}{s - 1} \cdot \frac{s - 1}{1 - 2^{1-s}}\,F(s) = \frac{1}{s - 1} \Bigl( \frac{F(1)}{\log 2} + (s - 1) f(s) \Bigr) = \frac{1}{s - 1} + f(s)$$

with an analytic function $f(s)$; note that $(s - 1)/(1 - 2^{1-s}) \to 1/\log 2$ as $s \to 1$ and that $F(1) = 1 - 1/2 + 1/3 - \dots = \log 2$.

23.15. Corollary. We have

$$\log \zeta(s) = \log \frac{1}{s - 1} + \text{bounded} \quad \text{as } s \to 1^+.$$

Now we look at the L-series $L(\chi, s)$.

23.16. Proposition. Let $\chi$ be a nontrivial Dirichlet character mod $N$. Then the series

$$L(\chi, s) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}$$

converges for $s > 0$ and therefore defines an analytic function on this domain.

Proof. We apply Prop. 23.13 again. Set $a_n = \chi(n)$ and $b_n = n^{-s}$. Then $|A_n| \le \varphi(N)$, since $\sum_{n=kN+1}^{(k+1)N} \chi(n) = 0$ for all $k$. Hence the assumptions in part (2) are satisfied, and the argument proceeds as for $F(s)$ in the proof of the lemma above.

It remains to show that $L(\chi, 1) \neq 0$, for then $\log L(\chi, s)$ will stay bounded as $s \to 1^+$. This is the hardest part in the proof of Dirichlet's Theorem 23.2. We will state an auxiliary result first.

23.17. Theorem (Landau). Suppose that $a_n \ge 0$ for all $n$ and that the series $f(s) = \sum_{n=1}^{\infty} a_n n^{-s}$ converges for some, but not for all $s \in \mathbb{R}$. Let

$$s_0 = \inf\{s \in \mathbb{R} : \text{the series converges}\}.$$

Then $f(s)$ cannot be continued to the left of $s_0$ as an analytic function.

We will not prove this result here; it makes use of the fact that $f(s)$ is even analytic for $\operatorname{Re} s > s_0$. (Here is a sketch: under the assumption that $f(s)$ is analytic on a neighborhood of $s_0$, the power series expansion of $f(s)$ around $s_0 + 1$ has radius of convergence $> 1$; convergence of this at $s_0 - \delta$ then implies convergence of the series defining $f(s)$ there: a contradiction.)

As an example, consider $\zeta(s)$: here $s_0 = 1$, and $\zeta(s)$ has a pole at $s = 1$ (it can be extended to a meromorphic function on $\mathbb{C}$ with just this one pole, though). On the other hand, for

$$\log \zeta(s) = \sum_{p} \sum_{k=1}^{\infty} \frac{1}{k\,p^{ks}},$$

we have again $s_0 = 1$, but this time, there is no meromorphic continuation across $s_0$. As a last example, consider $L(\chi, s)$ for a non-trivial Dirichlet character $\chi$. Then $s_0 = 0$, but one can show that $L(\chi, s)$ extends to an entire function.
So the conclusion of the theorem is false in this case, showing that the assumption ($a_n \ge 0$) is essential.

To finish the proof of Dirichlet's Theorem 23.2, we need one more lemma.
23.18. Lemma. Let $F(s) = \prod_{\chi} L(\chi, s)$, where the product is over all Dirichlet characters mod $N$. Then for $s > 1$,

$$F(s) = \sum_{n=1}^{\infty} \frac{a_n}{n^s},$$

where $a_n \ge 0$ for all $n$ and $a_n \ge 1$ for $n = m^{\varphi(N)}$ with some $m \ge 1$, $\gcd(m, N) = 1$.

Proof. We first need a formula about characters: Let $G$ be a finite abelian group, and let $g \in G$ be an element of order $f$. Then

$$\prod_{\chi \in \hat{G}} (1 - \chi(g)X) = (1 - X^f)^{\#G/f}$$

as polynomials in $X$. To see this, we observe that $\hat{G} \ni \chi \mapsto \chi(g) \in \mu_f$ is a surjective group homomorphism. If we let $\zeta = \exp(2\pi i/f)$ be a generator of $\mu_f$, then it follows that

$$\prod_{\chi \in \hat{G}} (1 - \chi(g)X) = \prod_{k=0}^{f-1} (1 - \zeta^k X)^{\#G/f} = (1 - X^f)^{\#G/f}.$$

Now we consider the Euler product of $F(s)$ (everything converges absolutely for $s > 1$, justifying the rearrangements):

$$F(s) = \prod_{\chi} L(\chi, s) = \prod_{\chi} \prod_{p} (1 - \chi(p)p^{-s})^{-1} = \prod_{p} \Bigl( \prod_{\chi} (1 - \chi(p)p^{-s}) \Bigr)^{-1}$$

and by the result above, writing $f_p$ for the order of $p$ in $(\mathbb{Z}/N\mathbb{Z})^{\times}$:

$$= \prod_{p \nmid N} (1 - p^{-f_p s})^{-\varphi(N)/f_p} = \prod_{p \nmid N} (1 + p^{-f_p s} + p^{-2f_p s} + \dots)^{\varphi(N)/f_p}.$$

In each factor of the last product, all terms have nonnegative coefficients, and all terms of the form $c\,p^{-k\varphi(N)s}$ have coefficient $c \ge 1$. The claim follows by expanding the product.

Now we can finish the proof.

23.19. Theorem. If $\chi$ is a non-trivial Dirichlet character mod $N$, then we have $L(\chi, 1) \neq 0$.

Proof. Assume that $L(\chi, 1) = 0$ for some (non-trivial) $\chi$. Then the simple pole of $L(\chi_0, s)$ at $s = 1$ is canceled by the zero of $L(\chi, s)$ at $s = 1$; therefore the function $F(s) = \prod_{\chi} L(\chi, s)$ is analytic for $s > 0$. But by the preceding lemma, we have that $F(s) = \sum_{n=1}^{\infty} a_n n^{-s}$ with $a_n \ge 0$. Furthermore, $a_{m^{\varphi(N)}} \ge 1$ for all $m$ with $\gcd(m, N) = 1$, and so the series does not converge for $s = 1/\varphi(N)$:

$$\sum_{n=1}^{\infty} a_n n^{-1/\varphi(N)} \ge \sum_{\substack{m \ge 1 \\ \gcd(m, N) = 1}} \bigl(m^{\varphi(N)}\bigr)^{-1/\varphi(N)} = \sum_{\substack{m \ge 1 \\ \gcd(m, N) = 1}} \frac{1}{m} = \infty.$$

So by Landau's Theorem 23.17, $F(s)$ has no analytic continuation across $s_0$, where $1/\varphi(N) \le s_0 \le 1$, a contradiction. So our assumption that $L(\chi, 1) = 0$ for some $\chi$ must be false.
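The orthogonality relation of Corollary 23.11, which drives the whole argument, can be verified numerically in small cases. The sketch below is restricted to a prime modulus $p$, an assumption made here so that $(\mathbb{Z}/p\mathbb{Z})^{\times}$ is cyclic and the characters can be written down as in the proof of Proposition 23.8 from a generator $g$; the helper names are hypothetical.

```python
import cmath


def primitive_root(p):
    """Smallest generator of (Z/pZ)^x for a prime p, found by brute force."""
    for g in range(1, p):
        powers = set()
        x = 1
        for _ in range(p - 1):
            x = x * g % p
            powers.add(x)
        if len(powers) == p - 1:
            return g
    raise ValueError("no generator found; is p prime?")


def characters_mod_prime(p):
    """All p - 1 Dirichlet characters mod a prime p, as Python functions.

    The character with index idx sends g^k to exp(2 pi i * idx * k / (p - 1)).
    """
    g = primitive_root(p)
    dlog = {pow(g, k, p): k for k in range(p - 1)}

    def make(idx):
        def chi(n):
            r = n % p
            if r == 0:
                return 0j
            return cmath.exp(2 * cmath.pi * 1j * idx * dlog[r] / (p - 1))
        return chi

    return [make(idx) for idx in range(p - 1)]


def orthogonality_sum(p, a, n):
    """(1 / phi(p)) * sum over chi of conj(chi(a)) * chi(n)."""
    chars = characters_mod_prime(p)
    return sum(chi(a).conjugate() * chi(n) for chi in chars) / (p - 1)


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, abs(orthogonality_sum(7, 3, n)))
```

Up to rounding, the sum is 1 when $n \equiv a \bmod p$ and 0 otherwise, which is how the linear combination of the $\log L(\chi, s)$ picks out a single residue class.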
24. The Prime Number Theorem

The distribution of the prime numbers among the integers is a mystery that has been studied by many mathematicians for a long time. Let $\pi(x)$ (for $x \in \mathbb{R}$) denote the number of primes $p \le x$. After extensive computations of tables of primes, Gauss conjectured around 1800 (and others, like Legendre, did the same at about the same time) that the "density" of primes near $x$ should be $1/\log x$. More precisely, he conjectured that for large $x$,

$$\pi(x) \sim \operatorname{li}(x) = \operatorname{li}(2) + \int_2^x \frac{dt}{\log t}$$

(here, $\operatorname{li}(2)$ is defined by the Cauchy principal value of the integral:

$$\operatorname{li}(2) = \lim_{\varepsilon \to 0^+} \Bigl( \int_0^{1-\varepsilon} \frac{dt}{\log t} + \int_{1+\varepsilon}^{2} \frac{dt}{\log t} \Bigr) \approx 1.045).$$

This notation means that

$$\lim_{x \to \infty} \frac{\pi(x)}{\operatorname{li}(x)} = 1.$$

Legendre's version was

$$\pi(x) \sim \frac{x}{\log x};$$

integrating by parts, it is easy to see that the two versions are equivalent.

It took nearly 100 years before this statement, known as the Prime Number Theorem, was finally proved. The most important input came again from Riemann's memoir of 1860, where he introduced the study of the (now so-called) Riemann zeta function $\zeta(s)$ as a function of a complex argument $s$.

About ten years earlier, Chebyshev could at least prove that the order of magnitude was correct: there are constants $0 < c < C$ such that for $x \ge 2$ (say),

$$c\,\frac{x}{\log x} \le \pi(x) \le C\,\frac{x}{\log x}.$$

The argument is rather elementary. First we define two more functions that are easier to deal with. In the following, $p$ always denotes a prime number. Let

$$\theta(x) = \sum_{p \le x} \log p$$

and (the first sum runs over pairs $(p, k)$)

$$\psi(x) = \sum_{p^k \le x} \log p = \sum_{p \le x} \Bigl\lfloor \frac{\log x}{\log p} \Bigr\rfloor \log p = \sum_{n \le x} \Lambda(n).$$

Here, the von Mangoldt function $\Lambda$ is defined as

$$\Lambda(p^k) = \log p, \qquad \Lambda(n) = 0 \text{ otherwise.}$$

Now we have that

$$\theta(x) \le \psi(x) = \sum_{k=1}^{\infty} \theta(x^{1/k}) \quad \text{and} \quad \psi(x) = \sum_{p \le x} \Bigl\lfloor \frac{\log x}{\log p} \Bigr\rfloor \log p \le \pi(x) \log x.$$

Also, for $\varepsilon > 0$ small,

$$\theta(x) \ge \sum_{x^{1-\varepsilon} < p \le x} \log p \ge \bigl(\pi(x) - \pi(x^{1-\varepsilon})\bigr)(1 - \varepsilon) \log x \ge (1 - \varepsilon) \log x \,\bigl(\pi(x) - x^{1-\varepsilon}\bigr).$$
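Now let us consider the prime factorization of the binomial coefficient $\binom{2n}{n}$.

24.1. Lemma. We have

$$\theta(2n) - \theta(n) \le \log \binom{2n}{n} \le \psi(2n).$$

Proof. Let $n < p \le 2n$ be a prime. Then $p$ divides the numerator of $\binom{2n}{n} = (2n)!/n!^2$ once, but does not divide the denominator. Hence the binomial coefficient is divisible by $\prod_{n < p \le 2n} p$. This implies the first inequality. For the second inequality, observe that

$$v_p\Bigl(\binom{2n}{n}\Bigr) = \sum_{k=1}^{\infty} \Bigl( \Bigl\lfloor \frac{2n}{p^k} \Bigr\rfloor - 2 \Bigl\lfloor \frac{n}{p^k} \Bigr\rfloor \Bigr) \le \Bigl\lfloor \frac{\log(2n)}{\log p} \Bigr\rfloor,$$

since each summand is 0 or 1 and vanishes for $p^k > 2n$.

Since $\binom{2n}{n} \sim 4^n/\sqrt{\pi n}$ by Stirling's formula (and $\binom{2n}{n} \le 4^n$), we get

$$\psi(2n) \ge 2n \log 2 - O(\log n)$$

and

$$\theta(2^k) = \sum_{j=1}^{k} \bigl(\theta(2^j) - \theta(2^{j-1})\bigr) \le \sum_{j=1}^{k} 2^j \log 2 \le 2^{k+1} \log 2.$$

These imply

$$\psi(x) \ge \log 2 \cdot x - O(\log x)$$

and

$$\theta(x) \le \theta\bigl(2^{\lceil \log x/\log 2 \rceil}\bigr) \le 2 \log 2 \cdot 2^{\lceil \log x/\log 2 \rceil} \le 4 \log 2 \cdot x.$$

We already see that

$$\pi(x) \ge \frac{\psi(x)}{\log x} \ge \log 2 \cdot \frac{x}{\log x} - O(1).$$

If we set $\varepsilon = 2 \log\log x/\log x$ above, we obtain

$$\pi(x) \le \frac{\theta(x)}{\log x} \Bigl(1 + O\Bigl(\frac{\log\log x}{\log x}\Bigr)\Bigr) \le 4 \log 2 \cdot \frac{x}{\log x} \Bigl(1 + O\Bigl(\frac{\log\log x}{\log x}\Bigr)\Bigr).$$

However, in order to prove the Prime Number Theorem completely, more is needed. The ground-breaking ideas were formulated by Riemann in his memoir "Über die Anzahl der Primzahlen unter einer gegebenen Größe" (On the number of primes below a given quantity), relating the asymptotics of $\pi(x)$ with analytic properties of $\zeta(s)$ as a meromorphic function. However, it still took more than 30 years, until Hadamard and De la Vallée Poussin independently were able to carry through the program sketched by Riemann.

24.2. Lemma. We have

$$\lim_{x \to \infty} \Bigl( \frac{\psi(x)}{x} - \frac{\pi(x)}{x/\log x} \Bigr) = 0.$$

Proof. We have already seen that

$$(1 - \varepsilon) \log x \,\bigl(\pi(x) - x^{1-\varepsilon}\bigr) \le \theta(x) \le \psi(x) \le \pi(x) \log x.$$

We divide by $x$, set $\varepsilon = 2 \log\log x/\log x$ and let $x$ tend to infinity; the result follows.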
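The functions $\theta$, $\psi$ and $\pi$ and the inequalities of Lemma 24.1 are easy to experiment with. The following minimal sketch uses a plain sieve and floating point logarithms; the function names are chosen here for illustration, and the arguments are assumed to be nonnegative integers.

```python
from math import comb, isqrt, log


def primes_up_to(n):
    """Sieve of Eratosthenes: list of all primes <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def prime_pi(x):
    """pi(x): number of primes p <= x."""
    return len(primes_up_to(x))


def theta(x):
    """theta(x): sum of log p over primes p <= x."""
    return sum(log(p) for p in primes_up_to(x))


def psi(x):
    """psi(x): sum of log p over prime powers p^k <= x."""
    total = 0.0
    for p in primes_up_to(x):
        pk = p
        while pk <= x:  # one contribution of log p for each power p^k <= x
            total += log(p)
            pk *= p
    return total


if __name__ == "__main__":
    for x in (10, 100, 1000, 10_000, 100_000):
        print(x, prime_pi(x), theta(x) / x, psi(x) / x, prime_pi(x) * log(x) / x)
    n = 50
    print(theta(2 * n) - theta(n), log(comb(2 * n, n)), psi(2 * n))
```

Printing $\psi(x)/x$ next to $\pi(x)\log x/x$ for growing $x$ gives a feeling for Lemma 24.2: both ratios drift slowly towards the same limit, with $\psi(x)/x$ being much closer to 1 for moderate $x$.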
To prove the prime number theorem, it therefore suffices to show that $\psi(x) \sim x$, i.e.,

$$\lim_{x \to \infty} \frac{\psi(x)}{x} = 1.$$

The following lemma shows why it is advantageous to work with $\psi(x)$ instead of $\pi(x)$.

24.3. Lemma. For $\operatorname{Re} s > 1$, we have

$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}.$$

Proof. We compute the logarithmic derivative using the Euler product for $\zeta(s)$. (Note that for $\operatorname{Re} s > 1$, everything converges absolutely and locally uniformly, hence all manipulations with the series below are justified.)

$$-\frac{\zeta'(s)}{\zeta(s)} = \frac{d}{ds} \sum_{p} \log(1 - p^{-s}) = \sum_{p} \frac{p^{-s} \log p}{1 - p^{-s}} = \sum_{p} \sum_{k=1}^{\infty} \log p \cdot p^{-ks} = \sum_{n} \Lambda(n)\,n^{-s}.$$

Now it is easy to see that $\zeta(s)$ extends to a meromorphic function on $\operatorname{Re} s > 0$ with only a simple pole at $s = 1$. (In fact, $\zeta(s)$ extends to a meromorphic function on all of $\mathbb{C}$, with still only this one simple pole.) Hence $-\zeta'(s)/\zeta(s)$ likewise is a meromorphic function there, with simple poles at the pole of $\zeta(s)$ (residue 1) and the zeros of $\zeta(s)$ (residue minus the order of the zero).

If one wanted to work directly with $\pi(x)$, the corresponding function would be $\sum_{p} p^{-s}$, which does not have nice properties. One could try to use $\log \zeta(s) = \sum_{p} \sum_{k=1}^{\infty} p^{-ks}/k$ (the difference between $\pi(x)$ and the corresponding function for $\log \zeta(s)$ is negligible), but this function has logarithmic singularities at $s = 1$ and at the zeros of $\zeta(s)$, which complicates things considerably.

24.4. Proposition (Perron's Formula and an Application).

(1) Let $c > 0$, $y > 0$. Then

$$\lim_{T \to \infty} \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{y^s}{s}\,ds = \begin{cases} 0 & \text{if } 0 < y < 1, \\ \tfrac{1}{2} & \text{if } y = 1, \\ 1 & \text{if } y > 1. \end{cases}$$

(2) Let $f(s) = \sum_{n=1}^{\infty} a_n n^{-s}$, and suppose that the series converges absolutely for $s = c > 0$. Then for $x > 0$,

$$\lim_{T \to \infty} \frac{1}{2\pi i} \int_{c - iT}^{c + iT} f(s)\,\frac{x^s}{s}\,ds = \begin{cases} \sum_{n \le x} a_n & \text{if } x \notin \mathbb{Z}, \\ \sum_{n < x} a_n + \tfrac{1}{2} a_x & \text{if } x \in \mathbb{Z}. \end{cases}$$

Proof. I will not give the details of the proof. The idea for part (1) is to move the line segment $[c - iT, c + iT]$ far off to the right (if $0 < y < 1$) or to the left (if $y > 1$).
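In the first case, the integral around the rectangle one gets is zero by the residue theorem, and the integrals over the three sides different from $[c - iT, c + iT]$ can be estimated to tend to zero (as the right hand vertical edge moves to infinity and $T \to \infty$). In the second case, the argument is similar, but here, the integral around the rectangle picks up the residue of $y^s/s$ at $s = 0$, which is 1. The case $y = 1$ is done by direct calculation.

To prove (2), one applies (1); one has to justify the swapping of the sum with the integral and limit, but this can be done using the estimates one obtains in the proof of (1) and the assumption that the series converges absolutely for $s = c$.

$$\begin{aligned}
\lim_{T \to \infty} \frac{1}{2\pi i} \int_{c - iT}^{c + iT} f(s)\,\frac{x^s}{s}\,ds
&= \lim_{T \to \infty} \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \sum_{n=1}^{\infty} a_n \frac{(x/n)^s}{s}\,ds \\
&= \sum_{n=1}^{\infty} a_n \lim_{T \to \infty} \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \frac{(x/n)^s}{s}\,ds \\
&= \sum_{n=1}^{\infty} a_n \cdot \begin{cases} 1 & \text{if } n < x, \\ \tfrac{1}{2} & \text{if } n = x, \\ 0 & \text{if } n > x. \end{cases}
\end{aligned}$$

24.5. Corollary. Let $c > 1$. Then for $x > 0$,

$$\lim_{T \to \infty} \frac{1}{2\pi i} \int_{c - iT}^{c + iT} \Bigl( -\frac{\zeta'(s)}{\zeta(s)} \Bigr) \frac{x^s}{s}\,ds = \begin{cases} \psi(x) & \text{if } x \notin \mathbb{Z}, \\ \psi(x) - \tfrac{1}{2}\Lambda(x) & \text{if } x \in \mathbb{Z}. \end{cases}$$

Proof. Clear.

So in order to prove the Prime Number Theorem $\psi(x) \sim x$, we have to evaluate/estimate the integral. The idea is to move the line over which we integrate a little bit to the left (at least some part of it with bounded imaginary part). Then the integral will pick up the residue of $-\zeta'(s)x^s/(\zeta(s)s)$ at $s = 1$, which is $x$. But this will only work if we do not pick up any other poles. This in turn requires that $\zeta(1 + it) \neq 0$ for all (real) $t \neq 0$ (otherwise, there will be poles on $\operatorname{Re} s = 1$, and we cannot move our integration contour across this line).

24.6. Lemma. $\zeta(s)$ does not vanish on $\operatorname{Re} s = 1$.

Proof. First we show that for $\sigma > 1$ and $t \in \mathbb{R}$, we have

$$\bigl| \zeta(\sigma)^3\,\zeta(\sigma + it)^4\,\zeta(\sigma + 2it) \bigr| \ge 1.$$

To see this, we take the logarithm of the left hand side:

$$\begin{aligned}
\log \bigl| \zeta(\sigma)^3\,\zeta(\sigma + it)^4\,\zeta(\sigma + 2it) \bigr|
&= \operatorname{Re} \bigl( 3 \log \zeta(\sigma) + 4 \log \zeta(\sigma + it) + \log \zeta(\sigma + 2it) \bigr) \\
&= \sum_{p} \sum_{k=1}^{\infty} \frac{1}{k\,p^{k\sigma}} \operatorname{Re} \bigl( 3 + 4e^{-kti \log p} + e^{-2kti \log p} \bigr) \\
&= \sum_{p} \sum_{k=1}^{\infty} \frac{1}{k\,p^{k\sigma}} \bigl( 3 + 4\cos(kt \log p) + \cos(2kt \log p) \bigr) \ge 0,
\end{aligned}$$

since $3 + 4\cos\alpha + \cos 2\alpha = 2(1 + \cos\alpha)^2 \ge 0$.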
Now assume that $\zeta(1 + it) = 0$ for some $t \neq 0$. Then $\zeta(\sigma + it)/(\sigma - 1)$ remains bounded as $\sigma \to 1^+$; also $\zeta(\sigma)(\sigma - 1)$ is bounded as $\sigma \to 1^+$, and so is $\zeta(\sigma + 2it)$. Hence

$$1 \le \bigl| \zeta(\sigma)^3\,\zeta(\sigma + it)^4\,\zeta(\sigma + 2it) \bigr| = \bigl| \zeta(\sigma)(\sigma - 1) \bigr|^3 \,\Bigl| \frac{\zeta(\sigma + it)}{\sigma - 1} \Bigr|^4 \,\bigl| \zeta(\sigma + 2it) \bigr|\,(\sigma - 1) \to 0$$

as $\sigma \to 1^+$, a contradiction.

To finish the proof of the Prime Number Theorem, one has to obtain a suitable zero-free region for $\zeta(s)$ to the left of $\operatorname{Re} s = 1$ and a bound on $\zeta'(s)/\zeta(s)$ there. This is a bit technical (though not really hard), and we will omit the details here. What one gets fairly easily is that $\zeta(\sigma + it)$ does not vanish for $\sigma > 1 - C/\max\{1, \log |t|\}$ for some constant $C$; this then translates into an error term in the Prime Number Theorem of the form

$$\psi(x) = x + O\bigl(x e^{-c\sqrt{\log x}}\bigr) \quad \text{or} \quad \pi(x) = \operatorname{li}(x) + O\bigl(x e^{-c\sqrt{\log x}}\bigr).$$

(There are some improvements on this, replacing the square root by a higher power of $\log x$, like $(\log x)^{3/5 - \varepsilon}$.)

24.7. Final Remarks. Here are some more facts about the Riemann zeta function $\zeta(s)$.

(1) $\zeta(s)$ can be extended to a meromorphic function on $\mathbb{C}$; its only pole is the simple pole at $s = 1$ with residue 1.

(2) $\zeta(s)$ has simple zeros at the negative even integers ("trivial zeros"); it also has infinitely many zeros in the critical strip $0 < \operatorname{Re} s < 1$.

(3) $\zeta(s)$ satisfies a functional equation: define

$$\xi(s) = s(1 - s)\,\pi^{-s/2}\,\Gamma\Bigl(\frac{s}{2}\Bigr)\,\zeta(s);$$

then $\xi(s)$ is an entire function satisfying $\xi(1 - s) = \xi(s)$. Its zeros coincide with the nontrivial zeros of $\zeta(s)$. ($\Gamma(s)$ is the Gamma function; it can be defined for $\operatorname{Re} s > 0$ by the integral

$$\Gamma(s) = \int_0^{\infty} e^{-t}\,t^{s-1}\,dt;$$

it satisfies the functional equation $s\Gamma(s) = \Gamma(s + 1)$, which can be used to define it as a meromorphic function on all of $\mathbb{C}$. It has no zeros, and has simple poles at the nonpositive integers. Also $\Gamma(1) = 1$ and therefore (by induction), $\Gamma(n + 1) = n!$.)

(4) Riemann (in his 1860 memoir) stated it was likely that all the nontrivial zeros are actually on the critical line $\operatorname{Re} s = 1/2$. This is the famous Riemann Hypothesis.
Using this (and suitable estimates for $\zeta'(s)/\zeta(s)$ when $\operatorname{Re} s \le 0$), one can continue moving the line of integration to the left and finally obtain the Explicit Formula
\[
\psi(x) = x - \sum_{\rho} \frac{x^{\rho}}{\rho} - \log 2\pi - \frac{1}{2} \log(1 - x^{-2}).
\]
Here, $\rho$ runs through the zeros of $\zeta(s)$ in the critical strip, with multiplicities, and the sum has to be taken in ascending order of $|\operatorname{Im} \rho|$. (Note that $\zeta'(0)/\zeta(0) = \log 2\pi$.) There is a similar formula for $\pi(x)$.
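As a purely numerical illustration of the main term $\psi(x) \approx x$, the following minimal Python sketch computes $\psi(x) = \sum_{p^k \le x} \log p$ with a simple sieve and prints the ratio $\psi(x)/x$. The function name `chebyshev_psi` and the choice of sample values are assumptions made for this example only; it does not evaluate the sum over the zeros $\rho$.

```python
import math


def chebyshev_psi(x: int) -> float:
    """Return psi(x), the sum of log p over all prime powers p^k <= x."""
    if x < 2:
        return 0.0
    # Sieve of Eratosthenes: is_prime[n] == 1 exactly when n is prime.
    is_prime = bytearray([1]) * (x + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(x) + 1):
        if is_prime[p]:
            # Cross out the multiples of p starting at p*p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    total = 0.0
    for p in range(2, x + 1):
        if is_prime[p]:
            # Each of the powers p, p^2, ..., p^k <= x contributes log p.
            power, k = p, 0
            while power <= x:
                k += 1
                power *= p
            total += k * math.log(p)
    return total


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5):
        print(x, chebyshev_psi(x) / x)
```

The printed ratios should be close to 1, in line with $\psi(x) \sim x$; a small check of the implementation is that $\psi(10) = \log \operatorname{lcm}(1, \dots, 10) = \log 2520$.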
From this explicit formula or the idea of the proof of the Prime Number Theorem, one obtains the following.

24.8. Proposition. Let $1/2 \le \beta < 1$.

(1) If $\zeta(s)$ does not vanish for $\operatorname{Re} s > \beta$, then $\psi(x) = x + O(x^{\beta} (\log x)^2)$ and $\pi(x) = \operatorname{li}(x) + O(x^{\beta} (\log x)^2)$.

(2) If $\psi(x) = x + O(x^{\beta})$ or $\pi(x) = \operatorname{li}(x) + O(x^{\beta})$, then $\zeta(s)$ does not vanish for $\operatorname{Re} s > \beta$.

As a corollary, we see that the Riemann Hypothesis is equivalent to the purely number theoretic statement
\[
\psi(x) = x + O(x^{1/2 + \varepsilon}) \quad\text{for all } \varepsilon > 0.
\]
In some sense, this expresses that the distribution of the prime numbers is as even as is allowed by the fact that $\zeta(s)$ does have infinitely many zeros in the critical strip.

Another view on this is the following. Let $\mu(n)$ be the Möbius function:
\[
\mu(n) = \begin{cases} (-1)^r & \text{if } n = p_1 \cdots p_r \text{ is squarefree } (r \ge 0), \\ 0 & \text{otherwise.} \end{cases}
\]
Then it is easy to see that $\sum_{n=1}^{\infty} \mu(n) n^{-s} = 1/\zeta(s)$. Now, applying the above approach to $1/\zeta(s)$ instead of $\zeta'(s)/\zeta(s)$, we obtain that the Riemann Hypothesis is equivalent to
\[
\sum_{n \le x} \mu(n) = O(x^{1/2 + \varepsilon}) \quad\text{for all } \varepsilon > 0.
\]
This can be interpreted as saying that the sequence of the nonzero values of $\mu(n)$ behaves statistically like an unbiased random walk.

References

[Bru] J. Brüdern: Einführung in die analytische Zahlentheorie. Springer Verlag, 1995.
[Cas] J.W.S. Cassels: Lectures on elliptic curves. LMS Student Texts 24, Cambridge University Press, 1991.
[IrRo] K. Ireland, M. Rosen: A classical introduction to modern number theory. Springer Graduate Texts in Mathematics 84, Springer Verlag, 1982.
[Nat] M.B. Nathanson: Elementary methods in number theory. Springer Graduate Texts in Mathematics 195, Springer Verlag, 2000.
[Sil] Joseph H. Silverman: The arithmetic of elliptic curves. Springer Verlag, 1986.
[Sti] Douglas R. Stinson: Cryptography. Theory and practice. 2nd edition, Chapman & Hall/CRC, 2002.