
March 1998

ON EM-LIKE ALGORITHMS FOR MINIMUM DISTANCE ESTIMATION

P.P.B. Eggermont and V.N. LaRiccia
University of Delaware

Abstract. We study minimum distance estimation problems related to maximum likelihood estimation in positron emission tomography (PET) which admit algorithms similar to the standard EM algorithm for PET, with the same type of monotonicity properties as the EM algorithm; see Vardi, Shepp, and Kaufman [25]. We derive the algorithms via the majorizing function approach of De Pierro [11], as well as via the alternating projections approach of Csiszár and Tusnády [7], and prove the monotonicity properties of these algorithms. The distances studied include the Hellinger distance and cross-entropy. Pearson's $\varphi^2$ distance fits in as well, but does not seem to enjoy both monotonicity properties. For nonnegatively constrained least squares problems the two approaches lead to different algorithms, both of which enjoy the strong monotonicity properties.

Corresponding author:
Paul Eggermont
Department of Mathematical Sciences
University of Delaware
Newark, Delaware
telephone: (302)
fax: (302)
e-mail: eggermon@math.udel.edu

1. Introduction

In this paper we study various minimum distance estimation problems that are similar to maximum likelihood estimation for positron emission tomography and that admit minimization algorithms similar to the EM algorithm of Shepp and Vardi [23], with similar monotonicity properties. The distances under discussion are mainly the Hellinger distance and Pearson's $\varphi^2$ distance; the latter was recently studied by Mair, Rao and Anderson [19]. We also discuss smoothed (roughness penalized) minimum distance estimation problems, and briefly discuss minimum cross-entropy and minimum Burg-entropy estimation problems. The results are new for the minimum Hellinger distance estimation problem, as well as for the smoothed versions of the Hellinger and Pearson's $\varphi^2$ problems.

In minimum Hellinger distance estimation one solves the problem

(1.1)  minimize $H(b, Ax) \stackrel{\text{def}}{=} \sum_{i=1}^{n} \bigl| \sqrt{[Ax]_i} - \sqrt{b_i} \bigr|^2$  subject to $x \geq 0$ componentwise,

where $A \in \mathbb{R}^{n \times m}$ is a nonnegative matrix, with coefficients $a_{ij}$ and with column sums equal to one:

(1.2)  $\sum_{i=1}^{n} a_{ij} = 1$, $j = 1, 2, \ldots, m$,

and $b \in \mathbb{R}^n$ is a nonnegative data vector. The Hellinger distance is closely related to both the Kullback-Leibler distance

(1.3)  $\mathrm{KL}(u, w) = \sum_{i=1}^{n} u_i \log \dfrac{u_i}{w_i} + w_i - u_i$,

and Pearson's $\varphi^2$ distance

(1.4)  $\mathrm{P}(u, w) = \sum_{i=1}^{n} \dfrac{| u_i - w_i |^2}{w_i}$.

The problem

(1.5)  minimize $\mathrm{KL}(b, Ax)$  subject to $x \geq 0$

is the maximum likelihood estimation problem familiar from astronomical image processing, Richardson [21], Lucy [18], and emission tomography, Rockmore and Macovski [22], Shepp and Vardi [23].

There, the underlying model is that $b_1, b_2, \ldots, b_n$ are independent Poisson random variables with means $[Ax_o]_i$, $i = 1, 2, \ldots, n$. Here $x_o$ is an unknown probability vector one wishes to estimate, so $x_o$ satisfies

(1.6)  $\sum_{j=1}^{m} x_{o,j} = 1$.

Since the number of parameters to be estimated is typically quite large, this problem behaves like a nonparametric estimation problem. Pearson's $\varphi^2$ distance arises from the normal approximation to the Poisson distribution; see Mair, Rao and Anderson [19]. In this context, minimum Hellinger distance estimation is suggested by the role it plays in parametric estimation problems. For parametric problems, minimum Hellinger distance estimation enjoys optimality properties similar to those of maximum likelihood estimation if the postulated model is in fact true. Moreover, its robustness with respect to modeling errors is well documented; see, e.g., Beran [1], Tamura and Boos [24], and references therein. Here we concentrate on methods for solving (1.1), with special emphasis on EM-like algorithms with EM-like monotonicity properties. In the process we point out other, similar minimization problems with similar algorithms. Byrne [3] does more or less the same, but considers a quite different set of algorithms; see also Byrne [4].

The EM algorithm for solving (1.5) is, starting from any strictly positive vector $x^1$,

(1.7)  $x_j^{k+1} = x_j^k \, [A^T r^k]_j$, $j = 1, 2, \ldots, m$, with $r_i^k = b_i / [Ax^k]_i$.

(We abbreviate this as $r^k = b / Ax^k$.) The model (1.5) and the algorithm (1.7) were introduced by Richardson [21] and Lucy [18] in astronomical image processing, and by Shepp and Vardi [23] in positron emission tomography. Vardi, Shepp, and Kaufman [25] derived the two wonderful monotonicity properties of the EM algorithm. The first monotonicity property is that

(1.8)  $\mathrm{KL}(b, Ax^k) - \mathrm{KL}(b, Ax^{k+1}) \geq \mathrm{KL}(x^{k+1}, x^k)$, $k \geq 1$,

which says that the algorithm (1.7) decreases the negative log-likelihood $\mathrm{KL}(b, Ax)$. This is about the least one would expect of an algorithm for minimizing $\mathrm{KL}(b, Ax)$. The second one is quite unexpected: if $x^*$ is any solution of (1.5), then

(1.9)  $\mathrm{KL}(x^*, x^k) - \mathrm{KL}(x^*, x^{k+1}) \geq \mathrm{KL}(b, Ax^k) - \mathrm{KL}(b, Ax^{k+1})$.

In combination with (1.8) this says that the $x^k$ get closer to every solution of (1.5). (The everyday image is that the $x^k$ land on the solution set like a helicopter on an airfield, rather than like a plane.)
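For concreteness, here is a minimal sketch of the iteration (1.7) in Python/NumPy, run on hypothetical random data (the matrix A, the data b, and the sizes below are made up for illustration), with a numerical check of the first monotonicity property (1.8) along the way:

```python
import numpy as np

def kl(u, w):
    # Kullback-Leibler distance (1.3); assumes u, w strictly positive
    return np.sum(u * np.log(u / w) + w - u)

def em_step(x, A, b):
    # One EM step (1.7): x_j <- x_j * [A^T (b / Ax)]_j
    return x * (A.T @ (b / (A @ x)))

rng = np.random.default_rng(0)
A = rng.random((20, 10)); A /= A.sum(axis=0)   # unit column sums, cf. (1.2)
b = rng.random(20) + 0.1                       # positive data vector
x = np.ones(10)                                # strictly positive start
for k in range(100):
    x_new = em_step(x, A, b)
    # first monotonicity property (1.8), up to floating-point tolerance
    assert kl(b, A @ x) - kl(b, A @ x_new) >= kl(x_new, x) - 1e-12
    x = x_new
```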

The convergence of the algorithm (1.7) to a solution of (1.5) is an easy consequence; see, e.g., Vardi, Shepp, and Kaufman [25] or Byrne [2]. Vardi, Shepp, and Kaufman [25] modeled their proof of the monotonicity properties (1.8) and (1.9) on the alternating projection approach of Csiszár and Tusnády [7]. There are two aspects to this geometric view. The first one comprises the setting in which the alternating projection method may be formulated and in which it solves the original minimization problem if the algorithm in fact converges. The second aspect is the proof of the convergence of the algorithm, which requires extra conditions on the objective function. The Csiszár and Tusnády [7] approach applies in full to (1.5), with the resulting algorithm (1.7) and monotonicity properties (1.8) and (1.9). The approach applies only partially to minimizing Pearson's $\varphi^2$ distance and the Hellinger distance. Mair, Rao and Anderson [19] showed that the Csiszár and Tusnády [7] approach applies to minimizing $\mathrm{P}(b, Ax)$, with the resulting algorithm

(1.10)  $x_j^{k+1} = x_j^k \, \{ [A^T r^k]_j \}^{1/2}$, where $r_i^k = ( b_i / [Ax^k]_i )^2$.

Unfortunately, this is where it ends. There is a first monotonicity property, of course, but a second monotonicity property analogous to (1.9) is not provided by the alternating projections approach. Likewise, the Csiszár and Tusnády [7] approach applies to the minimum Hellinger distance estimation problem (1.1), with the resulting algorithm

(1.11)  $x_j^{k+1} = x_j^k \, \{ [A^T r^k]_j \}^{2}$, where now $r_i^k = ( b_i / [Ax^k]_i )^{1/2}$.

The (dis)similarity with (1.10) is uncanny. Unfortunately, here too a second monotonicity property is not provided. However, there is a second approach to deriving these algorithms. De Pierro [9], [11] used this approach both to derive algorithms for penalized versions of (1.5) and to show their monotonicity properties. It was based on his interpretation of the analytic proofs of the monotonicity properties (1.8) and (1.9) by Mülthei and Schorr [20]. De Pierro [11] calls it the majorizing function approach, because it is based on the inequality

(1.12)  $\mathrm{KL}(b, Ax) \leq \mathrm{KL}(b, Ay) + \Lambda_{\mathrm{KL}}(x, y)$,

with

(1.13)  $\Lambda_{\mathrm{KL}}(x, y) = \sum_{j=1}^{m} y_j \, [A^T \{ b/Ay \}]_j \, \log \dfrac{y_j}{x_j} + x_j - y_j$,

for nonnegative $x, y \in \mathbb{R}^m$. Note that $\Lambda_{\mathrm{KL}}(y, y) = 0$.
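In code the two updates (1.10) and (1.11) differ only in where the exponents sit. A minimal sketch, in the same hypothetical setting as the earlier EM sketch:

```python
import numpy as np

def pearson_step(x, A, b):
    # Pearson iteration (1.10): r_i = (b_i/[Ax]_i)^2, then x_j <- x_j * ([A^T r]_j)^{1/2}
    r = (b / (A @ x)) ** 2
    return x * np.sqrt(A.T @ r)

def hellinger_step(x, A, b):
    # Hellinger iteration (1.11): r_i = (b_i/[Ax]_i)^{1/2}, then x_j <- x_j * ([A^T r]_j)^2
    r = np.sqrt(b / (A @ x))
    return x * (A.T @ r) ** 2
```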

The EM algorithm now arises by minimizing $\Lambda_{\mathrm{KL}}(x, x^k)$ over $x$. We show in this paper that this approach extends to the algorithms (1.10) and (1.11). We prove the following.

(1.14) Theorem. Let $x^1 \in \mathbb{R}^m$ be strictly positive, and let $x^*$ be any solution of (1.1). Then the sequence $\{x^k\}_k$ generated by (1.11) satisfies

$H(b, Ax^k) - H(b, Ax^{k+1}) \geq H(x^k, x^{k+1})$,
$\mathrm{KL}(x^*, x^k) - \mathrm{KL}(x^*, x^{k+1}) \geq 2 \, \{ H(b, Ax^{k+1}) - H(b, Ax^*) \}$.

Again, the convergence of the algorithm (1.11) is an easy consequence. An unexplained feature of the second monotonicity property is that the Kullback-Leibler distance pops up again. For the algorithm (1.10) we are not so fortunate. There is a first monotonicity property, Mair, Rao and Anderson [19],

(1.15)  $\mathrm{P}(b, Ax^k) - \mathrm{P}(b, Ax^{k+1}) \geq \mathrm{P}(x^{k+1}, x^k)$,

but a second monotonicity property analogous to (1.9) remains elusive in this set-up as well.

At this point we cannot resist mentioning our smoothed EM algorithm. Let $S \in \mathbb{R}^{m \times m}$ be a symmetric (nonnegative) smoothing matrix with all column sums equal to 1, and define the nonlinear smoother $N$ (based on geometric averages) by

(1.16)  $[N x]_j = \exp\bigl( [S \{ \log x \}]_j \bigr)$, $j = 1, 2, \ldots, m$.

The smoothed version of the maximum likelihood estimation problem (1.5) is

(1.17)  minimize $\sum_{i=1}^{n} b_i \log \dfrac{b_i}{[A N x]_i} + [Ax]_i - b_i$  subject to $x \geq 0$ componentwise.

The problem (1.17) also admits an EM algorithm, viz.

(1.18)  $x^{k+1} = S \{ (N x^k) \cdot (A^T r^k) \}$, with $r_i^k = b_i / [A N x^k]_i$ for all $i$,

where $(N x^k) \cdot (A^T r^k)$ is the componentwise product of the two vectors $N x^k$ and $A^T r^k$. Moreover, the analogues of the monotonicity properties hold; see Eggermont [13], Eggermont and LaRiccia [14].
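A minimal sketch of one step of (1.18), with the geometric smoother $N$ of (1.16); the smoothing matrix S passed in is a hypothetical choice, assumed symmetric, nonnegative, with unit column sums:

```python
import numpy as np

def smoothed_em_step(x, A, b, S):
    # One step of the smoothed EM algorithm (1.18).
    Nx = np.exp(S @ np.log(x))      # geometric smoother N of (1.16)
    r = b / (A @ Nx)                # r^k = b / (A N x^k)
    return S @ (Nx * (A.T @ r))     # x^{k+1} = S{ (N x^k) . (A^T r^k) }
```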

The rather surprising thing is that there is an analogue of this for (1.1). With the smoothing matrix $S$ as before, define the nonlinear smoother $M$ by

(1.19)  $[M x]_j = \bigl\{ [S (\sqrt{x})]_j \bigr\}^2$,

and consider the problem

(1.20)  minimize $H(b, A, x) \stackrel{\text{def}}{=} \sum_{i=1}^{n} b_i - 2 \sqrt{b_i \, [A M x]_i} + [Ax]_i$  subject to $x \geq 0$ componentwise.

The algorithm for (1.20) analogous to (1.18) is

(1.21)  $x^{k+1} = \bigl\{ S \bigl[ \sqrt{M x^k} \cdot (A^T r^k) \bigr] \bigr\}^2$, with $r_i^k = \bigl( b_i / [A M x^k]_i \bigr)^{1/2}$,

where the square and the square root are taken componentwise, and its monotonicity properties are stated in the following theorem.

(1.22) Theorem. Let $x^1 \in \mathbb{R}^m$ be strictly positive, and let $x^*$ be any solution of (1.20). Then the sequence $\{x^k\}_k$ generated by (1.21) satisfies

$H(b, A, x^k) - H(b, A, x^{k+1}) \geq H(x^k, x^{k+1})$,
$\mathrm{KL}(x^*, x^k) - \mathrm{KL}(x^*, x^{k+1}) \geq 2 \, \{ H(b, A, x^{k+1}) - H(b, A, x^*) \}$.

There is a similar algorithm with analogous monotonicity properties for the minimization problem (1.17) with the nonlinear smoother $N$ replaced by the nonlinear smoother $M$; see Eggermont and LaRiccia [15]. Finally, there is an analogous smoothed version with the analogous first monotonicity property for minimum Pearson's $\varphi^2$ estimation, see §5 (but no second monotonicity property). The proofs of all these monotonicity properties for the smoothed algorithms are substantially the same, but a unifying theory, say along the lines of Csiszár and Tusnády [7], has not been forthcoming.
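As with (1.18), one step of (1.21) takes only a few lines. A minimal sketch, assuming again a symmetric nonnegative smoothing matrix S with unit column sums, and with the residual $r^k$ taken as $\sqrt{b/(A M x^k)}$ as read off from the fixed-point relations used in §4:

```python
import numpy as np

def smoothed_hellinger_step(x, A, b, S):
    # One step of the smoothed Hellinger algorithm (1.21).
    Mx = (S @ np.sqrt(x)) ** 2               # nonlinear smoother M of (1.19)
    r = np.sqrt(b / (A @ Mx))                # r^k = (b / (A M x^k))^{1/2}
    return (S @ (np.sqrt(Mx) * (A.T @ r))) ** 2
```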

Earlier on we mentioned the close connection between the Kullback-Leibler, Pearson's $\varphi^2$ and Hellinger distances. This is further illustrated by considering the following two algorithms for solving (1.5). With $\Lambda_{\mathrm{KL}}$ the majorizing function, and starting from a strictly positive vector $x^1$, let $x^{k+1}$ be the solution to

(1.23)  minimize $\Lambda_{\mathrm{KL}}(x, x^k) + \mathrm{P}(x, x^k)$  subject to $x \geq 0$.

It turns out that the resulting algorithm is a multiplicatively relaxed version of the EM algorithm (1.7), viz.

(1.24)  $x_j^{k+1} = x_j^k \, \bigl( [A^T \{ b / Ax^k \}]_j \bigr)^{1/2}$.

Note the difference with algorithm (1.10)! The algorithm (1.24) has just about the same monotonicity properties (1.8) and (1.9); see Iusem [17]. The Hellinger analogue of (1.23) also works. That is, if $x^{k+1}$ is defined (recursively) as the solution to

(1.25)  minimize $\Lambda_{\mathrm{KL}}(x, x^k) + H(x, x^k)$  subject to $x \geq 0$,

then $x^{k+1}$ solves, componentwise, the quadratic equation $2 x_j - \sqrt{x_j \, x_j^k} - x_j^k \, [A^T \{ b/Ax^k \}]_j = 0$, that is,

(1.26)  $x_j^{k+1} = x_j^k \left\{ \dfrac{1 + \sqrt{1 + 8 \, [A^T \{ b / Ax^k \}]_j}}{4} \right\}^2$, $j = 1, 2, \ldots, m$,

and this too is a multiplicatively and additively relaxed version of (1.7), satisfying analogues of the two monotonicity properties. We omit the details. We emphasize again that these last two algorithms are merely stated to show the close interplay between the three distances under discussion.

In the next section we discuss the alternating projection method and point out some applications. In §3 we discuss its application to minimum Hellinger distance estimation, and derive the algorithm. In §4 and §5 we discuss the majorizing function approach to the minimum Hellinger and minimum Pearson's $\varphi^2$ estimation problems, as well as to minimizing the Burg entropy. In §6 we briefly discuss the majorizing function approach to nonnegatively constrained least squares estimation; in this case it leads to an algorithm different from that obtained via the Csiszár and Tusnády [7] approach.

2. Alternating projections onto closed convex subsets of $\mathbb{R}^d$

In this section we discuss the alternating projection method of Csiszár and Tusnády [7], and give a slightly more general proof of its convergence. However, the exposition follows quite closely that of Csiszár and Tusnády [7]. Since projections onto closed convex sets may be thought of as being obtained as solutions of minimum distance problems, we begin by introducing suitable generalizations of (the square of) the Euclidean distance. Let $b : \mathrm{domain}\, b \subset \mathbb{R}^d \to \mathbb{R} \cup \{\infty\}$ be a proper convex, lower semicontinuous function. For simplicity we assume also that $b$ is differentiable on its domain. If $b$ is not differentiable, then the notion of subgradients may be used, but this would cause technical complications. On $\mathrm{domain}\, B = \mathrm{domain}\, b \times \mathrm{domain}\, b$ define

(2.1)  $B(x, y) = b(x) - b(y) - \langle \nabla b(y), x - y \rangle$,

where $\nabla b$ denotes the gradient of $b$. Note that $B(x, y) \geq 0$ for all $x, y$, by the convexity of $b$. To strengthen the interpretation of $B(x, y)$ as a squared distance we make the following assumptions.

(B1) $B(x, y)$ is convex in $(x, y)$ jointly, and strictly convex in $x$ and in $y$ separately.
(B2) $B(x, y)$ is lower semicontinuous in $(x, y)$ jointly.
(B3) $B(x, y)$ has bounded level sets for fixed $x$, and for fixed $y$.
(B4) If $B(x_n, y_n) \to 0$, and $\{x_n\}_n$ or $\{y_n\}_n$ is bounded, then $x_n - y_n \to 0$.
(B5) If $x_o \in P$, and $x_o - y_n \to 0$, then $B(x_o, y_n) \to 0$.

These conditions are somewhat technical, but they are precisely what is needed later on. An important feature is that we do not require symmetry of $B(x, y)$ in $x$ and $y$.

(2.2) Remark. It is easily checked that $B$ satisfies the above conditions when $b$ is one of the following three examples:
(a) $b(x) = \sum_{j=1}^{d} x_j \log x_j$, $x \geq 0$;
(b) $b(x) = \sum_{j=1}^{d} x_j^2$, $x \in \mathbb{R}^d$;
(c) $b(x) = \sum_{j=1}^{d} x_j^p$, $x \geq 0$, where $1 < p < 2$.
It is not so clear whether there are other (interesting) examples.

(2.3) Remark. It is likewise easily checked that the functions $B$ given below are not of the form (2.1), but do satisfy (B1) through (B5):
(a) $B(x, y) = \sum_{j=1}^{d} | \sqrt{x_j} - \sqrt{y_j} |^2$, $x, y \geq 0$;
(b) $B(x, y) = \sum_{j=1}^{d} | x_j - y_j |^2 / y_j$, $y > 0$, $x \geq 0$.

Since there is no symmetry, the function $B(x, y)$ gives rise to two kinds of projections.

(2.4) Definition. Let $C \subset \mathbb{R}^d$ be a nonempty closed convex set.
(a) Let $q \in \mathbb{R}^d$. We define the $B_1$-projection of $q$ onto $C$ as the unique element $p \in C$ such that

(2.5)  $B(p, q) = \min \{ B(x, q) : x \in C \}$.

We denote $p$ as $p = \Pi q$ when the set $C$ is clear from the context.
(b) Let $p \in \mathbb{R}^d$. The $B_2$-projection of $p$ onto $C$ is defined as the unique $q \in C$ such that

(2.6)  $B(p, q) = \min \{ B(p, y) : y \in C \}$.

We denote $q$ as $q = \Pi\Pi\, p$. For this definition to work, it needs to be shown that $\Pi$ and $\Pi\Pi$ are in fact well-defined operators. This is indeed so, but we omit the details.
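As a quick illustration of (2.1) and Remark (2.2)(a), here is a minimal sketch (with hypothetical numbers) showing that the choice $b(x) = \sum_j x_j \log x_j$ reproduces the Kullback-Leibler distance (1.3):

```python
import numpy as np

def bregman(x, y, b, grad_b):
    # The generalized squared distance (2.1): B(x,y) = b(x) - b(y) - <grad b(y), x - y>
    return b(x) - b(y) - np.dot(grad_b(y), x - y)

# Example (2.2)(a): negative entropy
b_ent = lambda x: np.sum(x * np.log(x))
grad_ent = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.5, 0.3])
y = np.array([0.3, 0.3, 0.4])
kl = np.sum(x * np.log(x / y) + y - x)        # KL(x, y) as in (1.3)
print(bregman(x, y, b_ent, grad_ent), kl)     # the two values agree
```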

It is useful to introduce the set of all elements in $P$ that have finite distance to $Q$, and vice versa. Let

(2.7)  $B(P, q) = \inf \{ B(p, q) : p \in P \}$.

The expression $B(p, Q)$ is defined similarly. We may now define the alternating projection method associated with the distance (squared) $B$. Consider two nonempty closed convex sets $P, Q \subset \mathbb{R}^d$. For reasons that will transpire later we wish to find points $p^* \in P$, $q^* \in Q$ such that

(2.8)  $B(p^*, q^*) = \min \{ B(p, q) : p \in P, q \in Q \}$.

The alternating projection method for solving this problem goes as follows. Let $q_1 \in Q$ be arbitrary, but such that there exists an $x \in P$ with $B(x, q_1) < \infty$. Let $p_1 \in P$ be the $B_1$-projection of $q_1$ onto $P$. Then let $q_2 \in Q$ be the $B_2$-projection of $p_1$ onto $Q$, and repeat ad infinitum. This gives rise to two sequences $\{p_n\}_n \subset P$, $\{q_n\}_n \subset Q$ recursively defined by

(2.9)  $p_n = \Pi q_n$, $q_{n+1} = \Pi\Pi\, p_n$, $n = 1, 2, \ldots$.

It has to be shown that this algorithm does not break down, but again we omit the details.
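In code, the method (2.9) is simply two projection maps applied in turn. A generic sketch; the two projection callables are hypothetical stand-ins that a concrete application (such as §3 below) must supply:

```python
def alternating_projections(q1, b1_project, b2_project, n_iter):
    # Alternating projection method (2.9); n_iter >= 1.
    # b1_project(q): the B1-projection onto P, i.e. argmin_{p in P} B(p, q)
    # b2_project(p): the B2-projection onto Q, i.e. argmin_{q in Q} B(p, q)
    q = q1
    for _ in range(n_iter):
        p = b1_project(q)    # p_n     = Pi q_n
        q = b2_project(p)    # q_{n+1} = PiPi p_n
    return p, q
```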

We proceed with proving the convergence of the alternating projection method, and begin by deriving the so-called three-points and four-points properties.

(2.10) Lemma (Three-points property). Let $q_1 \in Q$ with $B(P, q_1) < \infty$, and let $p_1 = \Pi q_1$. Then for all $p \in P$,

$B(p, q_1) - B(p_1, q_1) \geq B(p, p_1)$.

Proof. The left-hand side equals

$b(p) - b(p_1) - \langle \nabla b(q_1), p - p_1 \rangle = b(p) - b(p_1) - \langle \nabla b(p_1), p - p_1 \rangle + \langle \nabla b(p_1) - \nabla b(q_1), p - p_1 \rangle$.

Since $p_1$ realizes $\min \{ B(p, q_1) : p \in P \}$, which is a convex minimization problem, the Kuhn-Tucker conditions tell us that $\langle \nabla_1 B(p_1, q_1), p - p_1 \rangle \geq 0$ for all $p \in P$, where $\nabla_1 B$ denotes the gradient of $B(p, q)$ with respect to $p$ (the first variable). But $\nabla_1 B(p, q) = \nabla b(p) - \nabla b(q)$, so the result follows. Q.e.d.

The above three-points property regarding the $B_1$-projection seems reasonable enough; cf. the case of the Euclidean norm squared. The four-points property regarding the $B_2$-projection is much more mysterious.

(2.11) Lemma (Four-points property). Let $p_1 \in P$ with $B(p_1, Q) < \infty$, and let $q_2 = \Pi\Pi\, p_1$. Then for all $x \in P$, $y \in Q$,

$B(x, q_2) \leq B(x, p_1) + B(x, y)$.

Proof. Using the identity

$B(x, p_1) = B(x, q_2) - B(p_1, q_2) - \langle \nabla_1 B(p_1, q_2), x - p_1 \rangle$,

we have that

(2.12)  $B(x, p_1) + B(x, y) - B(x, q_2) = B(x, y) - B(p_1, q_2) - \langle \nabla_1 B(p_1, q_2), x - p_1 \rangle$.

Now, $B(x, y)$ is convex in $(x, y)$ jointly, so

$B(x, y) \geq B(p_1, q_2) + \langle \nabla_1 B(p_1, q_2), x - p_1 \rangle + \langle \nabla_2 B(p_1, q_2), y - q_2 \rangle$,

with $\nabla_2 B$ denoting the gradient of $B$ with respect to the second variable. Thus the expression on the right of (2.12) dominates $\langle \nabla_2 B(p_1, q_2), y - q_2 \rangle$, which is nonnegative for all $y \in Q$, by the Kuhn-Tucker conditions for the optimality of $q_2$. Q.e.d.

The full content of these lemmas is not so obvious. The following two monotonicity properties are quite remarkable consequences. With an eye towards the application to maximum likelihood estimation we define the functional $\Lambda$ as

(2.13)  $\Lambda(q) = B(\Pi q, q)$, for all $q \in Q$ with $B(P, q) < \infty$.

(2.14) First Monotonicity Property. Let $q_1 \in Q$ with $\Lambda(q_1) < \infty$. Then $\Lambda(q_2) < \infty$, and

$\Lambda(q_1) - \Lambda(q_2) \geq B(p_1, p_2) \geq 0$.

Proof. Observe that

$\Lambda(q_1) - \Lambda(q_2) = \{ B(p_1, q_1) - B(p_1, q_2) \} + \{ B(p_1, q_2) - B(p_2, q_2) \}$.

The expression between the first pair of curly brackets is nonnegative since $q_2 = \Pi\Pi\, p_1$. The three-points lemma provides the lower bound $B(p_1, p_2)$ for the second expression. Q.e.d.

To formulate the second monotonicity property, let $P^* \subset P$ be the set of all $p_o \in P$ such that

(2.15)  $B(p_o, Q) = B(P, Q) = \inf \{ B(x, y) : x \in P, y \in Q \}$.

So $P^*$ is the set of solutions $p^*$ of the minimum distance problem (2.8).

(2.16) Second Monotonicity Property. Let $p^* \in P^*$, and set $q^* = \Pi\Pi\, p^*$. Select $p_1 \in P$ such that $B(p^*, p_1) < \infty$. Then $B(p^*, p_2) < \infty$ as well, and

$B(p^*, p_1) - B(p^*, p_2) \geq \Lambda(q_2) - \Lambda(q^*)$.

Proof. The four-points lemma, with $x = p^*$, $y = q^*$, says that

$B(p^*, p_1) \geq B(p^*, q_2) - B(p^*, q^*)$,

and the three-points lemma, with the indices incremented by 1, gives

$B(p^*, p_2) \leq B(p^*, q_2) - B(p_2, q_2)$.

Adding these two inequalities gives

$B(p^*, p_1) - B(p^*, p_2) \geq B(p_2, q_2) - B(p^*, q^*)$,

which is the required inequality, since $B(p_2, q_2) = \Lambda(q_2)$ and, because $p^* \in P^*$, $B(p^*, q^*) = \Lambda(q^*)$. Q.e.d.

The proof that the alternating projection method converges is now quite simple, modulo a rather annoying assumption. In the fully general setting there appears to be no way around it; in specific instances it is always easily verified.

(2.17) Theorem. Let $p_1 \in P$ be such that $B(p, p_1) < \infty$ for all $p \in P^*$. Then $\{p_n\}_n$ converges to some $p_o \in P^*$, $\{q_n\}_n$ converges to some $q_o \in Q$, and

$B(p_o, q_o) = \min \{ B(p, q) : p \in P, q \in Q \}$.

Proof. By the first monotonicity property, $\{\Lambda(q_n)\}_n$ is decreasing. Let $p^* \in P^*$, and let $q^* = \Pi\Pi\, p^*$. By the second monotonicity property $\{B(p^*, p_n)\}_n$ is decreasing, and since it is a nonnegative sequence, it has a limit. Again the second monotonicity property then implies that $\Lambda(q_n) \to \Lambda(q^*)$. Also, from the boundedness of $\{B(p^*, p_n)\}_n$, condition (B3) implies that $\{p_n\}_n$ is bounded, so it has a convergent subsequence, denoted by $\{p_n\}_{n \in M}$ where $M \subset \mathbb{N}$. Let $p_o$ be the limit of this subsequence. Now $\{q_n\}_{n \in M}$ is bounded, so it too has convergent subsequences. Without loss of generality, we may assume that $\{q_{n+1}\}_{n \in M}$ is convergent, say with limit $q_o$. By the lower semicontinuity (B2) of $B$, then

$B(p_o, q_o) \leq \liminf_{n \in M} B(p_n, q_n) = \liminf_{n \in M} \Lambda(q_n) = \Lambda(q^*)$,

where $\liminf_{n \in M}$ denotes the liminf as $n \to \infty$, $n \in M$. It follows that $p_o \in P^*$ (and that $q_o = \Pi\Pi\, p_o$, but never mind). To prove the convergence of the whole sequences, apply the above with $p^*$ replaced by $p_o$. (Here the strange condition that $B(p, p_1) < \infty$ for all $p \in P^*$ comes into play.) Then $\{B(p_o, p_n)\}_n$ is decreasing, and by (B5) a subsequence converges to 0. It follows that the whole sequence converges to 0, so $p_n \to p_o$, $n \to \infty$ ($n \in \mathbb{N}$). Now, since $\{q_n\}_n$ is bounded, every subsequence has itself a convergent subsequence. Call the limit $q_{(o)}$. By the lower semicontinuity of $B(p, q)$ we get, just as above, that $B(p_o, q_{(o)}) \leq \Lambda(q^*)$. It follows that $q_{(o)} = \Pi\Pi\, p_o$, and then that the whole sequence $\{q_n\}_n$ converges to $q_{(o)}$. The last statement follows from $p_o \in P^*$ and $p_o = \Pi q_o$, so that $\Lambda(q_o)$ is equal to the distance between $P$ and $Q$. Q.e.d.

(2.18) Remark. It is interesting to note that the alternating projection method and the associated three- and four-points properties, as well as the two monotonicity properties, also work for the problem

minimize $\mathbf{B}(p, q) \stackrel{\text{def}}{=} B(p, q) + F(q)$  subject to $p \in P$, $q \in Q$.

Here $F$ is a differentiable convex function on $Q$. Denoting the $\mathbf{B}_1$-projection of $q$ onto $P$ by $p = \Pi q$, and the $\mathbf{B}_2$-projection of $p$ onto $Q$ by $q = \Pi\Pi\, p$, the three- and four-points properties read, respectively,

$B(p, q) - B(\Pi q, q) \geq B(p, \Pi q)$,
$\mathbf{B}(x, \Pi\Pi\, p) \leq \mathbf{B}(x, y) + B(x, p)$.

Note the distinction between $B$ and $\mathbf{B}$. This is especially interesting in the case where $P = Q$, since then one is minimizing $F(p)$ over $p$. For $B(p, q) = \mathrm{KL}(p, q)$ this leads to the implicit algorithm discussed in Eggermont [12], viz.

(2.19)  $x_j^{k+1} = \dfrac{x_j^k}{1 + [\nabla F(x^{k+1})]_j}$, $j = 1, 2, \ldots, m$.

(2.20) Remark. We note that the standard application of the theory is to minimizing $\mathrm{KL}(b, Ax)$, with nonnegative $A \in \mathbb{R}^{n \times m}$ with column sums equal to 1, and nonnegative $b \in \mathbb{R}^n$.

It is interesting to note that the theory also applies to minimizing $\mathrm{KL}(Ax, b)$. The resulting algorithm is

(2.21)  $x_j^{k+1} = x_j^k \exp\bigl( [A^T \{ \log(b/Ax^k) \}]_j \bigr)$, $j = 1, 2, \ldots, m$,

and the algorithm converges as per the general theory. It is interesting to note that if $Ax = b$ has a nonnegative solution, then the algorithm (2.21), with $x^1 = u$ a strictly positive vector, converges to the solution of

(2.22)  minimize $\sum_{j=1}^{m} x_j \log \dfrac{x_j}{u_j} + u_j - x_j$  subject to $x \geq 0$, $Ax = b$.

See Elfving [16]. What happens when $Ax = b$ does not have an exact nonnegative solution is apparently not so easy.
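A minimal sketch of one step of (2.21), in the same hypothetical setting as the sketches of §1 (nonnegative A with unit column sums, strictly positive b and x):

```python
import numpy as np

def kl_axb_step(x, A, b):
    # Iteration (2.21) for minimizing KL(Ax, b):
    # x_j <- x_j * exp([A^T log(b/Ax)]_j)
    return x * np.exp(A.T @ np.log(b / (A @ x)))
```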

3. Least Hellinger distance estimation

We now apply the alternating projections method to the minimum Hellinger distance estimation problem (1.1). Note that the Hellinger distance $H(p, q)$ satisfies the properties (B1) through (B5), but is not of the form (2.1) (e.g., the gradient is not of the required form), so the general theory of §2 does not tell us whether this alternating projection method converges or not. The alternating projections set-up is similar to the one employed for minimizing $\mathrm{KL}(b, Ax)$ by Csiszár and Tusnády [7], and for the minimum Pearson's $\varphi^2$ distance by Mair, Rao and Anderson [19]. Thus, let $P$ and $Q$ be defined as

(3.1)  $P = \bigl\{ (p_{ij}) \in \mathbb{R}^{n \times m} : p \geq 0, \ \textstyle\sum_{j=1}^{m} p_{ij} = b_i, \ i = 1, 2, \ldots, n \bigr\}$, $Q = \bigl\{ (a_{ij} x_j) \in \mathbb{R}^{n \times m} : x \geq 0 \bigr\}$,

and consider the problem

(3.2)  minimize $H(p, q) = \sum_{i,j} \bigl| \sqrt{p_{ij}} - \sqrt{q_{ij}} \bigr|^2$  subject to $p \in P$, $q \in Q$.

It is of course not clear why solutions of (3.2) should provide solutions to (1.1), but it will transpire that they do. To determine the projection steps of the alternating projection method, let $q^1_{ij} = (a_{ij} x^1_j) \in Q$ be given. The $H_1$-projection of $q^1$ onto $P$ is obtained by minimizing $H(p, q^1)$ over $p \in P$. Ignoring the nonnegativity constraints on $p$, the Lagrange multiplier theorem yields that $p$ should solve

$1 - \sqrt{q^1_{ij} / p_{ij}} + \lambda_i = 0$

for suitable $\lambda_i$, and hence $p_{ij} = a_{ij} x^1_j / (1 + \lambda_i)^2$ for all $i, j$. This shows that we are justified in ignoring the nonnegativity constraint on $p$. Summing over $j$ results in $b_i = [Ax^1]_i / (1 + \lambda_i)^2$, and so, for all $i, j$,

(3.3)  $p^1_{ij} = \dfrac{a_{ij} \, b_i \, x^1_j}{[Ax^1]_i}$.

The $H_2$-projection of $p^1$ onto $Q$ is determined by minimizing

$H(p^1, q) = \sum_{j} \Bigl( \sum_{i} a_{ij} \Bigr) x_j - 2 \sqrt{x_j} \, \Bigl( \sum_{i} \sqrt{a_{ij} \, p^1_{ij}} \Bigr) + \cdots$,

where $\cdots$ denotes terms independent of $x$. Ignoring the nonnegativity constraint on $x$, and setting the gradient to 0, yields $\sqrt{x^2_j} = \sum_{i} \sqrt{a_{ij} \, p^1_{ij}}$, and $q^2_{ij} = (a_{ij} x^2_j)$. So we were justified in ignoring the nonnegativity constraints, and the algorithm is

(3.4)  $x^2_j = x^1_j \, \Bigl( \bigl[ A^T \{ b/Ax^1 \}^{1/2} \bigr]_j \Bigr)^2$, $j = 1, 2, \ldots, m$,

as advertised in the introduction. Geometric intuition tells us that this algorithm converges. In the next section we give an alternative derivation, and prove that it converges.
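The two projection steps are easy to write out explicitly. The following sketch (hypothetical data, as before) performs one full cycle, the $H_1$-projection (3.3) followed by the $H_2$-projection, and checks that it agrees with one step of the update (3.4), equivalently (1.11):

```python
import numpy as np

def hellinger_projection_cycle(x, A, b):
    # H1-projection (3.3): p_ij = a_ij * b_i * x_j / [Ax]_i
    p = A * np.outer(b / (A @ x), x)
    # H2-projection: sqrt(x_j) = sum_i sqrt(a_ij * p_ij)
    return np.sqrt(A * p).sum(axis=0) ** 2

rng = np.random.default_rng(2)
A = rng.random((20, 10)); A /= A.sum(axis=0)
b = rng.random(20) + 0.1
x = rng.random(10) + 0.1
direct = x * (A.T @ np.sqrt(b / (A @ x))) ** 2   # one step of (3.4) / (1.11)
assert np.allclose(hellinger_projection_cycle(x, A, b), direct)
```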

It is interesting to note that in all three minimum distance problems (Kullback-Leibler, Pearson's $\varphi^2$, and Hellinger) the first projection step (3.3) is the same. This begs for an explanation. Indeed, all three functions $\mathrm{KL}(x, y)$, $\mathrm{P}(x, y)$ and $H(x, y)$ may be written in the form

(3.5)  $\Psi(x, y) = \sum_{i=1}^{n} y_i \, \psi( x_i / y_i )$,

where $\psi$ is a nonnegative, differentiable, convex function defined for nonnegative numbers. The functions $\Psi$ are referred to as entropy functions; see, e.g., Chen and Teboulle [6], and references therein. It can now be shown that for given $q$, $q_{ij} = a_{ij} x_j$, the solution $p$ of the problem

(3.6)  minimize $\Psi(p, q)$  subject to $p \in P$,

with $P$ as in (3.1), is given by

(3.7)  $p_{ij} = \dfrac{a_{ij} \, b_i \, x_j}{[Ax]_i}$.

It should be noted that $\Psi$ satisfies the conditions (B1) through (B5), but again is not of the required form (2.1). It is not clear whether a Csiszár-Tusnády theory can be worked out for this family of functions (3.5).

4. Majorizing functions for the Hellinger distance

We now apply the majorizing function approach of De Pierro [11] to the minimum Hellinger distance estimation problem (1.1). Note that $H(b, Ax)$ is convex in $x$. We begin by deriving a majorizing function or, as we like to call it, a Tendentious Inequality, because it will suggest the minimization algorithm. We have

(4.1)  $H(b, Ax) = \sum_{i=1}^{n} [Ax]_i - 2 \sqrt{b_i [Ax]_i} + b_i$,

so only the second term needs consideration. Writing $Ax = A\{ y \, (x/y) \}$, we get by the concavity of the function $t \mapsto \sqrt{t}$ that

$\sqrt{[Ax]_i} = \sqrt{[Ay]_i} \, \Bigl\{ \dfrac{[A\{ y \, (x/y) \}]_i}{[Ay]_i} \Bigr\}^{1/2} \geq \sqrt{[Ay]_i} \; \dfrac{[A\{ y \, (x/y)^{1/2} \}]_i}{[Ay]_i} = \dfrac{[A \sqrt{xy}\,]_i}{\sqrt{[Ay]_i}}$.

It follows that

$H(b, Ax) \leq \sum_{i=1}^{n} [Ax]_i - 2 \, [A \sqrt{xy}\,]_i \, \sqrt{\dfrac{b_i}{[Ay]_i}} + b_i$,

or,

(4.2)  $H(b, Ax) \leq \bar{H}(x, y) \stackrel{\text{def}}{=} \sum_{j=1}^{m} x_j - 2 \sqrt{x_j y_j} \, \bigl[ A^T \sqrt{b/Ay} \bigr]_j + \sum_{i=1}^{n} b_i$.

This is the Tendentious Inequality. The minimization algorithm it suggests for solving (1.1) is as follows: if $y = x^k$ is a guess for a solution of (1.1), obtain a new and improved(?) guess $x^{k+1}$ by minimizing $\bar{H}(x, x^k)$ as a function of $x$. The result is

(4.3)  $x_j^{k+1} = x_j^k \, \Bigl( \bigl[ A^T \sqrt{b/Ax^k} \bigr]_j \Bigr)^2$, $j = 1, 2, \ldots, m$.

We now investigate the monotonicity properties. In our search for the first monotonicity property we observe the following. For ease of notation we let $y = x^k$ and $x = x^{k+1}$. Then

$H(b, Ax) \leq \bar{H}(x, y) = H(b, Ay) - \sum_{j=1}^{m} y_j \, \Bigl\{ 1 - \bigl[ A^T \sqrt{b/Ay} \bigr]_j \Bigr\}^2 = H(b, Ay) - \sum_{j=1}^{m} \bigl| \sqrt{x_j} - \sqrt{y_j} \bigr|^2$.

The formulation in terms of $x^k$ and $x^{k+1}$ reads

(4.4)  $H(b, Ax^k) - H(b, Ax^{k+1}) \geq H(x^k, x^{k+1})$,

which is the first monotonicity property. Note the lack of any hint of Kullback-Leibler. But Kullback-Leibler pops up in the second monotonicity property, which turns out to take just about the standard form. Let $x^*$ be a solution of (1.1); then $x^*$ is a fixed point of the iteration (4.3), so $\bigl[ A^T \sqrt{b/Ax^*} \bigr]_j = 1$ whenever $x^*_j > 0$. Now, with $\mathrm{KL}$ the standard Kullback-Leibler divergence,

(4.5)  $\mathrm{KL}(x^*, x^k) - \mathrm{KL}(x^*, x^{k+1}) = \sum_{j=1}^{m} x^*_j \log \dfrac{x_j^{k+1}}{x_j^k} + x_j^k - x_j^{k+1} = \sum_{j=1}^{m} x_j^k - x_j^{k+1} + 2 \, x^*_j \log \bigl[ A^T \sqrt{b/Ax^k} \bigr]_j$.

In the usual fashion we have

$\bigl[ A^T \sqrt{b/Ax^k} \bigr]_j = \Bigl[ A^T \Bigl\{ \sqrt{\dfrac{b}{Ax^*}} \, \sqrt{\dfrac{Ax^*}{Ax^k}} \Bigr\} \Bigr]_j$,

and so, by the concavity of the logarithm,

(4.6)  $\mathrm{KL}(x^*, x^k) - \mathrm{KL}(x^*, x^{k+1}) \geq \sum_{j=1}^{m} \{ x_j^k - x_j^{k+1} \} + 2 \sum_{j=1}^{m} x^*_j \Bigl[ A^T \Bigl\{ \sqrt{\dfrac{b}{Ax^*}} \, \log \sqrt{\dfrac{Ax^*}{Ax^k}} \Bigr\} \Bigr]_j$
$\qquad = \sum_{j=1}^{m} \{ x_j^k - x_j^{k+1} \} + \sum_{i=1}^{n} 2 \sqrt{b_i [Ax^*]_i} \, \log \sqrt{\dfrac{[Ax^*]_i}{[Ax^k]_i}}$
$\qquad \geq \sum_{j=1}^{m} \{ x_j^k - x_j^{k+1} \} + \sum_{i=1}^{n} \Bigl\{ 2 \sqrt{b_i [Ax^*]_i} - 2 \sqrt{b_i [Ax^k]_i} \Bigr\}$,

where in the last line we used the inequality $\log t \geq 1 - t^{-1}$. Consequently,

(4.7)  $\mathrm{KL}(x^*, x^k) - \mathrm{KL}(x^*, x^{k+1}) \geq \sum_{i=1}^{n} \bigl\{ [Ax^k]_i - 2 \sqrt{b_i [Ax^k]_i} + b_i \bigr\} - \sum_{i=1}^{n} \bigl\{ [Ax^*]_i - 2 \sqrt{b_i [Ax^*]_i} + b_i \bigr\} + \text{rest}$
$\qquad = H(b, Ax^k) - H(b, Ax^*) + \text{rest}$
$\qquad \geq H(b, Ax^{k+1}) + H(x^{k+1}, x^k) - H(b, Ax^*) + \text{rest}$,

where in the last line we used (4.4), and the rest is given by

(4.8)  $\text{rest} = \sum_{i=1}^{n} [Ax^*]_i - \sum_{j=1}^{m} x_j^{k+1}$.

Now,

(4.9)  $H(x^{k+1}, x^k) + \text{rest} = \sum_{j=1}^{m} \bigl\{ x_j^{k+1} + x_j^k - 2 \sqrt{x_j^{k+1} x_j^k} \bigr\} + \sum_{i=1}^{n} [Ax^*]_i - \sum_{j=1}^{m} x_j^{k+1}$
$\qquad = \sum_{j=1}^{m} x_j^k - 2 x_j^k \bigl[ A^T \sqrt{b/Ax^k} \bigr]_j + \sum_{i=1}^{n} [Ax^*]_i$
$\qquad = \sum_{i=1}^{n} \bigl\{ [Ax^k]_i - 2 \sqrt{b_i [Ax^k]_i} + b_i \bigr\} + \text{rem} = H(b, Ax^k) + \text{rem}$,

where

$\text{rem} = \sum_{j=1}^{m} x^*_j - \sum_{i=1}^{n} b_i$.

Rather surprisingly, $\text{rem} = -H(b, Ax^*)$, as we now show. Since $x^*$ is a fixed point of (4.3),

$\sum_{j=1}^{m} x^*_j = \sum_{j=1}^{m} x^*_j \bigl[ A^T \sqrt{b/Ax^*} \bigr]_j = \sum_{i=1}^{n} \sqrt{b_i [Ax^*]_i}$,

and so

(4.10)  $\text{rem} = \sum_{j=1}^{m} x^*_j - \sum_{i=1}^{n} b_i = -\sum_{i=1}^{n} \bigl\{ [Ax^*]_i - 2 \sqrt{b_i [Ax^*]_i} + b_i \bigr\} = -H(b, Ax^*)$.

It follows that

(4.11)  $\mathrm{KL}(x^*, x^k) - \mathrm{KL}(x^*, x^{k+1}) \geq H(b, Ax^{k+1}) + H(b, Ax^k) - 2 H(b, Ax^*) \geq 0$,

which implies

(4.12)  $\mathrm{KL}(x^*, x^k) - \mathrm{KL}(x^*, x^{k+1}) \geq 2 \, \bigl\{ H(b, Ax^{k+1}) - H(b, Ax^*) \bigr\} \geq 0$.

Either (4.11) or (4.12) may be considered as the second monotonicity property.
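The first monotonicity property of Theorem (1.14) is easy to observe numerically. A minimal sketch (hypothetical data again) iterating (4.3) and checking (4.4) along the way:

```python
import numpy as np

def hellinger_dist(u, w):
    # Hellinger distance: sum of (sqrt(u_i) - sqrt(w_i))^2
    return np.sum((np.sqrt(u) - np.sqrt(w)) ** 2)

rng = np.random.default_rng(1)
A = rng.random((20, 10)); A /= A.sum(axis=0)
b = rng.random(20) + 0.1
x = np.ones(10)
for k in range(200):
    x_new = x * (A.T @ np.sqrt(b / (A @ x))) ** 2   # iteration (4.3)
    # first monotonicity property (4.4), up to floating-point tolerance
    assert (hellinger_dist(b, A @ x) - hellinger_dist(b, A @ x_new)
            >= hellinger_dist(x, x_new) - 1e-10)
    x = x_new
```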

The majorizing function approach also applies to the smoothed minimum Hellinger distance problem (1.20). At the end of this section we show that one may view this as a regularized version of (1.1). Note that $H(b, A, x)$ is convex. The Tendentious Inequality is

(4.13)  $H(b, A, x) \leq \sum_{i=1}^{n} b_i + \sum_{j=1}^{m} x_j - 2 \sqrt{x_j} \, \bigl[ S \bigl\{ \sqrt{My} \cdot \bigl( A^T \sqrt{b / AMy} \bigr) \bigr\} \bigr]_j$,

which gives rise to the algorithm (1.21). The first monotonicity property of Theorem (1.22) is proved as in the unsmoothed case. For the second monotonicity property we work backwards, in several steps analogous to the unsmoothed case. The first ingredient is the observation that for any solution $x^*$ of (1.20),

(4.14)  $\sum_{j=1}^{m} x^*_j - \sum_{i=1}^{n} b_i = -H(b, A, x^*)$.

The proof is just about the same as before: since $x^*$ is a fixed point of (1.21), with $r^* = \sqrt{b / AMx^*}$,

$\sum_{j=1}^{m} x^*_j = \sum_{j=1}^{m} \sqrt{x^*_j} \, \bigl[ S \bigl\{ \sqrt{Mx^*} \cdot (A^T r^*) \bigr\} \bigr]_j = \sum_{j=1}^{m} [Mx^*]_j \, [A^T r^*]_j = \sum_{i=1}^{n} [AMx^*]_i \, r^*_i = \sum_{i=1}^{n} \sqrt{b_i [AMx^*]_i}$,

where we used duality (interchanging the order of summation) twice. Now (4.14) follows as in (4.10). The second step is to show that

(4.15)  $H(b, A, x^k) - H(b, A, x^*) = H(x^k, x^{k+1}) + \sum_{j=1}^{m} \bigl\{ x^*_j - x_j^{k+1} \bigr\}$.

This too follows similarly to the unsmoothed case: using (4.14) we have

$H(b, A, x^k) - H(b, A, x^*) = H(b, A, x^k) + \sum_{j=1}^{m} x^*_j - \sum_{i=1}^{n} b_i = \sum_{j=1}^{m} \bigl\{ x_j^k + x^*_j \bigr\} - \sum_{i=1}^{n} 2 \sqrt{b_i [AMx^k]_i}$,

and now, with $r^k = \sqrt{b / AMx^k}$,

$\sum_{i=1}^{n} \sqrt{b_i [AMx^k]_i} = \sum_{i=1}^{n} [AMx^k]_i \, r^k_i = \sum_{j=1}^{m} [Mx^k]_j \, [A^T r^k]_j = \sum_{j=1}^{m} \bigl[ S \sqrt{x^k} \bigr]_j \, \sqrt{[Mx^k]_j} \, [A^T r^k]_j = \sum_{j=1}^{m} \sqrt{x_j^k} \, \bigl[ S \bigl\{ \sqrt{Mx^k} \cdot (A^T r^k) \bigr\} \bigr]_j = \sum_{j=1}^{m} \sqrt{x_j^k} \sqrt{x_j^{k+1}}$.

Going back we get

$H(b, A, x^k) - H(b, A, x^*) = \sum_{j=1}^{m} \bigl\{ x_j^k - 2 \sqrt{x_j^k x_j^{k+1}} + x^*_j \bigr\}$,

and (4.15) follows. Now backtracking as in (4.7), (4.6), (4.5) we get that

$H(b, A, x^{k+1}) + H(x^{k+1}, x^k) - H(b, A, x^*) + \sum_{j=1}^{m} \bigl\{ x^*_j - x_j^{k+1} \bigr\}$
$\qquad \leq \sum_{j=1}^{m} \bigl\{ x_j^k - x_j^{k+1} \bigr\} + \sum_{i=1}^{n} 2 \Bigl( \sqrt{b_i [AMx^*]_i} - \sqrt{b_i [AMx^k]_i} \Bigr)$
$\qquad \leq \sum_{j=1}^{m} \bigl\{ x_j^k - x_j^{k+1} \bigr\} + \sum_{i=1}^{n} 2 \sqrt{b_i [AMx^*]_i} \, \log \sqrt{\dfrac{[AMx^*]_i}{[AMx^k]_i}}$
$\qquad \leq \sum_{j=1}^{m} \bigl\{ x_j^k - x_j^{k+1} \bigr\} + \sum_{j=1}^{m} 2 \, [Mx^*]_j \, [A^T r^*]_j \, \log \dfrac{[A^T r^k]_j}{[A^T r^*]_j}$.

So we now have

(4.16)  $H(b, A, x^{k+1}) + H(x^{k+1}, x^k) - H(b, A, x^*) + \sum_{j=1}^{m} \bigl\{ x^*_j - x_j^{k+1} \bigr\} \leq \sum_{j=1}^{m} \bigl\{ x_j^k - x_j^{k+1} \bigr\} + \mathrm{SUM}$,

with

(4.17)  $\mathrm{SUM} = \sum_{j=1}^{m} 2 \, [Mx^*]_j \, [A^T r^*]_j \, \log \dfrac{[A^T r^k]_j}{[A^T r^*]_j}$.

The last step is to get from here to $\mathrm{KL}(x^*, x^k) - \mathrm{KL}(x^*, x^{k+1})$. We rewrite $\mathrm{SUM}$ as $\mathrm{SUM} = \mathrm{SUM}_{\mathrm{I}} + \mathrm{SUM}_{\mathrm{II}}$, with

$\mathrm{SUM}_{\mathrm{I}} = \sum_{j=1}^{m} 2 \, [Mx^*]_j \, [A^T r^*]_j \, \log \dfrac{\sqrt{[Mx^k]_j} \, [A^T r^k]_j}{\sqrt{[Mx^*]_j} \, [A^T r^*]_j}$,

$\mathrm{SUM}_{\mathrm{II}} = \sum_{j=1}^{m} 2 \, [Mx^*]_j \, [A^T r^*]_j \, \log \sqrt{\dfrac{[Mx^*]_j}{[Mx^k]_j}}$.

With arguments used before,

$\mathrm{SUM}_{\mathrm{I}} = \sum_{j=1}^{m} 2 \sqrt{x^*_j} \, \Bigl[ S \Bigl\{ \sqrt{Mx^*} \cdot (A^T r^*) \, \log \dfrac{\sqrt{Mx^k} \cdot (A^T r^k)}{\sqrt{Mx^*} \cdot (A^T r^*)} \Bigr\} \Bigr]_j$,

and now, in view of the iteration (1.21), of which $x^*$ is a fixed point,

$S^{-1} \sqrt{x^{k+1}} = \sqrt{Mx^k} \cdot (A^T r^k)$, $\qquad S^{-1} \sqrt{x^*} = \sqrt{Mx^*} \cdot (A^T r^*)$,

assuming that $S$ is invertible. (The following goes through without this assumption, actually.) So we may write $\mathrm{SUM}_{\mathrm{I}}$ as

(4.18)  $\mathrm{SUM}_{\mathrm{I}} = \sum_{j=1}^{m} 2 \sqrt{x^*_j} \, \Bigl[ S \Bigl\{ \bigl( S^{-1} \sqrt{x^*} \bigr) \log \dfrac{S^{-1} \sqrt{x^{k+1}}}{S^{-1} \sqrt{x^*}} \Bigr\} \Bigr]_j$.

It should be noted that $S^{-1} \sqrt{x^{k+1}}$ and $S^{-1} \sqrt{x^*}$ are nonnegative vectors. Now, for any nonnegative vector $U$, by the concavity of the logarithm,

$S \bigl( ( S^{-1} \sqrt{x^*} ) \log U \bigr) \leq S \bigl( S^{-1} \sqrt{x^*} \bigr) \, \log \dfrac{S \bigl( ( S^{-1} \sqrt{x^*} ) \, U \bigr)}{S \bigl( S^{-1} \sqrt{x^*} \bigr)}$,

where the operations are understood componentwise.


More information

Inference of Probability Distributions for Trust and Security applications

Inference of Probability Distributions for Trust and Security applications Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations

More information

Linear algebra and the geometry of quadratic equations. Similarity transformations and orthogonal matrices

Linear algebra and the geometry of quadratic equations. Similarity transformations and orthogonal matrices MATH 30 Differential Equations Spring 006 Linear algebra and the geometry of quadratic equations Similarity transformations and orthogonal matrices First, some things to recall from linear algebra Two

More information

5 Numerical Differentiation

5 Numerical Differentiation D. Levy 5 Numerical Differentiation 5. Basic Concepts This chapter deals with numerical approximations of derivatives. The first questions that comes up to mind is: why do we need to approximate derivatives

More information

INTRODUCTORY SET THEORY

INTRODUCTORY SET THEORY M.Sc. program in mathematics INTRODUCTORY SET THEORY Katalin Károlyi Department of Applied Analysis, Eötvös Loránd University H-1088 Budapest, Múzeum krt. 6-8. CONTENTS 1. SETS Set, equal sets, subset,

More information

Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

More information

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation

More information

it is easy to see that α = a

it is easy to see that α = a 21. Polynomial rings Let us now turn out attention to determining the prime elements of a polynomial ring, where the coefficient ring is a field. We already know that such a polynomial ring is a UF. Therefore

More information

6 EXTENDING ALGEBRA. 6.0 Introduction. 6.1 The cubic equation. Objectives

6 EXTENDING ALGEBRA. 6.0 Introduction. 6.1 The cubic equation. Objectives 6 EXTENDING ALGEBRA Chapter 6 Extending Algebra Objectives After studying this chapter you should understand techniques whereby equations of cubic degree and higher can be solved; be able to factorise

More information

Unique Factorization

Unique Factorization Unique Factorization Waffle Mathcamp 2010 Throughout these notes, all rings will be assumed to be commutative. 1 Factorization in domains: definitions and examples In this class, we will study the phenomenon

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

More information

Manifold Learning Examples PCA, LLE and ISOMAP

Manifold Learning Examples PCA, LLE and ISOMAP Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition

More information

GRE Prep: Precalculus

GRE Prep: Precalculus GRE Prep: Precalculus Franklin H.J. Kenter 1 Introduction These are the notes for the Precalculus section for the GRE Prep session held at UCSD in August 2011. These notes are in no way intended to teach

More information