ON EM-LIKE ALGORITHMS FOR MINIMUM DISTANCE ESTIMATION
P.P.B. Eggermont and V.N. LaRiccia, University of Delaware


March 1998

Abstract

We study minimum distance estimation problems related to maximum likelihood estimation in positron emission tomography (PET), which admit algorithms similar to the standard EM algorithm for PET, with the same type of monotonicity properties as the EM algorithm; see Vardi, Shepp, and Kaufman [25]. We derive the algorithms via the majorizing function approach of De Pierro [11], as well as via the alternating projections approach of Csiszár and Tusnády [7], and prove the monotonicity properties of these algorithms. The distances studied include the Hellinger distance and cross-entropy. Pearson's $\varphi^2$ distance fits in, but does not seem to enjoy both monotonicity properties. For nonnegatively constrained least squares problems the two approaches lead to different algorithms, both of which enjoy the strong monotonicity properties.

Corresponding author: Paul Eggermont, Department of Mathematical Sciences, University of Delaware, Newark, Delaware. Telephone: (302) Fax: (302)
1. Introduction

In this paper we study various minimum distance estimation problems that are similar to maximum likelihood estimation for positron emission tomography and that admit minimization algorithms similar to the EM algorithm of Shepp and Vardi [23], with similar monotonicity properties. The distances under discussion are mainly the Hellinger distance and Pearson's $\varphi^2$ distance. The last one was recently studied by Mair, Rao and Anderson [19]. We also discuss smoothed (roughness penalized) minimum distance estimation problems, and briefly discuss minimum cross-entropy and minimum Burg-entropy estimation problems. The results are new for the minimum Hellinger distance estimation problem, as well as for the smoothed versions of the Hellinger and Pearson's $\varphi^2$ problems.

In minimum Hellinger distance estimation one solves the problem

$$ \text{minimize} \quad H(b, Ax) \overset{\text{def}}{=} \sum_{i=1}^{n} \Bigl| \sqrt{[Ax]_i} - \sqrt{b_i} \Bigr|^2 \quad \text{subject to} \quad x \ge 0 \ \text{componentwise}, \tag{1.1} $$

where $A \in \mathbb{R}^{n \times m}$ is a nonnegative matrix, with coefficients $a_{ij}$ and with column sums equal to one:

$$ \sum_{i=1}^{n} a_{ij} = 1, \quad j = 1, 2, \dots, m, \tag{1.2} $$

and $b \in \mathbb{R}^n$ is a nonnegative data vector. The Hellinger distance is closely related to both the Kullback-Leibler distance

$$ KL(u, w) = \sum_{i=1}^{n} u_i \log \frac{u_i}{w_i} + w_i - u_i, \tag{1.3} $$

and Pearson's $\varphi^2$ distance

$$ P(u, w) = \sum_{i=1}^{n} \frac{|u_i - w_i|^2}{w_i}. \tag{1.4} $$

The problem

$$ \text{minimize} \quad KL(b, Ax) \quad \text{subject to} \quad x \ge 0 \tag{1.5} $$

is the maximum likelihood estimation problem familiar from astronomical image processing, Richardson [21], Lucy [18], and emission tomography, Rockmore and Macovski [22], Shepp and Vardi [23]. There, the underlying model is that $b_1, b_2, \dots, b_n$
are independent Poisson random variables with means $[Ax_o]_i$, $i = 1, 2, \dots, n$. Here $x_o$ is an unknown probability vector one wishes to estimate. So $x_o$ satisfies

$$ \sum_{j=1}^{m} x_{o,j} = 1. \tag{1.6} $$

Since the number of parameters to be estimated is typically quite large, this problem behaves like a nonparametric estimation problem. Pearson's $\varphi^2$ distance arises from the normal approximation to the Poisson distribution, see Mair, Rao and Anderson [19]. In this context, minimum Hellinger distance estimation is suggested by the role it plays in parametric estimation problems. For parametric problems, minimum Hellinger distance estimation enjoys optimality properties similar to maximum likelihood estimation, if the postulated model is in fact true. Moreover, its robustness with respect to modeling errors is well documented, see, e.g., Beran [1], Tamura and Boos [24], and references therein.

Here we concentrate on methods for solving (1.1), with special emphasis on EM-like algorithms with EM-like monotonicity properties. In the process we point out other, similar minimization problems with similar algorithms. Byrne [3] does more or less the same, but considers a quite different set of algorithms. See also Byrne [4].

The EM algorithm for solving (1.5) is, starting from any strictly positive vector $x^1$,

$$ x_j^{k+1} = x_j^k \, [A^T r^k]_j, \quad j = 1, 2, \dots, m, \tag{1.7} $$

with $r_i^k = b_i / [Ax^k]_i$. (We abbreviate this as $r^k = b / Ax^k$.) The model (1.5) and the algorithm (1.7) were introduced by Richardson [21] and Lucy [18] in astronomical image processing, and by Shepp and Vardi [23] in positron emission tomography. Vardi, Shepp, and Kaufman [25] derived the two wonderful monotonicity properties of the EM algorithm. The first monotonicity property is that

$$ KL(b, Ax^k) - KL(b, Ax^{k+1}) \ge KL(x^{k+1}, x^k), \quad k \ge 1, \tag{1.8} $$

which says that the algorithm (1.7) decreases the negative log-likelihood $KL(b, Ax)$. This is about the least one would expect of an algorithm for minimizing $KL$. The second one is quite unexpected. If $x^*$ is any solution of (1.5) then

$$ KL(x^*, x^k) - KL(x^*, x^{k+1}) \ge KL(b, Ax^k) - KL(b, Ax^{k+1}). \tag{1.9} $$

In combination with (1.8) this says that the $x^k$ get closer to every solution of (1.5). (The everyday image is that the $x^k$ land on the solution set like a helicopter on an airfield, rather than like a plane.) The convergence of the algorithm (1.7) to a solution of (1.5) is an easy consequence, see, e.g., Vardi, Shepp, and Kaufman [25] or Byrne [2].

Vardi, Shepp, and Kaufman [25] modeled their proof of the monotonicity properties (1.8) and (1.9) on the alternating projection approach of Csiszár and Tusnády [7]. There are two aspects to this geometric view. The first one comprises the setting in which the alternating projection method may be formulated and in which it solves the original minimization problem if the algorithm in fact converges. The second aspect is the proof of the convergence of the algorithm, which requires extra conditions on the objective function. The Csiszár and Tusnády [7] approach applies in full to (1.5), with the resulting algorithm (1.7) and monotonicity properties (1.8) and (1.9). The approach applies only partially to minimizing Pearson's $\varphi^2$ distance and the Hellinger distance. Mair, Rao and Anderson [19] showed that the Csiszár and Tusnády [7] approach applies to minimizing $P(b, Ax)$, with the resulting algorithm

$$ x_j^{k+1} = x_j^k \, \bigl\{ [A^T r^k]_j \bigr\}^{1/2}, \tag{1.10} $$

where $r_i^k = (b_i / [Ax^k]_i)^2$. Unfortunately, this is where it ends. There is a first monotonicity property, of course, but a second monotonicity property analogous to (1.9) is not provided by the alternating projections approach. Likewise, the Csiszár and Tusnády [7] approach applies to the minimum Hellinger distance estimation problem (1.1), with the resulting algorithm

$$ x_j^{k+1} = x_j^k \, \bigl\{ [A^T r^k]_j \bigr\}^{2}, \tag{1.11} $$

where now $r_i^k = (b_i / [Ax^k]_i)^{1/2}$. The (dis)similarity with (1.10) is uncanny. Unfortunately, here too a second monotonicity property is not provided.

However, there is a second approach to deriving these algorithms. De Pierro [9], [11] used this approach both to derive algorithms for penalized versions of (1.5), and to show monotonicity properties. This was based on his interpretation of the analytic proofs of the monotonicity properties (1.8) and (1.9) by Mülthei and Schorr [20]. De Pierro
[11] calls it the majorizing function approach, because it is based on the inequality

$$ KL(b, Ax) \le KL(b, Ay) + \Lambda_{KL}(x, y), \tag{1.12} $$

with

$$ \Lambda_{KL}(x, y) = \sum_{j=1}^{m} y_j \, [A^T \{b / Ay\}]_j \, \log \frac{y_j}{x_j} + x_j - y_j, \tag{1.13} $$
for nonnegative $x, y \in \mathbb{R}^m$. Note that $\Lambda_{KL}(y, y) = 0$. The EM algorithm now arises by minimizing $\Lambda_{KL}(x, x^k)$ over $x$. We show in this paper that this approach extends to the algorithms (1.10) and (1.11). We prove the following.

(1.14) Theorem. Let $x^1 \in \mathbb{R}^m$ be strictly positive, and let $x^*$ be any solution of (1.1). Then the sequence $\{x^k\}_k$ generated by (1.11) satisfies

$$ H(b, Ax^k) - H(b, Ax^{k+1}) \ge H(x^k, x^{k+1}), $$
$$ KL(x^*, x^k) - KL(x^*, x^{k+1}) \ge 2 \bigl\{ H(b, Ax^{k+1}) - H(b, Ax^*) \bigr\}. $$

Again, the convergence of the algorithm (1.11) is an easy consequence. An unexplained feature of the second monotonicity property is that the Kullback-Leibler distance pops up again.

For the algorithm (1.10) we are not so fortunate. There is a first monotonicity property, Mair, Rao and Anderson [19],

$$ P(b, Ax^k) - P(b, Ax^{k+1}) \ge P(x^{k+1}, x^k), \tag{1.15} $$

but a second monotonicity property analogous to (1.9) remains elusive in this setup as well.

At this point we cannot resist mentioning our smoothed EM algorithm. Let $S \in \mathbb{R}^{m \times m}$ be a symmetric (nonnegative) smoothing matrix with all column sums equal to 1, and define the nonlinear smoother $N$ (based on geometric averages) by

$$ [N x]_j = \exp\bigl( [S\{\log x\}]_j \bigr), \quad j = 1, 2, \dots, m. \tag{1.16} $$

The smoothed version of the maximum likelihood estimation problem (1.5) is

$$ \text{minimize} \quad \sum_{i=1}^{n} b_i \log \frac{b_i}{[A N x]_i} + [Ax]_i - b_i \quad \text{subject to} \quad x \ge 0 \ \text{componentwise}. \tag{1.17} $$

The problem (1.17) also admits an EM algorithm, viz.

$$ x^{k+1} = S \bigl\{ (N x^k) \cdot (A^T r^k) \bigr\}, \tag{1.18} $$

with $r_i^k = b_i / [A N x^k]_i$ for all $i$, where $(N x^k) \cdot (A^T r^k)$ is the componentwise product of the two vectors $N x^k$ and $A^T r^k$. Moreover, the analogues of the monotonicity properties hold, see Eggermont [13], Eggermont and LaRiccia [14]. The rather surprising
thing is that there is an analogue of this for (1.1). With the smoothing matrix as before, define the nonlinear smoother $M$ by

$$ [M x]_j = \bigl\{ [S(\sqrt{x})]_j \bigr\}^2, \tag{1.19} $$

and consider the problem

$$ \text{minimize} \quad H(b, A, x) \overset{\text{def}}{=} \sum_{i=1}^{n} b_i - 2 \sqrt{b_i} \sqrt{[A M x]_i} + [Ax]_i \quad \text{subject to} \quad x \ge 0 \ \text{componentwise}. \tag{1.20} $$

The algorithm for (1.20) analogous to (1.18) is

$$ x^{k+1} = \Bigl( S \bigl\{ \sqrt{M x^k} \cdot (A^T r^k) \bigr\} \Bigr)^2, \quad r^k = \sqrt{b / A M x^k}, \tag{1.21} $$

with square roots and squares taken componentwise. Its monotonicity properties are stated in the following theorem.

(1.22) Theorem. Let $x^1 \in \mathbb{R}^m$ be strictly positive, and let $x^*$ be any solution of (1.20). Then the sequence $\{x^k\}_k$ generated by (1.21) satisfies

$$ H(b, A, x^k) - H(b, A, x^{k+1}) \ge H(x^k, x^{k+1}), $$
$$ KL(x^*, x^k) - KL(x^*, x^{k+1}) \ge 2 \bigl\{ H(b, A, x^{k+1}) - H(b, A, x^*) \bigr\}. $$

There is a similar algorithm with analogous monotonicity properties for the minimization problem (1.17) with the nonlinear smoother $N$ replaced by the nonlinear smoother $M$, see Eggermont and LaRiccia [15]. Finally, there is an analogous smoothed version with the analogous monotonicity properties for minimum Pearson's $\varphi^2$ estimation, see §5 (but no second monotonicity property). The proofs of all these monotonicity properties for these smoothed algorithms are substantially the same, but a unifying theory, say along the lines of Csiszár and Tusnády [7], has not been forthcoming.

Earlier on we mentioned the close connection between the Kullback-Leibler, Pearson's $\varphi^2$ and Hellinger distances. This is further illustrated by considering the following two algorithms for solving (1.5). With $\Lambda_{KL}$ the majorizing function, and starting from a strictly positive vector $x^1$, let $x^{k+1}$ be the solution to

$$ \text{minimize} \quad \Lambda_{KL}(x, x^k) + P(x, x^k) \quad \text{subject to} \quad x \ge 0. \tag{1.22} $$

It turns out that the resulting algorithm is a multiplicatively relaxed version of the EM algorithm (1.7), viz.

$$ x_j^{k+1} = x_j^k \, \bigl( [A^T \{b / Ax^k\}]_j \bigr)^{1/2}. \tag{1.23} $$
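To make the multiplicative updates above concrete, here is a minimal pure-Python sketch of the Hellinger iteration (1.11), together with a numerical check of the two inequalities of Theorem (1.14). The matrix `A`, the vector `x_star`, and all helper names are illustrative assumptions, not taken from the paper. The data are chosen consistent ($b = A x^*$), so that $x^*$ solves (1.1) with $H(b, Ax^*) = 0$.

```python
import math

def matvec(A, x):
    """Return A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Return A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def hellinger(u, w):
    """H(u, w) = sum_i |sqrt(u_i) - sqrt(w_i)|^2, as in (1.1)."""
    return sum((math.sqrt(ui) - math.sqrt(wi)) ** 2 for ui, wi in zip(u, w))

def kl(u, w):
    """KL(u, w) = sum_i u_i log(u_i / w_i) + w_i - u_i, as in (1.3); needs u, w > 0."""
    return sum(ui * math.log(ui / wi) + wi - ui for ui, wi in zip(u, w))

def hellinger_step(A, b, x):
    """One step of (1.11): x_j <- x_j * ([A^T sqrt(b / Ax)]_j)^2."""
    Ax = matvec(A, x)
    r = [math.sqrt(bi / axi) for bi, axi in zip(b, Ax)]
    return [xj * cj ** 2 for xj, cj in zip(x, rmatvec(A, r))]

# Assumed toy problem: columns of A sum to one, as required by (1.2).
A = [[0.5, 0.2], [0.3, 0.3], [0.2, 0.5]]
x_star = [0.3, 0.7]          # assumed "true" vector
b = matvec(A, x_star)        # consistent data, so H(b, A x_star) = 0

x = [0.5, 0.5]               # strictly positive starting vector x^1
history = []
for k in range(20):
    x_new = hellinger_step(A, b, x)
    # Slack in the first inequality of Theorem (1.14); should be >= 0.
    first = (hellinger(b, matvec(A, x)) - hellinger(b, matvec(A, x_new))
             - hellinger(x, x_new))
    # Slack in the second inequality, using H(b, A x_star) = 0; should be >= 0.
    second = kl(x_star, x) - kl(x_star, x_new) - 2 * hellinger(b, matvec(A, x_new))
    history.append((first, second))
    x = x_new
```

Each entry of `history` holds the slack in the two inequalities for one iteration. Up to rounding, both slacks should be nonnegative on this example.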
Note the difference with algorithm (1.10)! The algorithm (1.23) has just about the same monotonicity properties (1.8) and (1.9), see Iusem [17]. The Hellinger analogue of (1.22) also works. That is, if $x^{k+1}$ is defined (recursively) as the solution to

$$ \text{minimize} \quad \Lambda_{KL}(x, x^k) + H(x, x^k) \quad \text{subject to} \quad x \ge 0, \tag{1.24} $$

then

$$ x_j^{k+1} = x_j^k \, \{\, \dots \,\} \, [A^T \{b / Ax^k\}]_j, \quad j = 1, 2, \dots, m, \tag{1.25} $$

and this too is a multiplicatively and additively relaxed version of (1.7) and satisfies analogues of the two monotonicity properties. We omit the details. We emphasize again that these last two algorithms are merely stated to show the close interplay between the three distances under discussion.

In the next section we discuss the alternating projection method and point out some applications. In §3 we discuss its application to minimum Hellinger distance estimation, and derive the algorithm. In §4 and §5 we discuss the majorizing function approach to minimum Hellinger and minimum Pearson's $\varphi^2$ estimation problems, as well as to minimizing Burg-entropy. In §6 we briefly discuss the majorizing function approach to nonnegatively constrained least squares estimation: in this case this leads to an algorithm different from the Csiszár and Tusnády [7] approach.

2. Alternating projections onto closed convex subsets of $\mathbb{R}^d$

In this section we discuss the alternating projection method of Csiszár and Tusnády [7], and give a slightly more general proof of the convergence. However, the exposition follows quite closely that of Csiszár and Tusnády [7].

Since projections onto closed convex sets may be thought of as being obtained as solutions of minimum distance problems, we begin by introducing suitable generalizations of (the square of) Euclidean distance. Let $b : \operatorname{domain} b \subset \mathbb{R}^d \to \mathbb{R} \cup \{\infty\}$ be a proper convex, lower semicontinuous function. For simplicity we assume also that $b$ is differentiable on its domain. If $b$ is not differentiable, then the notion of subgradients may be used, but this would cause technical complications. On $\operatorname{domain} B = \operatorname{domain} b \times \operatorname{domain} b$ define

$$ B(x, y) = b(x) - b(y) - \langle \nabla b(y), \, x - y \rangle, \tag{2.1} $$
where $\nabla b$ denotes the gradient of $b$. Note that $B(x, y) \ge 0$ for all $x, y$, by the convexity of $b$. To strengthen the interpretation of $B(x, y)$ as distance squared we make the following assumptions.

(B1) $B(x, y)$ is convex in $x, y$ jointly, and strictly convex in $x$ and in $y$ separately.
(B2) $B(x, y)$ is lower semicontinuous in $x, y$ jointly.
(B3) $B(x, y)$ has bounded level sets for fixed $x$, and for fixed $y$.
(B4) If $B(x_n, y_n) \to 0$, and $\{x_n\}_n$ or $\{y_n\}_n$ is bounded, then $\|x_n - y_n\| \to 0$.
(B5) If $x_o \in P$, and $\|x_o - y_n\| \to 0$, then $B(x_o, y_n) \to 0$.

These conditions are somewhat technical, but they are precisely what is needed later on. An important feature is that we do not require symmetry of $B(x, y)$ in $x$ and $y$.

(2.2) Remark. It is easily checked that $B$ satisfies the above conditions when $b$ is one of the following three examples:
(a) $b(x) = \sum_{j=1}^{m} x_j \log x_j$, $x \ge 0$;
(b) $b(x) = \sum_{j=1}^{m} x_j^2$, $x \in \mathbb{R}^m$;
(c) $b(x) = \sum_{j=1}^{m} x_j^p$, $x \ge 0$, where $1 < p < 2$.
It is not so clear whether there are other (interesting) examples.

(2.3) Remark. It is likewise easily checked that the functions $B$ given below are not of the form (2.1), but do satisfy (B1) through (B5).
(a) $B(x, y) = \sum_{j=1}^{m} |\sqrt{x_j} - \sqrt{y_j}|^2$, $x, y \ge 0$.
(b) $B(x, y) = \sum_{j=1}^{m} |x_j - y_j|^2 / y_j$, $y > 0$, $x \ge 0$.

Since there is no symmetry, the function $B(x, y)$ gives rise to two kinds of projections.

(2.4) Definition. Let $C \subset \mathbb{R}^d$ be a nonempty closed convex set.
(a) Let $q \in \mathbb{R}^d$. We define the $B_1$-projection of $q$ onto $C$ as the unique element $p \in C$ such that

$$ B(p, q) = \min \{ B(x, q) : x \in C \}. \tag{2.5} $$

We denote $p$ as $p = \Pi q$ when the set $C$ is clear from the context.
(b) Let $p \in \mathbb{R}^d$. The $B_2$-projection of $p$ onto $C$ is defined as the unique $q \in C$ such that

$$ B(p, q) = \min \{ B(p, y) : y \in C \}. \tag{2.6} $$

We denote $q$ as $q = \Pi\!\Pi\, p$.

For this definition to work, it needs to be shown that $\Pi$ and $\Pi\!\Pi$ are in fact well-defined operators. This is indeed so, but we omit the details.
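As a small illustration of definition (2.1) and Remark (2.2), the following sketch evaluates $B(x, y)$ for the generators (a) and (b) and compares the result with the Kullback-Leibler distance (1.3) and the squared Euclidean distance. The function names and the test vectors are illustrative assumptions. The final comparison shows numerically that $B$ need not be symmetric.

```python
import math

def bregman(b, grad_b, x, y):
    """B(x, y) = b(x) - b(y) - <grad b(y), x - y>, as in (2.1)."""
    inner = sum(g * (xj - yj) for g, xj, yj in zip(grad_b(y), x, y))
    return b(x) - b(y) - inner

# Generator (a) of Remark (2.2): b(x) = sum_j x_j log x_j.
def neg_entropy(x):
    return sum(xj * math.log(xj) for xj in x)

def grad_neg_entropy(x):
    return [math.log(xj) + 1.0 for xj in x]

# Generator (b) of Remark (2.2): b(x) = sum_j x_j^2.
def sq_norm(x):
    return sum(xj * xj for xj in x)

def grad_sq_norm(x):
    return [2.0 * xj for xj in x]

def kl(u, w):
    """KL(u, w) = sum_j u_j log(u_j / w_j) + w_j - u_j, as in (1.3)."""
    return sum(uj * math.log(uj / wj) + wj - uj for uj, wj in zip(u, w))

x = [0.2, 0.5, 0.3]
y = [0.4, 0.4, 0.2]

b_entropy = bregman(neg_entropy, grad_neg_entropy, x, y)    # should equal KL(x, y)
b_square = bregman(sq_norm, grad_sq_norm, x, y)             # should equal |x - y|^2
asymmetry = b_entropy - bregman(neg_entropy, grad_neg_entropy, y, x)
```

On these vectors `b_entropy` agrees with `kl(x, y)` and `b_square` with the squared Euclidean distance, while `asymmetry` is nonzero, which is why two kinds of projections are needed.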
It is useful to introduce the set of all elements in $P$ that have finite distance to $Q$, and vice versa. Let

$$ B(P, q) = \inf \{ B(p, q) : p \in P \}. \tag{2.7} $$

The expression $B(p, Q)$ is defined similarly.

We may now define the alternating projection method associated with the distance (squared) $B$. Consider two nonempty closed convex sets $P, Q \subset \mathbb{R}^d$. For reasons that will transpire later we wish to find points $p^* \in P$, $q^* \in Q$ such that

$$ B(p^*, q^*) = \min \{ B(p, q) : p \in P, \ q \in Q \}. \tag{2.8} $$

The alternating projection method for solving this problem goes as follows. Let $q_1 \in Q$ be arbitrary, but such that there exists an $x \in P$ with $B(x, q_1) < \infty$. Let $p_1 \in P$ be the $B_1$-projection of $q_1$ onto $P$. Then let $q_2 \in Q$ be the $B_2$-projection of $p_1$ onto $Q$, and repeat ad infinitum. This gives rise to two sequences $\{p_n\}_n \subset P$, $\{q_n\}_n \subset Q$ recursively defined by

$$ p_n = \Pi q_n, \quad q_{n+1} = \Pi\!\Pi\, p_n, \quad n = 1, 2, \dots \tag{2.9} $$

It has to be shown that this algorithm does not break down, but again we omit the details.

We proceed with proving the convergence of the alternating projection method, and begin by deriving the so-called three-points and four-points properties.

(2.10) Lemma (Three-points property). Let $q_1 \in Q$, with $B(P, q_1) < \infty$, and let $p_1 = \Pi q_1$. Then for all $p \in P$

$$ B(p, q_1) - B(p_1, q_1) \ge B(p, p_1). $$

Proof. The left-hand side equals

$$ b(p) - b(p_1) - \langle \nabla b(q_1), \, p - p_1 \rangle = b(p) - b(p_1) - \langle \nabla b(p_1), \, p - p_1 \rangle + \langle \nabla b(p_1) - \nabla b(q_1), \, p - p_1 \rangle. $$

Since $p_1$ realizes $\min \{ B(p, q_1) : p \in P \}$, which is a convex minimization problem, the Kuhn-Tucker conditions tell us that

$$ \langle \nabla_1 B(p_1, q_1), \, p - p_1 \rangle \ge 0 \quad \text{for all } p \in P, $$

where $\nabla_1 B$ denotes the gradient of $B(p, q)$ with respect to $p$ (the first variable). But $\nabla_1 B(p, q) = \nabla b(p) - \nabla b(q)$, so the result follows. Q.e.d.

The above Three-points property regarding the $B_1$-projection seems reasonable enough; cf. the case of the Euclidean norm squared. The Four-points property regarding the $B_2$-projection is much more mysterious.
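The Three-points property (2.10) can be checked numerically in a non-Euclidean case. The sketch below takes $B = KL$ (generator (a) of Remark (2.2)) and, as an assumed example set, the probability simplex $P = \{p \ge 0 : \sum_j p_j = 1\}$. For this choice a Lagrange multiplier computation gives the $B_1$-projection in closed form, $\Pi q = q / \sum_j q_j$. The names and test points are hypothetical.

```python
import math

def kl(u, w):
    """KL(u, w) = sum_j u_j log(u_j / w_j) + w_j - u_j, as in (1.3)."""
    return sum(uj * math.log(uj / wj) + wj - uj for uj, wj in zip(u, w))

def project_simplex_kl(q):
    """B_1-projection of q > 0 onto the simplex for B = KL: argmin_p KL(p, q) = q / sum(q)."""
    s = sum(q)
    return [qj / s for qj in q]

q1 = [0.2, 1.5, 0.7]                 # assumed strictly positive point
p1 = project_simplex_kl(q1)          # p_1 = Pi q_1

# A few strictly positive points of the simplex at which to test the lemma.
test_points = [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1], [1.0 / 3, 1.0 / 3, 1.0 / 3]]

# Slack in B(p, q_1) - B(p_1, q_1) >= B(p, p_1); nonnegative by Lemma (2.10).
slacks = [kl(p, q1) - kl(p1, q1) - kl(p, p1) for p in test_points]
```

Because the simplex is cut out by a linear constraint, the inequality holds here with equality (a Pythagorean identity), so the entries of `slacks` vanish up to rounding. For a general closed convex set the slack can be strictly positive.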
(2.11) Lemma (Four-points property). Let $p_1 \in P$, with $B(p_1, Q) < \infty$, and let $q_2 = \Pi\!\Pi\, p_1$. Then for all $x \in P$, $y \in Q$

$$ B(x, q_2) \le B(x, p_1) + B(x, y). $$

Proof. Using the identity

$$ B(x, p_1) = B(x, q_2) - B(p_1, q_2) - \langle \nabla_1 B(p_1, q_2), \, x - p_1 \rangle $$

we have that

$$ B(x, p_1) + B(x, y) - B(x, q_2) = B(x, y) - B(p_1, q_2) - \langle \nabla_1 B(p_1, q_2), \, x - p_1 \rangle. \tag{2.12} $$

Now, $B(x, y)$ is convex in $x, y$ jointly, so

$$ B(x, y) \ge B(p_1, q_2) + \langle \nabla_1 B(p_1, q_2), \, x - p_1 \rangle + \langle \nabla_2 B(p_1, q_2), \, y - q_2 \rangle, $$

with $\nabla_2 B$ denoting the derivative (gradient) of $B$ with respect to the second variable. Thus the expression on the right of (2.12) dominates $\langle \nabla_2 B(p_1, q_2), \, y - q_2 \rangle$, which is nonnegative for all $y \in Q$, by the Kuhn-Tucker conditions for the optimality of $q_2$. Q.e.d.

The full content of these lemmas is not so obvious. The following two monotonicity properties are quite remarkable consequences. With an eye towards the application to maximum likelihood estimation we define the functional $\Lambda$ as

$$ \Lambda(q) = B(\Pi q, q), \tag{2.13} $$

for all $q \in Q$ with $B(P, q) < \infty$.

(2.14) First Monotonicity Property. Let $q_1 \in Q$ with $\Lambda(q_1) < \infty$. Then $\Lambda(q_2) < \infty$, and

$$ \Lambda(q_1) - \Lambda(q_2) \ge B(p_1, p_2) \ge 0. $$

Proof. Observe that

$$ \Lambda(q_1) - \Lambda(q_2) = \{ B(p_1, q_1) - B(p_1, q_2) \} + \{ B(p_1, q_2) - B(p_2, q_2) \}. $$

The expression between the first pair of curly brackets is nonnegative since $q_2 = \Pi\!\Pi\, p_1$. The Three-points lemma provides the lower bound $B(p_1, p_2)$ for the second expression. Q.e.d.

To formulate the second monotonicity property, let $P^* \subset P$ be the set of all $p_o \in P$ such that

$$ B(p_o, Q) = B(P, Q) = \inf \{ B(x, y) : x \in P, \ y \in Q \}. \tag{2.15} $$

So $P^*$ is the set of solutions $p^*$ of the minimum distance problem (2.8).
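The iteration (2.9) and the First Monotonicity Property (2.14) can be watched at work in the simplest setting, $b(x) = \sum_j x_j^2$, where $B$ is the squared Euclidean distance and both projections reduce to ordinary Euclidean projections. The two sets below (a unit disc and a half-plane) and all names are assumptions chosen only for illustration.

```python
import math

def dist2(x, y):
    """B(x, y) for b(x) = sum_j x_j^2: the squared Euclidean distance."""
    return sum((xj - yj) ** 2 for xj, yj in zip(x, y))

def project_disc(q):
    """Euclidean projection onto P = closed unit disc centered at the origin."""
    norm = math.hypot(q[0], q[1])
    return list(q) if norm <= 1.0 else [q[0] / norm, q[1] / norm]

def project_halfplane(p):
    """Euclidean projection onto Q = {y : y_1 >= 2}."""
    return [max(p[0], 2.0), p[1]]

q = [2.0, 3.0]                # q_1 in Q
p = project_disc(q)           # p_1 = Pi q_1
lambdas = [dist2(p, q)]       # Lambda(q_1) = B(p_1, q_1)
gaps = []
for n in range(50):
    q_next = project_halfplane(p)       # q_{n+1} = B_2-projection of p_n onto Q
    p_next = project_disc(q_next)       # p_{n+1} = B_1-projection of q_{n+1} onto P
    lam_next = dist2(p_next, q_next)
    # Slack in Lambda(q_n) - Lambda(q_{n+1}) >= B(p_n, p_{n+1}).
    gaps.append(lambdas[-1] - lam_next - dist2(p, p_next))
    lambdas.append(lam_next)
    p, q = p_next, q_next
```

The values in `lambdas` decrease towards the squared distance between the two sets, which equals 1 for this pair (attained at $p = (1, 0)$, $q = (2, 0)$), and every entry of `gaps` is nonnegative up to rounding.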
(2.16) Second Monotonicity Property. Let $p^* \in P^*$, and set $q^* = \Pi\!\Pi\, p^*$. Select $p_1 \in P$ such that $B(p^*, p_1) < \infty$. Then $B(p^*, p_2) < \infty$ as well, and

$$ B(p^*, p_1) - B(p^*, p_2) \ge \Lambda(q_2) - \Lambda(q^*). $$

Proof. The Four-points lemma, with $x = p^*$, $y = q^*$, says that

$$ B(p^*, p_1) \ge B(p^*, q_2) - B(p^*, q^*), $$

and the Three-points lemma, with the indices incremented by 1, gives

$$ -B(p^*, p_2) \ge B(p_2, q_2) - B(p^*, q_2). $$

Adding these two inequalities gives

$$ B(p^*, p_1) - B(p^*, p_2) \ge B(p_2, q_2) - B(p^*, q^*), $$

which is the required inequality. Q.e.d.

The proof that the alternating projection method converges is now quite simple, modulo a rather annoying assumption. In the fully general setting there appears to be no way around it. In specific instances it is always easily verified.

(2.17) Theorem. Let $p_1 \in P$ be such that $B(p^*, p_1) < \infty$ for all $p^* \in P^*$. Then $\{p_n\}_n$ converges to some $p_o \in P^*$, and $\{q_n\}_n$ converges to some $q_o \in Q$, and

$$ B(p_o, q_o) = \min \{ B(p, q) : p \in P, \ q \in Q \}. $$

Proof. By the First Monotonicity Property, $\{\Lambda(q_n)\}_n$ is decreasing. Let $p^* \in P^*$, and let $q^* = \Pi\!\Pi\, p^*$. By the Second Monotonicity Property $\{B(p^*, p_n)\}_n$ is decreasing, and since it is a nonnegative sequence, it has a limit. Again the Second Monotonicity Property then implies that $\Lambda(q_n) \to \Lambda(q^*)$. Also, from the boundedness of $\{B(p^*, p_n)\}_n$, condition (B3) implies that $\{p_n\}_n$ is bounded, so it has a convergent subsequence, denoted by $\{p_n\}_{n \in M}$ where $M \subset \mathbb{N}$. Let $p_o$ be the limit of this subsequence. Now $\{q_n\}_{n \in M}$ is bounded, so it too has convergent subsequences. Without loss of generality, we may assume that $\{q_{n+1}\}_{n \in M}$ is convergent, say with limit $q_o$. By the lower semicontinuity (B2) of $B$, then

$$ B(p_o, q_o) \le \liminf_{n \in M} B(p_n, q_n) = \liminf_{n \in M} \Lambda(q_n) = \Lambda(q^*), $$
where $\liminf_{n \in M}$ denotes the liminf as $n \to \infty$, $n \in M$. It follows that $p_o \in P^*$ (and that $q_o = \Pi\!\Pi\, p_o$, but never mind).

To prove the convergence of the whole sequences, apply the above with $p^*$ replaced by $p_o$. (Here the strange condition that $B(p^*, p_1) < \infty$ for all $p^* \in P^*$ comes into play.) Then $\{B(p_o, p_n)\}_n$ is decreasing, and by (B5) a subsequence converges to 0. It follows that the whole sequence converges to 0, so $p_n \to p_o$, $n \to \infty$ ($n \in \mathbb{N}$). Now, since $\{q_n\}_n$ is bounded, every subsequence has itself a convergent subsequence. Call the limit $q_{(o)}$. By the lower semicontinuity of $B(p, q)$, we get just as above that $B(p_o, q_{(o)}) \le \Lambda(q^*)$. It follows that $q_{(o)} = \Pi\!\Pi\, p_o$, and then that the whole sequence $\{q_n\}_n$ converges to $q_{(o)}$. The last statement follows from $p^* \in P^*$, and $p^* = \Pi q^*$, so that $\Lambda(q^*)$ is equal to the distance between $P$ and $Q$. Q.e.d.

(2.18) Remark. It is interesting to note that the alternating projection method and the associated Three- and Four-points properties, as well as the two monotonicity properties, work also for the problem

$$ \text{minimize} \quad \mathcal{B}(p, q) \overset{\text{def}}{=} B(p, q) + F(q) \quad \text{subject to} \quad p \in P, \ q \in Q. $$

Here $F$ is a differentiable convex function on $Q$. Denoting the $\mathcal{B}_1$-projection of $q$ onto $P$ by $p = \Pi q$, and the $\mathcal{B}_2$-projection of $p$ onto $Q$ by $q = \Pi\!\Pi\, p$, the Three- and Four-points properties read, respectively,

$$ \mathcal{B}(p, q) - \mathcal{B}(\Pi q, q) \ge B(p, \Pi q), \qquad \mathcal{B}(x, \Pi\!\Pi\, p) \le \mathcal{B}(x, y) + B(x, p). $$

Note the distinction between $B$ and $\mathcal{B}$. This is especially interesting in the case where $P = Q$, since then one is minimizing $F(p)$ over $p$. For $B(p, q) = KL(p, q)$ this leads to the implicit algorithm discussed in Eggermont [12], viz.

$$ x_j^{k+1} = \frac{x_j^k}{1 + [\nabla F(x^{k+1})]_j}, \quad j = 1, 2, \dots, m. \tag{2.19} $$

(2.20) Remark. We note that the standard application of the theory is to minimizing $KL(b, Ax)$, with nonnegative $A \in \mathbb{R}^{n \times m}$ with column sums equal to 1, and nonnegative
$b \in \mathbb{R}^n$. It is interesting to note that it also applies to minimizing $KL(Ax, b)$. The resulting algorithm is

$$ x_j^{k+1} = x_j^k \, \exp\bigl( [A^T \{\log(b / Ax^k)\}]_j \bigr), \quad j = 1, 2, \dots, m, \tag{2.21} $$

and the algorithm converges as per the general theory. It is interesting to note that if $Ax = b$ has a nonnegative solution then the algorithm (2.21), with $x^1 = u$, a strictly positive vector, converges to the solution of

$$ \text{minimize} \quad \sum_{j=1}^{m} x_j \log \frac{x_j}{u_j} + u_j - x_j \quad \text{subject to} \quad x \ge 0, \ Ax = b. \tag{2.22} $$

See Elfving [16]. What happens when $Ax = b$ does not have an exact nonnegative solution is not so easy, apparently.

3. Least Hellinger distance estimation

We now apply the alternating projections method to the minimum Hellinger distance estimation problem (1.1). Note that the Hellinger distance $H(p, q)$ satisfies the properties (B1) through (B5), but is not of the form (2.1) (e.g., the gradient is not of the required form). So the general theory of §2 does not tell us whether this alternating projection method converges or not.

The alternating projections setup is similar to the one employed for minimizing $KL(b, Ax)$ by Csiszár and Tusnády [7], and for minimum Pearson's $\varphi^2$ distance by Mair, Rao and Anderson [19]. Thus, let $P$ and $Q$ be defined as

$$ P = \Bigl\{ (p_{ij}) \in \mathbb{R}^{n \times m} : p \ge 0, \ \sum_{j} p_{ij} = b_i, \ i = 1, 2, \dots, n \Bigr\}, \qquad Q = \bigl\{ (a_{ij} x_j) \in \mathbb{R}^{n \times m} : x \ge 0 \bigr\}, \tag{3.1} $$

and consider the problem

$$ \text{minimize} \quad H(p, q) = \sum_{i,j} \bigl| \sqrt{p_{ij}} - \sqrt{q_{ij}} \bigr|^2 \quad \text{subject to} \quad p \in P, \ q \in Q. \tag{3.2} $$

It is of course not clear why solutions to (3.2) should provide solutions to (1.1), but it will transpire that they do.

To determine the projection steps of the alternating projection method, let $q^1 = (a_{ij} x_j^1) \in Q$ be given. The $H_1$-projection of $q^1$ onto $P$ is obtained by minimizing
$H(p, q^1)$ over $p \in P$. Ignoring the nonnegativity constraints on $p$, the Lagrange Multiplier Theorem yields that $p$ should solve

$$ 1 - \sqrt{\frac{q_{ij}^1}{p_{ij}}} + \lambda_i = 0, $$

for suitable $\lambda_i$, and hence

$$ p_{ij} = \frac{a_{ij} x_j^1}{(1 + \lambda_i)^2} \quad \text{for all } i, j. $$

This shows that we are justified in ignoring the nonnegativity constraint on $p$. Summing over $j$ results in $b_i = [Ax^1]_i / (1 + \lambda_i)^2$, and so, for all $i, j$,

$$ p_{ij}^1 = \frac{a_{ij} \, b_i \, x_j^1}{[Ax^1]_i}. \tag{3.3} $$

The $H_2$-projection of $p^1$ onto $Q$ is determined by minimizing

$$ H(p^1, q) = \sum_{j} \Bigl\{ \Bigl( \sum_i a_{ij} \Bigr) x_j - 2 \sqrt{x_j} \Bigl( \sum_i \sqrt{a_{ij} \, p_{ij}^1} \Bigr) \Bigr\} + \cdots, $$

where $\cdots$ denotes terms independent of $x$. Ignoring the nonnegativity constraint on $x$, and setting the gradient to 0 (recall that the column sums of $A$ equal one) yields $\sqrt{x_j} = \sum_i \sqrt{a_{ij} \, p_{ij}^1}$, or

$$ \sqrt{x_j^2} = \sqrt{x_j^1} \, \sum_i a_{ij} \sqrt{b_i / [Ax^1]_i}, $$

and $q^2 = (a_{ij} x_j^2)$, where the superscript in $x_j^2$ is the iteration index. So we were justified in ignoring the nonnegativity constraints, and the algorithm is

$$ x_j^2 = x_j^1 \, \bigl( [A^T \{b / Ax^1\}^{1/2}]_j \bigr)^2, \quad j = 1, 2, \dots, m, \tag{3.4} $$

as advertised in the introduction. The geometric intuition tells us that this algorithm converges. In the next section we give an alternative derivation, and prove that it converges.

It is interesting to note that in all three minimum distance problems (Kullback-Leibler, Pearson's $\varphi^2$ and Hellinger) the first projection step (3.3) is the same. This begs for an explanation. Indeed, all three functions $KL(x, y)$, $P(x, y)$ and $H(x, y)$ may be written in the form

$$ \Psi(x, y) = \sum_{j=1}^{n} y_j \, \psi(x_j / y_j), \tag{3.5} $$
where $\psi$ is an increasing, differentiable, convex function defined for nonnegative numbers. The functions $\Psi$ are referred to as entropy functions, see, e.g., Chen and Teboulle [6], and references therein. It can now be shown that for given $q$, $q_{ij} = a_{ij} x_j$, the solution $p$ to the problem

$$ \text{minimize} \quad \Psi(p, q) \quad \text{subject to} \quad p \in P, \tag{3.6} $$

with $P$ as in (3.1), is given by

$$ p_{ij} = \frac{a_{ij} \, b_i \, x_j}{[Ax]_i}. \tag{3.7} $$

It should be noted that $\Psi$ satisfies the conditions (B1) through (B5), but again is not of the required form (2.1). It is not clear that a Csiszár-Tusnády theory could be worked out for this family of functions (3.5).

4. Majorizing functions for Hellinger distance

We now apply the majorizing function approach of De Pierro [11] to the minimum Hellinger distance estimation problem (1.1). Note that $H(b, Ax)$ is convex in $x$. We begin by deriving a majorizing function, or, as we like to call it, a Tendentious Inequality, because it will suggest the minimization algorithm. We have

$$ H(b, Ax) = \sum_{i=1}^{n} [Ax]_i - 2 \sqrt{b_i \, [Ax]_i} + b_i, \tag{4.1} $$

so only the second term needs consideration. Writing $Ax = A\{y \, (x/y)\}$, we get by convexity of the function $t \mapsto -\sqrt{t}$ that

$$ -\sqrt{[Ax]_i} = -\sqrt{[Ay]_i} \, \left[ \frac{[A\{y \, (x/y)\}]_i}{[Ay]_i} \right]^{1/2} \le -\sqrt{[Ay]_i} \; \frac{[A\{y \, [x/y]^{1/2}\}]_i}{[Ay]_i} = -\frac{[A\{\sqrt{xy}\}]_i}{\sqrt{[Ay]_i}}. $$

It follows that

$$ H(b, Ax) \le \sum_{i=1}^{n} [Ax]_i - 2 \, [A\sqrt{xy}]_i \, \frac{\sqrt{b_i}}{\sqrt{[Ay]_i}} + b_i, $$
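or,

$$ H(b, Ax) \le \mathcal{H}(x, y) \overset{\text{def}}{=} \sum_{j=1}^{m} x_j - 2 \sqrt{x_j y_j} \, \bigl[ A^T \sqrt{b / Ay} \bigr]_j + \sum_{i=1}^{n} b_i. \tag{4.2} $$

This is the Tendentious Inequality. The minimization algorithm it suggests for solving (1.1) is as follows. If $y = x^k$ is a guess for a solution of (1.1), obtain a new and improved(?) guess $x^{k+1}$ by minimizing $\mathcal{H}(x, x^k)$ as a function of $x$. The result is that

$$ x_j^{k+1} = x_j^k \, \bigl( \bigl[ A^T \sqrt{b / Ax^k} \bigr]_j \bigr)^2, \quad j = 1, 2, \dots, m. \tag{4.3} $$

We now investigate the monotonicity properties. In our search for the First Monotonicity Property we observe the following. For ease of notation we let $y = x^k$ and $x = x^{k+1}$. Then

$$ H(b, Ax) \le \mathcal{H}(x, y) = -\sum_{j=1}^{m} y_j \bigl\{ \bigl[ A^T \sqrt{b / Ay} \bigr]_j \bigr\}^2 + \sum_{i=1}^{n} b_i $$
$$ = H(b, Ay) - \sum_{j=1}^{m} y_j \bigl\{ -1 + \bigl[ A^T \sqrt{b / Ay} \bigr]_j \bigr\}^2 = H(b, Ay) - \sum_{j=1}^{m} \bigl| \sqrt{x_j} - \sqrt{y_j} \bigr|^2. $$

The formulation in terms of $x^k$ and $x^{k+1}$ reads

$$ H(b, Ax^k) - H(b, Ax^{k+1}) \ge H(x^k, x^{k+1}), \tag{4.4} $$

which is the First Monotonicity Property. Note the lack of any hint of Kullback-Leibler.

But Kullback-Leibler pops up in the Second Monotonicity Property. It turns out that the Second Monotonicity Property takes just about the standard form. Let $x^*$ be a solution of (1.1). By (4.4), $x^*$ is then a fixed point of the iteration (4.3), so $\bigl[ A^T \sqrt{b / Ax^*} \bigr]_j = 1$ whenever $x_j^* > 0$. Now, with $KL$ the standard Kullback-Leibler divergence,

$$ KL(x^*, x^k) - KL(x^*, x^{k+1}) = \sum_{j=1}^{m} x_j^* \log \frac{x_j^{k+1}}{x_j^k} + x_j^k - x_j^{k+1} = \sum_{j=1}^{m} x_j^k - x_j^{k+1} + 2 x_j^* \log \bigl[ A^T \sqrt{b / Ax^k} \bigr]_j. \tag{4.5} $$

In the usual fashion we have

$$ \bigl[ A^T \sqrt{b / Ax^k} \bigr]_j = \left[ A^T \left\{ \sqrt{\frac{b}{Ax^*}} \, \sqrt{\frac{Ax^*}{Ax^k}} \right\} \right]_j, $$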
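and so, by the concavity of the logarithm,

$$ KL(x^*, x^k) - KL(x^*, x^{k+1}) \ge \sum_{j=1}^{m} \bigl( x_j^k - x_j^{k+1} \bigr) + \sum_{j=1}^{m} 2 x_j^* \left[ A^T \left\{ \sqrt{\frac{b}{Ax^*}} \, \log \sqrt{\frac{Ax^*}{Ax^k}} \right\} \right]_j $$
$$ = \sum_{j=1}^{m} \bigl( x_j^k - x_j^{k+1} \bigr) + \sum_{i=1}^{n} 2 \sqrt{b_i \, [Ax^*]_i} \, \log \sqrt{\frac{[Ax^*]_i}{[Ax^k]_i}} $$
$$ \ge \sum_{j=1}^{m} \bigl( x_j^k - x_j^{k+1} \bigr) + \sum_{i=1}^{n} 2 \sqrt{b_i \, [Ax^*]_i} - 2 \sqrt{b_i \, [Ax^k]_i}, \tag{4.6} $$

where in the last line we used the inequality $\log t \ge 1 - t^{-1}$. Consequently

$$ KL(x^*, x^k) - KL(x^*, x^{k+1}) \ge \sum_{i=1}^{n} \Bigl\{ [Ax^k]_i - 2 \sqrt{b_i \, [Ax^k]_i} + b_i \Bigr\} - \sum_{i=1}^{n} \Bigl\{ [Ax^*]_i - 2 \sqrt{b_i \, [Ax^*]_i} + b_i \Bigr\} + \text{rest} $$
$$ = H(b, Ax^k) - H(b, Ax^*) + \text{rest} $$
$$ \ge H(b, Ax^{k+1}) + H(x^{k+1}, x^k) - H(b, Ax^*) + \text{rest}, \tag{4.7} $$

where in the last line we used (4.4), and the rest is given by

$$ \text{rest} = \sum_{i=1}^{n} [Ax^*]_i - \sum_{j=1}^{m} x_j^{k+1} = \sum_{j=1}^{m} x_j^* - x_j^{k+1}. \tag{4.8} $$

Now,

$$ H(x^{k+1}, x^k) + \text{rest} = \sum_{j=1}^{m} x_j^{k+1} + x_j^k - 2 \sqrt{x_j^{k+1} x_j^k} + x_j^* - x_j^{k+1} $$
$$ = \sum_{j=1}^{m} x_j^k - 2 x_j^k \bigl[ A^T \sqrt{b / Ax^k} \bigr]_j + x_j^* $$
$$ = \sum_{i=1}^{n} \Bigl\{ [Ax^k]_i - 2 \sqrt{b_i \, [Ax^k]_i} + b_i \Bigr\} + \text{rem} = H(b, Ax^k) + \text{rem}, \tag{4.9} $$

where

$$ \text{rem} = \sum_{j=1}^{m} x_j^* - \sum_{i=1}^{n} b_i. $$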
Rather surprisingly, $\text{rem} = -H(b, Ax^*)$, as we now show. Since $x^*$ is a fixed point of (4.3),

$$ \sum_{j=1}^{m} x_j^* = \sum_{j=1}^{m} x_j^* \bigl[ A^T \sqrt{b / Ax^*} \bigr]_j = \sum_{i=1}^{n} \sqrt{b_i \, [Ax^*]_i}. \tag{4.10} $$

It follows that

$$ \text{rem} = -\sum_{j=1}^{m} x_j^* + \sum_{j=1}^{m} 2 x_j^* - \sum_{i=1}^{n} b_i = \sum_{i=1}^{n} \Bigl\{ -[Ax^*]_i + 2 \sqrt{b_i \, [Ax^*]_i} - b_i \Bigr\} = -H(b, Ax^*), $$

and so

$$ KL(x^*, x^k) - KL(x^*, x^{k+1}) \ge H(b, Ax^{k+1}) + H(b, Ax^k) - 2 H(b, Ax^*) \ge 0, \tag{4.11} $$

which implies

$$ KL(x^*, x^k) - KL(x^*, x^{k+1}) \ge 2 \bigl\{ H(b, Ax^{k+1}) - H(b, Ax^*) \bigr\} \ge 0. \tag{4.12} $$

Either (4.11) or (4.12) may be considered as the Second Monotonicity Property.

The majorizing function approach applies to the smoothed minimum Hellinger distance problem (1.20). At the end of this section we show one may view this as a regularized version of (1.1). Note that $H(b, A, x)$ is convex. The Tendentious Inequality is

$$ H(b, A, x) \le \sum_{i=1}^{n} b_i + \sum_{j=1}^{m} x_j - 2 \sqrt{x_j} \, \bigl[ S\bigl\{ \sqrt{My} \cdot A^T \sqrt{b / AMy} \bigr\} \bigr]_j, \tag{4.13} $$

which gives rise to the algorithm (1.21). The first monotonicity property of Theorem (1.22) is similar to the unsmoothed case. For the second monotonicity property we work backwards, in several steps analogous to the unsmoothed case. The first ingredient is the observation that for any solution $x^*$ of (1.20)

$$ \sum_{j=1}^{m} x_j^* - \sum_{i=1}^{n} b_i = -H(b, A, x^*). \tag{4.14} $$

The proof is just about the same as before: $x^*$ is a fixed point of (1.21), so with $r^* = \sqrt{b / AMx^*}$,

$$ \sum_{j=1}^{m} x_j^* = \sum_{j=1}^{m} \sqrt{x_j^*} \, \bigl[ S\bigl( \sqrt{Mx^*} \cdot (A^T r^*) \bigr) \bigr]_j = \sum_{j=1}^{m} [Mx^*]_j \, [A^T r^*]_j = \sum_{i=1}^{n} [AMx^*]_i \, r_i^* = \sum_{i=1}^{n} \sqrt{b_i \, [AMx^*]_i}, $$
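where we used duality twice (or interchanging the order of summation). Now (4.14) follows as in (4.10). The second step is to show that

$$ H(b, A, x^k) - H(b, A, x^*) = H(x^k, x^{k+1}) + \sum_{j=1}^{m} x_j^* - x_j^{k+1}. \tag{4.15} $$

This too follows similarly to the unsmoothed case: using (4.14) we have

$$ H(b, A, x^k) - H(b, A, x^*) = H(b, A, x^k) + \sum_{j=1}^{m} x_j^* - \sum_{i=1}^{n} b_i = \sum_{j=1}^{m} \bigl( x_j^k + x_j^* \bigr) - \sum_{i=1}^{n} 2 \sqrt{b_i \, [AMx^k]_i}, $$

and now, with $r^k = \sqrt{b / AMx^k}$,

$$ \sum_{i=1}^{n} \sqrt{b_i \, [AMx^k]_i} = \sum_{i=1}^{n} [AMx^k]_i \, r_i^k = \sum_{j=1}^{m} [Mx^k]_j \, [A^T r^k]_j = \sum_{j=1}^{m} \bigl[ S\sqrt{x^k} \bigr]_j \sqrt{[Mx^k]_j} \, [A^T r^k]_j $$
$$ = \sum_{j=1}^{m} \sqrt{x_j^k} \, \bigl[ S\bigl( \sqrt{Mx^k} \cdot (A^T r^k) \bigr) \bigr]_j = \sum_{j=1}^{m} \sqrt{x_j^k \, x_j^{k+1}}. $$

Going back we get

$$ H(b, A, x^k) - H(b, A, x^*) = \sum_{j=1}^{m} x_j^k - 2 \sqrt{x_j^k \, x_j^{k+1}} + x_j^*, $$

and (4.15) follows.

Now backtracking as in (4.7), (4.6), (4.5) we get that

$$ H(b, A, x^{k+1}) + H(x^{k+1}, x^k) - H(b, A, x^*) + \sum_{j=1}^{m} \bigl( x_j^* - x_j^{k+1} \bigr) $$
$$ \le \sum_{j=1}^{m} \bigl( x_j^k - x_j^{k+1} \bigr) + \sum_{i=1}^{n} 2 \Bigl( \sqrt{b_i \, [AMx^*]_i} - \sqrt{b_i \, [AMx^k]_i} \Bigr) $$
$$ \le \sum_{j=1}^{m} \bigl( x_j^k - x_j^{k+1} \bigr) + \sum_{i=1}^{n} 2 \sqrt{b_i \, [AMx^*]_i} \, \log \sqrt{\frac{[AMx^*]_i}{[AMx^k]_i}} $$
$$ \le \sum_{j=1}^{m} \bigl( x_j^k - x_j^{k+1} \bigr) + \sum_{j=1}^{m} 2 \, [Mx^*]_j \, [A^T r^*]_j \, \log \frac{[A^T r^k]_j}{[A^T r^*]_j}. $$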
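So we now have

$$ H(b, A, x^{k+1}) + H(x^{k+1}, x^k) - H(b, A, x^*) + \sum_{j=1}^{m} \bigl( x_j^* - x_j^{k+1} \bigr) \le \sum_{j=1}^{m} \bigl( x_j^k - x_j^{k+1} \bigr) + \text{SUM}, \tag{4.16} $$

with

$$ \text{SUM} = \sum_{j=1}^{m} 2 \, [Mx^*]_j \, [A^T r^*]_j \, \log \frac{[A^T r^k]_j}{[A^T r^*]_j}. \tag{4.17} $$

The last step is to get from here to $KL(x^*, x^k) - KL(x^*, x^{k+1})$. We rewrite SUM as $\text{SUM} = \text{SUM}_{I} + \text{SUM}_{II}$, with

$$ \text{SUM}_{I} = \sum_{j=1}^{m} 2 \, [Mx^*]_j \, [A^T r^*]_j \, \log \frac{\sqrt{[Mx^k]_j} \, [A^T r^k]_j}{\sqrt{[Mx^*]_j} \, [A^T r^*]_j}, $$
$$ \text{SUM}_{II} = \sum_{j=1}^{m} 2 \, [Mx^*]_j \, [A^T r^*]_j \, \log \frac{\sqrt{[Mx^*]_j}}{\sqrt{[Mx^k]_j}}. $$

With arguments used before,

$$ \text{SUM}_{I} = \sum_{j=1}^{m} 2 \sqrt{x_j^*} \, \left[ S\left\{ \sqrt{Mx^*} \cdot (A^T r^*) \, \log \frac{\sqrt{Mx^k} \cdot A^T r^k}{\sqrt{Mx^*} \cdot A^T r^*} \right\} \right]_j, $$

and, now, in view of the iteration (1.21), of which $x^*$ is a fixed point,

$$ S^{-1} \sqrt{x^{k+1}} = \sqrt{Mx^k} \cdot A^T r^k, \qquad S^{-1} \sqrt{x^*} = \sqrt{Mx^*} \cdot A^T r^*, $$

assuming that $S$ is invertible. (The following goes through without this assumption, actually.) So we may write $\text{SUM}_{I}$ as

$$ \text{SUM}_{I} = \sum_{j=1}^{m} 2 \sqrt{x_j^*} \, \left[ S\left\{ \bigl( S^{-1} \sqrt{x^*} \bigr) \, \log \frac{S^{-1} \sqrt{x^{k+1}}}{S^{-1} \sqrt{x^*}} \right\} \right]_j. \tag{4.18} $$

It should be noted that $S^{-1} \sqrt{x^{k+1}}$ and $S^{-1} \sqrt{x^*}$ are nonnegative vectors. Now, for any nonnegative function $U$, by the concavity of the logarithm

$$ \frac{S\bigl( (S^{-1} \sqrt{x^*}) \log U \bigr)}{S\bigl( S^{-1} \sqrt{x^*} \bigr)} \le \log \frac{S\bigl( (S^{-1} \sqrt{x^*}) \, U \bigr)}{S\bigl( S^{-1} \sqrt{x^*} \bigr)}, $$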
More informationMath 4310 Handout  Quotient Vector Spaces
Math 4310 Handout  Quotient Vector Spaces Dan Collins The textbook defines a subspace of a vector space in Chapter 4, but it avoids ever discussing the notion of a quotient space. This is understandable
More informationDate: April 12, 2001. Contents
2 Lagrange Multipliers Date: April 12, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 12 2.3. Informative Lagrange Multipliers...........
More informationFactorization Theorems
Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationCHAPTER II THE LIMIT OF A SEQUENCE OF NUMBERS DEFINITION OF THE NUMBER e.
CHAPTER II THE LIMIT OF A SEQUENCE OF NUMBERS DEFINITION OF THE NUMBER e. This chapter contains the beginnings of the most important, and probably the most subtle, notion in mathematical analysis, i.e.,
More informationInner Product Spaces
Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and
More information1 if 1 x 0 1 if 0 x 1
Chapter 3 Continuity In this chapter we begin by defining the fundamental notion of continuity for real valued functions of a single real variable. When trying to decide whether a given function is or
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationMOP 2007 Black Group Integer Polynomials Yufei Zhao. Integer Polynomials. June 29, 2007 Yufei Zhao yufeiz@mit.edu
Integer Polynomials June 9, 007 Yufei Zhao yufeiz@mit.edu We will use Z[x] to denote the ring of polynomials with integer coefficients. We begin by summarizing some of the common approaches used in dealing
More information15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
More informationAlgebra Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the 201213 school year.
This document is designed to help North Carolina educators teach the Common Core (Standard Course of Study). NCDPI staff are continually updating and improving these tools to better serve teachers. Algebra
More informationStochastic Inventory Control
Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the
More information1 Portfolio mean and variance
Copyright c 2005 by Karl Sigman Portfolio mean and variance Here we study the performance of a oneperiod investment X 0 > 0 (dollars) shared among several different assets. Our criterion for measuring
More informationx1 x 2 x 3 y 1 y 2 y 3 x 1 y 2 x 2 y 1 0.
Cross product 1 Chapter 7 Cross product We are getting ready to study integration in several variables. Until now we have been doing only differential calculus. One outcome of this study will be our ability
More informationLinear Algebra Notes for Marsden and Tromba Vector Calculus
Linear Algebra Notes for Marsden and Tromba Vector Calculus ndimensional Euclidean Space and Matrices Definition of n space As was learned in Math b, a point in Euclidean three space can be thought of
More informationContinuity of the Perron Root
Linear and Multilinear Algebra http://dx.doi.org/10.1080/03081087.2014.934233 ArXiv: 1407.7564 (http://arxiv.org/abs/1407.7564) Continuity of the Perron Root Carl D. Meyer Department of Mathematics, North
More informationNotes on Factoring. MA 206 Kurt Bryan
The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor
More informationt := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).
1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction
More information(Quasi)Newton methods
(Quasi)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable nonlinear function g, x such that g(x) = 0, where g : R n R n. Given a starting
More informationMathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson
Mathematics for Computer Science/Software Engineering Notes for the course MSM1F3 Dr. R. A. Wilson October 1996 Chapter 1 Logic Lecture no. 1. We introduce the concept of a proposition, which is a statement
More informationElementary Number Theory We begin with a bit of elementary number theory, which is concerned
CONSTRUCTION OF THE FINITE FIELDS Z p S. R. DOTY Elementary Number Theory We begin with a bit of elementary number theory, which is concerned solely with questions about the set of integers Z = {0, ±1,
More informationQuotient Rings and Field Extensions
Chapter 5 Quotient Rings and Field Extensions In this chapter we describe a method for producing field extension of a given field. If F is a field, then a field extension is a field K that contains F.
More informationTHREE DIMENSIONAL GEOMETRY
Chapter 8 THREE DIMENSIONAL GEOMETRY 8.1 Introduction In this chapter we present a vector algebra approach to three dimensional geometry. The aim is to present standard properties of lines and planes,
More informationChapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm.
Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm. We begin by defining the ring of polynomials with coefficients in a ring R. After some preliminary results, we specialize
More informationLinear Programming Notes V Problem Transformations
Linear Programming Notes V Problem Transformations 1 Introduction Any linear programming problem can be rewritten in either of two standard forms. In the first form, the objective is to maximize, the material
More informationExercises with solutions (1)
Exercises with solutions (). Investigate the relationship between independence and correlation. (a) Two random variables X and Y are said to be correlated if and only if their covariance C XY is not equal
More information10. Proximal point method
L. Vandenberghe EE236C Spring 201314) 10. Proximal point method proximal point method augmented Lagrangian method MoreauYosida smoothing 101 Proximal point method a conceptual algorithm for minimizing
More informationOverview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
More informationCONTINUED FRACTIONS AND FACTORING. Niels Lauritzen
CONTINUED FRACTIONS AND FACTORING Niels Lauritzen ii NIELS LAURITZEN DEPARTMENT OF MATHEMATICAL SCIENCES UNIVERSITY OF AARHUS, DENMARK EMAIL: niels@imf.au.dk URL: http://home.imf.au.dk/niels/ Contents
More information3. INNER PRODUCT SPACES
. INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.
More information1 Sets and Set Notation.
LINEAR ALGEBRA MATH 27.6 SPRING 23 (COHEN) LECTURE NOTES Sets and Set Notation. Definition (Naive Definition of a Set). A set is any collection of objects, called the elements of that set. We will most
More informationIncreasing for all. Convex for all. ( ) Increasing for all (remember that the log function is only defined for ). ( ) Concave for all.
1. Differentiation The first derivative of a function measures by how much changes in reaction to an infinitesimal shift in its argument. The largest the derivative (in absolute value), the faster is evolving.
More informationNo: 10 04. Bilkent University. Monotonic Extension. Farhad Husseinov. Discussion Papers. Department of Economics
No: 10 04 Bilkent University Monotonic Extension Farhad Husseinov Discussion Papers Department of Economics The Discussion Papers of the Department of Economics are intended to make the initial results
More informationBipan Hazarika ON ACCELERATION CONVERGENCE OF MULTIPLE SEQUENCES. 1. Introduction
F A S C I C U L I M A T H E M A T I C I Nr 51 2013 Bipan Hazarika ON ACCELERATION CONVERGENCE OF MULTIPLE SEQUENCES Abstract. In this article the notion of acceleration convergence of double sequences
More informationCalculus C/Multivariate Calculus Advanced Placement G/T Essential Curriculum
Calculus C/Multivariate Calculus Advanced Placement G/T Essential Curriculum UNIT I: The Hyperbolic Functions basic calculus concepts, including techniques for curve sketching, exponential and logarithmic
More informationThe equivalence of logistic regression and maximum entropy models
The equivalence of logistic regression and maximum entropy models John Mount September 23, 20 Abstract As our colleague so aptly demonstrated ( http://www.winvector.com/blog/20/09/thesimplerderivationoflogisticregression/
More informationThe van Hoeij Algorithm for Factoring Polynomials
The van Hoeij Algorithm for Factoring Polynomials Jürgen Klüners Abstract In this survey we report about a new algorithm for factoring polynomials due to Mark van Hoeij. The main idea is that the combinatorial
More informationNonlinear Optimization: Algorithms 3: Interiorpoint methods
Nonlinear Optimization: Algorithms 3: Interiorpoint methods INSEAD, Spring 2006 JeanPhilippe Vert Ecole des Mines de Paris JeanPhilippe.Vert@mines.org Nonlinear optimization c 2006 JeanPhilippe Vert,
More informationMathematical Methods of Engineering Analysis
Mathematical Methods of Engineering Analysis Erhan Çinlar Robert J. Vanderbei February 2, 2000 Contents Sets and Functions 1 1 Sets................................... 1 Subsets.............................
More informationOn the representability of the biuniform matroid
On the representability of the biuniform matroid Simeon Ball, Carles Padró, Zsuzsa Weiner and Chaoping Xing August 3, 2012 Abstract Every biuniform matroid is representable over all sufficiently large
More informationA FIRST COURSE IN OPTIMIZATION THEORY
A FIRST COURSE IN OPTIMIZATION THEORY RANGARAJAN K. SUNDARAM New York University CAMBRIDGE UNIVERSITY PRESS Contents Preface Acknowledgements page xiii xvii 1 Mathematical Preliminaries 1 1.1 Notation
More informationSection 1.1. Introduction to R n
The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to
More informationLinear Algebra I. Ronald van Luijk, 2012
Linear Algebra I Ronald van Luijk, 2012 With many parts from Linear Algebra I by Michael Stoll, 2007 Contents 1. Vector spaces 3 1.1. Examples 3 1.2. Fields 4 1.3. The field of complex numbers. 6 1.4.
More informationPowerTeaching i3: Algebra I Mathematics
PowerTeaching i3: Algebra I Mathematics Alignment to the Common Core State Standards for Mathematics Standards for Mathematical Practice and Standards for Mathematical Content for Algebra I Key Ideas and
More informationPrime Numbers and Irreducible Polynomials
Prime Numbers and Irreducible Polynomials M. Ram Murty The similarity between prime numbers and irreducible polynomials has been a dominant theme in the development of number theory and algebraic geometry.
More informationEMBEDDING COUNTABLE PARTIAL ORDERINGS IN THE DEGREES
EMBEDDING COUNTABLE PARTIAL ORDERINGS IN THE ENUMERATION DEGREES AND THE ωenumeration DEGREES MARIYA I. SOSKOVA AND IVAN N. SOSKOV 1. Introduction One of the most basic measures of the complexity of a
More informationFurther Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1
Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1 J. Zhang Institute of Applied Mathematics, Chongqing University of Posts and Telecommunications, Chongqing
More informationLecture 7: Finding Lyapunov Functions 1
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.243j (Fall 2003): DYNAMICS OF NONLINEAR SYSTEMS by A. Megretski Lecture 7: Finding Lyapunov Functions 1
More informationSome Notes on Taylor Polynomials and Taylor Series
Some Notes on Taylor Polynomials and Taylor Series Mark MacLean October 3, 27 UBC s courses MATH /8 and MATH introduce students to the ideas of Taylor polynomials and Taylor series in a fairly limited
More informationWe call this set an ndimensional parallelogram (with one vertex 0). We also refer to the vectors x 1,..., x n as the edges of P.
Volumes of parallelograms 1 Chapter 8 Volumes of parallelograms In the present short chapter we are going to discuss the elementary geometrical objects which we call parallelograms. These are going to
More information1 Short Introduction to Time Series
ECONOMICS 7344, Spring 202 Bent E. Sørensen January 24, 202 Short Introduction to Time Series A time series is a collection of stochastic variables x,.., x t,.., x T indexed by an integer value t. The
More informationThe Ideal Class Group
Chapter 5 The Ideal Class Group We will use Minkowski theory, which belongs to the general area of geometry of numbers, to gain insight into the ideal class group of a number field. We have already mentioned
More informationIdeal Class Group and Units
Chapter 4 Ideal Class Group and Units We are now interested in understanding two aspects of ring of integers of number fields: how principal they are (that is, what is the proportion of principal ideals
More informationOn Adaboost and Optimal Betting Strategies
On Adaboost and Optimal Betting Strategies Pasquale Malacaria School of Electronic Engineering and Computer Science Queen Mary, University of London Email: pm@dcs.qmul.ac.uk Fabrizio Smeraldi School of
More informationWHAT ARE MATHEMATICAL PROOFS AND WHY THEY ARE IMPORTANT?
WHAT ARE MATHEMATICAL PROOFS AND WHY THEY ARE IMPORTANT? introduction Many students seem to have trouble with the notion of a mathematical proof. People that come to a course like Math 216, who certainly
More informationMetric Spaces. Chapter 7. 7.1. Metrics
Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some
More information9. POLYNOMIALS. Example 1: The expression a(x) = x 3 4x 2 + 7x 11 is a polynomial in x. The coefficients of a(x) are the numbers 1, 4, 7, 11.
9. POLYNOMIALS 9.1. Definition of a Polynomial A polynomial is an expression of the form: a(x) = a n x n + a n1 x n1 +... + a 1 x + a 0. The symbol x is called an indeterminate and simply plays the role
More informationdiscuss how to describe points, lines and planes in 3 space.
Chapter 2 3 Space: lines and planes In this chapter we discuss how to describe points, lines and planes in 3 space. introduce the language of vectors. discuss various matters concerning the relative position
More informationTable 1: Summary of the settings and parameters employed by the additive PA algorithm for classification, regression, and uniclass.
Online PassiveAggressive Algorithms Koby Crammer Ofer Dekel Shai ShalevShwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il
More informationWhat is Linear Programming?
Chapter 1 What is Linear Programming? An optimization problem usually has three essential ingredients: a variable vector x consisting of a set of unknowns to be determined, an objective function of x to
More informationIntroduction to Algebraic Geometry. Bézout s Theorem and Inflection Points
Introduction to Algebraic Geometry Bézout s Theorem and Inflection Points 1. The resultant. Let K be a field. Then the polynomial ring K[x] is a unique factorisation domain (UFD). Another example of a
More informationFactoring Patterns in the Gaussian Plane
Factoring Patterns in the Gaussian Plane Steve Phelps Introduction This paper describes discoveries made at the Park City Mathematics Institute, 00, as well as some proofs. Before the summer I understood
More informationContinued Fractions and the Euclidean Algorithm
Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction
More informationU.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009. Notes on Algebra
U.C. Berkeley CS276: Cryptography Handout 0.1 Luca Trevisan January, 2009 Notes on Algebra These notes contain as little theory as possible, and most results are stated without proof. Any introductory
More informationGROUPS ACTING ON A SET
GROUPS ACTING ON A SET MATH 435 SPRING 2012 NOTES FROM FEBRUARY 27TH, 2012 1. Left group actions Definition 1.1. Suppose that G is a group and S is a set. A left (group) action of G on S is a rule for
More informationCHAPTER SIX IRREDUCIBILITY AND FACTORIZATION 1. BASIC DIVISIBILITY THEORY
January 10, 2010 CHAPTER SIX IRREDUCIBILITY AND FACTORIZATION 1. BASIC DIVISIBILITY THEORY The set of polynomials over a field F is a ring, whose structure shares with the ring of integers many characteristics.
More informationDuality of linear conic problems
Duality of linear conic problems Alexander Shapiro and Arkadi Nemirovski Abstract It is well known that the optimal values of a linear programming problem and its dual are equal to each other if at least
More informationChapter 21: The Discounted Utility Model
Chapter 21: The Discounted Utility Model 21.1: Introduction This is an important chapter in that it introduces, and explores the implications of, an empirically relevant utility function representing intertemporal
More informationA Negative Result Concerning Explicit Matrices With The Restricted Isometry Property
A Negative Result Concerning Explicit Matrices With The Restricted Isometry Property Venkat Chandar March 1, 2008 Abstract In this note, we prove that matrices whose entries are all 0 or 1 cannot achieve
More informationLINEAR ALGEBRA W W L CHEN
LINEAR ALGEBRA W W L CHEN c W W L Chen, 1997, 2008 This chapter is available free to all individuals, on understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied,
More informationYou know from calculus that functions play a fundamental role in mathematics.
CHPTER 12 Functions You know from calculus that functions play a fundamental role in mathematics. You likely view a function as a kind of formula that describes a relationship between two (or more) quantities.
More informationUnderstanding Basic Calculus
Understanding Basic Calculus S.K. Chung Dedicated to all the people who have helped me in my life. i Preface This book is a revised and expanded version of the lecture notes for Basic Calculus and other
More informationINTRODUCTORY SET THEORY
M.Sc. program in mathematics INTRODUCTORY SET THEORY Katalin Károlyi Department of Applied Analysis, Eötvös Loránd University H1088 Budapest, Múzeum krt. 68. CONTENTS 1. SETS Set, equal sets, subset,
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
More informationLinear algebra and the geometry of quadratic equations. Similarity transformations and orthogonal matrices
MATH 30 Differential Equations Spring 006 Linear algebra and the geometry of quadratic equations Similarity transformations and orthogonal matrices First, some things to recall from linear algebra Two
More informationLinear Codes. Chapter 3. 3.1 Basics
Chapter 3 Linear Codes In order to define codes that we can encode and decode efficiently, we add more structure to the codespace. We shall be mainly interested in linear codes. A linear code of length
More informationApplications to Data Smoothing and Image Processing I
Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is
More informationRevised Version of Chapter 23. We learned long ago how to solve linear congruences. ax c (mod m)
Chapter 23 Squares Modulo p Revised Version of Chapter 23 We learned long ago how to solve linear congruences ax c (mod m) (see Chapter 8). It s now time to take the plunge and move on to quadratic equations.
More informationMATH10212 Linear Algebra. Systems of Linear Equations. Definition. An ndimensional vector is a row or a column of n numbers (or letters): a 1.
MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0534405967. Systems of Linear Equations Definition. An ndimensional vector is a row or a column
More information5 Numerical Differentiation
D. Levy 5 Numerical Differentiation 5. Basic Concepts This chapter deals with numerical approximations of derivatives. The first questions that comes up to mind is: why do we need to approximate derivatives
More informationISOMETRIES OF R n KEITH CONRAD
ISOMETRIES OF R n KEITH CONRAD 1. Introduction An isometry of R n is a function h: R n R n that preserves the distance between vectors: h(v) h(w) = v w for all v and w in R n, where (x 1,..., x n ) = x
More informationA Branch and Bound Algorithm for Solving the Binary Bilevel Linear Programming Problem
A Branch and Bound Algorithm for Solving the Binary Bilevel Linear Programming Problem John Karlof and Peter Hocking Mathematics and Statistics Department University of North Carolina Wilmington Wilmington,
More informationLinear Programming I
Linear Programming I November 30, 2003 1 Introduction In the VCR/guns/nuclear bombs/napkins/star wars/professors/butter/mice problem, the benevolent dictator, Bigus Piguinus, of south Antarctica penguins
More informationTaylor Polynomials and Taylor Series Math 126
Taylor Polynomials and Taylor Series Math 26 In many problems in science and engineering we have a function f(x) which is too complicated to answer the questions we d like to ask. In this chapter, we will
More informationA QUICK GUIDE TO THE FORMULAS OF MULTIVARIABLE CALCULUS
A QUIK GUIDE TO THE FOMULAS OF MULTIVAIABLE ALULUS ontents 1. Analytic Geometry 2 1.1. Definition of a Vector 2 1.2. Scalar Product 2 1.3. Properties of the Scalar Product 2 1.4. Length and Unit Vectors
More informationA NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION
1 A NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION Dimitri Bertsekas M.I.T. FEBRUARY 2003 2 OUTLINE Convexity issues in optimization Historical remarks Our treatment of the subject Three unifying lines of
More information3. Linear Programming and Polyhedral Combinatorics
Massachusetts Institute of Technology Handout 6 18.433: Combinatorial Optimization February 20th, 2009 Michel X. Goemans 3. Linear Programming and Polyhedral Combinatorics Summary of what was seen in the
More informationInference of Probability Distributions for Trust and Security applications
Inference of Probability Distributions for Trust and Security applications Vladimiro Sassone Based on joint work with Mogens Nielsen & Catuscia Palamidessi Outline 2 Outline Motivations 2 Outline Motivations
More information