Derivations for Linear Algebra and Optimization

John Duchi

Contents

1  Householder Transformations
2  Convex Sets and Local/Global Minima
   2.1  Local Minima as Global Minima
   2.2  Uniqueness of Global Minima
3  Second Order Convexity Conditions
   3.1  Multivariate case
4  Iterative Methods for Optimization
   4.1  Steepest Descent
        4.1.1  Digression on Choosing Directions
   4.2  Conjugate Gradient
5  Perceptron Convergence
6  Duality of SVMs
7  Projection of a vector onto linear constraints
8  Minimization of a linear function subject to elliptical constraints
9  KL Divergence for Gaussian distributions

1 Householder Transformations

Much of this section was copied and paraphrased from Heath's Scientific Computing. Suppose we are looking for an orthogonal transformation that annihilates desired components of a given vector. Recall that a square real matrix $Q$ is said to be orthogonal if its columns are orthonormal, that is, $Q^T Q = I$. Orthogonal transformations are nice because they preserve the Euclidean norm of any vector $v$:

    $\|Qv\|^2 = (Qv)^T Qv = v^T Q^T Q v = v^T v = \|v\|^2.$

This is useful because, when applied to a linear system, an orthogonal transformation leaves Euclidean norms unchanged and so does not make the problem any harder to solve numerically. Now, we are really looking for an orthogonal transformation of a vector that annihilates some of its components while maintaining its two-norm (Euclidean norm). We can accomplish this with a Householder transformation, which is a matrix of the form

    $H = I - \frac{2 v v^T}{v^T v}.$

Obviously, $v$ is a non-zero vector. From our definition, it is easy to see that $H = H^T$, and a little math gives us that

    $HH = \Big(I - \frac{2vv^T}{v^Tv}\Big)\Big(I - \frac{2vv^T}{v^Tv}\Big) = I - \frac{4vv^T}{v^Tv} + \frac{4vv^T}{v^Tv}\,\frac{vv^T}{v^Tv} = I,$

so $H$ is both symmetric and orthogonal. To derive the full Householder transformation, we first consider the problem of taking a vector $a$ and annihilating all of its components but the first, giving us a vector $\alpha e_1$, in such a way that $\|a\| = |\alpha|$. Basically, we would like to have

    $Ha = \begin{bmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \alpha e_1.$

Using the original formula for $H$, we have

    $\alpha e_1 = Ha = a - \frac{2 v^T a}{v^T v}\, v,$

so that

    $v = (a - \alpha e_1)\, C(v, a), \qquad C(v, a) = \frac{v^T v}{2 v^T a}.$

Because the constant $C$ term will be factored out by the division when we form the actual transformation $H$, we can eliminate it, giving us

    $v = a - \alpha e_1.$

In general, to avoid cancellation (subtracting two nearly equal quantities, which loses precision), we choose $\alpha = -\mathrm{sign}(a_1)\,\|a\|$.

In the more general case, when actually working with a system that we would like to transform orthogonally into an upper-triangular system (for easier solving), we would like to take an $m$-vector $a$ and annihilate its last $m - k$ components. That is, we would like to take a vector

    $a = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix},$

where $a_1$ is a $(k-1)$-vector and $a_2$ is an $(m - k + 1)$-vector, and transform it into a vector which contains $a_1$ as its first $k - 1$ elements, $\pm\|a_2\|$ as its $k$th element, and zeros as its remaining $m - k$ elements. Here is the derivation of the vector $v$ to be used in the Householder transformation for this case:

    $Ha = \begin{bmatrix} a_1 \\ 0 \end{bmatrix} + \alpha e_k = \Big(I - \frac{2vv^T}{v^Tv}\Big)\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} - \frac{2 v^T a}{v^T v}\, v.$

Subtracting $a$ from both sides and multiplying by negative one, we have

    $\begin{bmatrix} 0 \\ a_2 \end{bmatrix} - \alpha e_k = \frac{2 v^T a}{v^T v}\, v = v\, C(v, a),$

and again we can eliminate the constant term to give us that we should set

    $v = \begin{bmatrix} 0 \\ a_2 \end{bmatrix} - \alpha e_k, \qquad \alpha = \pm\|a_2\|,$

with the sign of $\alpha$ again chosen to avoid cancellation. From the above discussion, it is fairly easy to see that given an $m \times n$ matrix $A$, if we apply $n$ Householder transformations to eliminate the below-diagonal elements of $A$, we will get a matrix $R$ such that

    $H_n H_{n-1} \cdots H_1 A = \begin{bmatrix} R \\ 0 \end{bmatrix}.$

For further explication and use of the Householder transformation, see Heath.
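To make the construction concrete, here is a minimal numpy sketch (my own illustration, not from the notes; the helper name householder_qr is assumed) that applies successive Householder reflections, with the cancellation-avoiding sign choice for $\alpha$, to compute $A = QR$:

```python
import numpy as np

def householder_qr(A):
    """QR factorization of an m x n matrix A (m >= n) via Householder reflections.

    Returns Q (m x m, orthogonal) and R (m x n, upper triangular) with A = Q @ R.
    """
    A = np.array(A, dtype=float)
    m, n = A.shape
    R = A.copy()
    Q = np.eye(m)
    for k in range(n):
        a = R[k:, k]                                  # column to annihilate below the diagonal
        alpha = -np.sign(a[0]) * np.linalg.norm(a) if a[0] != 0 else -np.linalg.norm(a)
        v = a.copy()
        v[0] -= alpha                                 # v = a - alpha * e_1 (avoids cancellation)
        if np.linalg.norm(v) < 1e-15:                 # nothing to annihilate
            continue
        v = v / np.linalg.norm(v)
        Hk = np.eye(m)
        Hk[k:, k:] -= 2.0 * np.outer(v, v)            # H = I - 2 v v^T / (v^T v), v normalized
        R = Hk @ R
        Q = Q @ Hk                                    # H symmetric orthogonal, so Q = H_1 ... H_n

    return Q, R

if __name__ == "__main__":
    A = np.random.randn(5, 3)
    Q, R = householder_qr(A)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))
```

The reflectors are applied left to right, so $R = H_n \cdots H_1 A$ and, since each $H_k$ is its own inverse, $Q = H_1 \cdots H_n$ satisfies $A = QR$.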

2 Convex Sets and Local/Global Minima

A convex set is a set $S \subseteq \mathbb{R}^n$ which contains the line segment between any two of its points. More specifically, if $S$ is convex, then for any $x, y \in S$ and $\alpha \in [0, 1]$, $\alpha x + (1-\alpha)y \in S$. A function $f : S \subseteq \mathbb{R}^n \to \mathbb{R}$ is convex on a convex set $S$ if its value along any line segment in $S$ lies on or below the chord connecting the function values at the endpoints of the segment. That is,

    $f(\alpha x + (1-\alpha)y) \le \alpha f(x) + (1-\alpha) f(y).$

2.1 Local Minima as Global Minima

One really nice property of convex functions is that any local minimum of a convex function $f$ on a convex set $S$ is a global minimum of $f$ on $S$. To see this, consider the definition of a local minimum: $x \in S$ is a local minimum if $f(x) \le f(z)$ for every $z \in S$ with $\|z - x\| < \varepsilon$, that is, $x$ attains the smallest value $f$ takes on some ball of radius $\varepsilon$ intersected with $S$. Now, consider some other point $y \in S$. Assume, for the sake of contradiction, that $f(y) < f(x)$, which means that $x$ is not the global minimum of $f$ on $S$. By the definition of convexity, the chord connecting $x$ and $y$ lies completely in $S$. Thus, there is a point $\alpha x + (1-\alpha)y$ on the chord that lies inside the ball, that is, there is some $\alpha_r \in (0, 1)$ such that

    $\|\alpha_r x + (1-\alpha_r)y - x\| < \varepsilon.$

Let $\alpha_r x + (1-\alpha_r)y = x + \varepsilon_r$, which is a point inside the ball around $x$. Now, by convexity of $f$ and our assumption that $f(y) < f(x)$,

    $f(x + \varepsilon_r) = f(\alpha_r x + (1-\alpha_r)y) \le \alpha_r f(x) + (1-\alpha_r) f(y) < \alpha_r f(x) + (1-\alpha_r) f(x) = f(x),$

where the strict inequality uses $\alpha_r < 1$. This is a contradiction: $x + \varepsilon_r$ lies inside the ball of radius $\varepsilon$ about $x$, yet $f(x + \varepsilon_r) < f(x)$, so $x$ would not be a local minimum. Our assumption that $f(y) < f(x)$ must thus be false, and we have that if $x$ is a local minimum of $f$ on $S$, then $x$ is a global minimum of $f$ on $S$.

2.2 Uniqueness of Global Minima

If we have a function $f$ which is strictly convex on a convex set $S$, that is,

    $f(\alpha x + (1-\alpha)y) < \alpha f(x) + (1-\alpha) f(y)$ for $\alpha \in (0, 1)$ and $x \ne y$,

then a local minimum of $f$ is the unique global minimum of $f$ on $S$. To see this, suppose we have two distinct points, $x$ and $y$, which are local minima of $f$ on $S$. By the above result, if $x$ and $y$ existed, we know that they are both global minima, so that $f(x) = f(y)$. This means, however, that because $f$ is strictly convex, for any $\alpha \in (0, 1)$,

    $f(\alpha x + (1-\alpha)y) < \alpha f(x) + (1-\alpha) f(y) = \alpha f(x) + (1-\alpha) f(x) = f(x).$

This is a contradiction because $x$ is a global minimum of $f$ on $S$ while the point $\alpha x + (1-\alpha)y \in S$ has a strictly smaller function value, so our assumption that a second minimum $y$ existed must be false. Thus, if a function $f$ is strictly convex on a convex set $S$, any local minimum of $f$ is the unique global minimum of $f$ on $S$.
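As a quick numerical illustration of the chord inequality (not part of the notes; $f(x) = \|x\|^2$ is just an assumed convex example), one can spot-check the definition on random points:

```python
import numpy as np

# Spot-check f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) for the convex f(x) = ||x||^2.
f = lambda x: np.dot(x, x)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    a = rng.uniform()
    assert f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-12
print("chord inequality held on all sampled points")
```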

3 Second Order Convexity Conditions

This is a problem from Boyd's book on convex optimization. Suppose we have a function $f : \mathbb{R} \to \mathbb{R}$ which is convex, and $a, b \in \mathrm{dom}\, f$ with $a < b$. First, we consider $x \in [a, b]$. If $\theta \in [0, 1]$, we know that $f(\theta a + (1-\theta)b) \le \theta f(a) + (1-\theta) f(b)$. Because $x = \theta a + (1-\theta)b$ for some $\theta$ whenever $x \in [a, b]$, it is easy to see that

    $\theta = \frac{b - x}{b - a}, \qquad 1 - \theta = \frac{x - a}{b - a}.$

Thus,

    $f(x) = f(\theta a + (1-\theta)b) \le \theta f(a) + (1-\theta) f(b) = \frac{b - x}{b - a} f(a) + \frac{x - a}{b - a} f(b).$

Using this, we can prove some more interesting things. First, multiplying through by $b - a$ and rearranging, we have

    $(b - a) f(x) \le (b - x) f(a) + (x - a) f(b)$
    $(b - a) f(x) - (b - a) f(a) \le (x - a) f(b) - (x - a) f(a)$
    $\frac{f(x) - f(a)}{x - a} \le \frac{f(b) - f(a)}{b - a}.$

Similarly,

    $(b - a) f(x) \le (b - x) f(a) + (x - a) f(b)$
    $(b - a) f(x) - (b - a) f(b) \le (b - x) f(a) - (b - x) f(b)$
    $\frac{f(b) - f(a)}{b - a} \le \frac{f(b) - f(x)}{b - x}.$

If we suppose that $f$ is differentiable, taking the limit on each side of the two inequalities in turn (the limits are just the definitions of the one-sided derivatives at $a$ and $b$), we have

    $f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \le \frac{f(b) - f(a)}{b - a} \le \lim_{x \to b} \frac{f(b) - f(x)}{b - x} = f'(b).$

While we used the above inequalities to prove this, it also easily follows from the gradient of a convex function $f$ being a global underestimator of $f$ (that is, $f(x) \ge f(y) + f'(y)(x - y)$).

Lastly, we assume that $f$ is twice differentiable. Using the above result that $f'(a) \le f'(b)$ whenever $a < b$, we see that $f'(y) - f'(x) \ge 0$ as long as $y > x$, and we then have

    $f''(x) = \lim_{y \downarrow x} \frac{f'(y) - f'(x)}{y - x} \ge 0.$

Thus we see that if $f$ is convex and twice differentiable, $f''(x) \ge 0$.

As for the opposite direction, we need simply use the fact that $f$ is twice differentiable (so that $f''$ is integrable) and we can prove the converse. So suppose that $f''(u) \ge 0$ for all $u$ in $\mathrm{dom}\, f$. It easily follows that if $y \ge z$, we have $f'(y) \ge f'(z)$, because $f''(u) \ge 0$ implies

    $f'(y) - f'(z) = \int_z^y f''(u)\, du \ge \int_z^y 0\, du = 0.$

Now we simply integrate this result in $z$ from $x$ to $y$ (assuming $y \ge x$, though the signs just flip if the opposite is the case), seeing

    $\int_x^y \big(f'(y) - f'(z)\big)\, dz \ge \int_x^y 0\, dz = 0$
    $f'(y)(y - x) - f(y) + f(x) \ge 0$
    $f(x) \ge f(y) + f'(y)(x - y).$

This shows that a tangent line to $f$ is a global underestimator of $f$, which is the first-order characterization of convexity, so $f$ is convex if $f''(x) \ge 0$ for $x \in \mathrm{dom}\, f$. In the end, we see that a twice differentiable function $f$ is convex if and only if $f''(x) \ge 0$.

3.1 Multivariate case

In the multivariate case, we would like to prove that if the Hessian $\nabla^2 f$ of a function $f : \mathbb{R}^n \to \mathbb{R}$ is positive semi-definite, that is, $\nabla^2 f \succeq 0$, or $x^T \nabla^2 f\, x \ge 0$ for all $x$, then $f$ is convex. First, suppose that $\nabla^2 f \succeq 0$. Now, consider the single variable function $g$ that restricts $f$ to some arbitrary line in $f$'s domain, that is, $\mathrm{dom}\, g = [0, 1]$ and $g(t) = f(tx + (1-t)y)$ for $x, y \in \mathrm{dom}\, f$. We now take the first and second derivatives of $g$ with respect to $t$:

    $g'(t) = \nabla f(tx + (1-t)y)^T (x - y)$
    $g''(t) = (x - y)^T \nabla^2 f(tx + (1-t)y)\,(x - y) \ge 0,$

so $g$ is convex along every line in the domain, and hence $f$ is convex. Now, suppose that $f$ is convex, and again restrict it to a line using $g$. Because $g$ is the restriction of the convex function $f$ to a line segment contained in the convex domain, $g$ is convex and hence $g''(t) \ge 0$. Thus, we have that for all $x, y \in \mathrm{dom}\, f$,

    $g''(t) = (x - y)^T \nabla^2 f(tx + (1-t)y)\,(x - y) \ge 0.$

As near as I can tell, we have proved that a twice differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ with open, convex domain is convex if and only if $\nabla^2 f \succeq 0$.
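As a small numerical illustration of this second-order condition (my own example, not from the notes): the convex log-sum-exp function $f(x) = \log \sum_i e^{x_i}$ has Hessian $\mathrm{diag}(p) - p p^T$, where $p$ is the softmax of $x$, and its eigenvalues are indeed nonnegative:

```python
import numpy as np

def logsumexp_hessian(x):
    """Hessian of f(x) = log(sum(exp(x))), namely diag(p) - p p^T with p = softmax(x)."""
    p = np.exp(x - np.max(x))
    p /= p.sum()                          # softmax probabilities
    return np.diag(p) - np.outer(p, p)

x = np.random.randn(5)
eigvals = np.linalg.eigvalsh(logsumexp_hessian(x))
print("smallest Hessian eigenvalue:", eigvals.min())   # >= 0 up to roundoff
```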

4 Iterative Methods for Optimization

In this section, we will be considering the problem of minimizing some function $f : \mathbb{R}^n \to \mathbb{R}$.

4.1 Steepest Descent

In general, we know that $\nabla f(x)$ is the direction of fastest ascent of the function $f$ at $x$. As such, to minimize $f$, we would like to move our solution in the $-\nabla f$ direction. For steepest descent, then, we define the update for our estimate $x_{k+1}$ at step $k+1$ to be

    $x_{k+1} = x_k - \alpha \nabla f(x_k).$

To make this a smarter method, we would like to take the largest useful step at each iteration of the search, so we would like to travel in the $-\nabla f(x)$ direction until the function stops decreasing along it. To this end, we can define $x_k(t) = x_k - \nabla f(x_k)\, t$, so that we would like to minimize

    $g(t) = f(x_k(t)) = f(x_k - \nabla f(x_k)\, t),$

which is simply a one-dimensional minimization problem we can solve using a line search. It is easy to see that if we minimize $g(t)$, we have traveled as far along the direction $-\nabla f(x_k)$ as we can without making our function grow again.

As a more specific example, which can be enlightening, consider the function $f$ defined as follows (where $A$ is symmetric and positive definite),

    $f(x) = \tfrac{1}{2} x^T A x - b^T x + c,$

whose gradient, it is easy to see, is

    $\nabla f(x) = \tfrac{1}{2} A x + \tfrac{1}{2} A^T x - b = A x - b.$

Thus, to solve this minimization problem, we must solve $Ax = b$. For steepest descent, we know that our search direction is $-\nabla f(x) = b - Ax$, and we also know that the residual for any potential solution $x$ is given by

    $r = b - Ax = A x_{\mathrm{exact}} - A x = A(x_{\mathrm{exact}} - x) = A e,$

where $e = x_{\mathrm{exact}} - x$ is the error. Unfortunately, we very rarely know anything about our error $e$, but we do know that in steepest descent we would like to go in the direction $r$ until we are parallel to the level curves of our function, which is (as stated above) a one-dimensional minimization problem. That is, we would like the next gradient to be orthogonal to the current search direction $s_k$, i.e. $\nabla f(x_{k+1})^T s_k = 0$, which for this quadratic (where $s_k = r_k$) is the same as $r_{k+1}^T r_k = 0$. Recognizing that $r_k = b - A x_k$, we can solve for our update as follows:

    $0 = r_{k+1}^T r_k = (b - A x_{k+1})^T r_k = \big(b - A(x_k + \alpha r_k)\big)^T r_k = (b - A x_k)^T r_k - \alpha (A r_k)^T r_k = r_k^T r_k - \alpha\, r_k^T A r_k.$

In the end, this gives $\alpha = \dfrac{r_k^T r_k}{r_k^T A r_k}$. In short, here is the steepest descent method used to find the solution to the equation $Ax = b$:

    $r_k = b - A x_k$
    $\alpha = \dfrac{r_k^T r_k}{r_k^T A r_k}$
    $x_{k+1} = x_k + \alpha r_k.$

The more general formulation is

    $\alpha = \operatorname*{argmin}_t\, f(x_k - \nabla f(x_k)\, t)$
    $x_{k+1} = x_k - \alpha \nabla f(x_k).$
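As a sanity check on the step-size formula, here is a minimal numpy sketch (my own illustration, not part of the notes) of steepest descent with the exact line search $\alpha = r^T r / (r^T A r)$ for the quadratic case:

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Minimize f(x) = 0.5 x^T A x - b^T x for symmetric positive definite A,
    using the exact line-search step alpha = r^T r / (r^T A r) derived above."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        r = b - A @ x                   # residual = negative gradient
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        alpha = rr / (r @ (A @ r))      # exact minimizer along the direction r
        x = x + alpha * r
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.normal(size=(4, 4))
    A = M @ M.T + 4 * np.eye(4)         # symmetric positive definite
    b = rng.normal(size=4)
    x = steepest_descent(A, b, np.zeros(4))
    print(np.allclose(A @ x, b, atol=1e-8))
```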

4.1.1 Digression on Choosing Directions

As a thought experiment, suppose our initial guess had an error $e = x_{\mathrm{exact}} - x$ which is an eigenvector of $A$ with eigenvalue $\lambda$. This would give us

    $r = b - Ax = A x_{\mathrm{exact}} - Ax = A(x_{\mathrm{exact}} - x) = Ae = \lambda e.$

With the above, we see that

    $x_{k+1} = x_k + \frac{r_k^T r_k}{r_k^T A r_k}\, r_k = x_k + \frac{r_k^T r_k}{(\lambda e)^T A (\lambda e)}\, \lambda e = x_k + \frac{r_k^T r_k}{\lambda\, r_k^T r_k}\, \lambda e = x_k + e = x_k + (x_{\mathrm{exact}} - x_k) = x_{\mathrm{exact}}.$

What this tells us is that if we can pick an eigenvector of the matrix as our search direction, we can get our result immediately. That would be pretty neat.

4.2 Conjugate Gradient

Conjugate gradient is a search method that chooses its search directions not simply as the current residual, but by recalling its previous search directions and removing from its newest direction the components along them, which generally leads to a nice speedup. This correction makes the directions it searches conjugate; a more specific definition is that two vectors $x$ and $y$ are conjugate (with respect to a matrix $A$) if they are orthogonal in the inner product $\langle x, y \rangle_A = x^T A y$, that is, $x^T A y = 0$. To motivate this section, we are again going to be considering solving the system $Ax = b$, where $A$ is symmetric and positive definite.

Now, let us move on to trying to solve our minimization problem by picking orthogonal search directions. That is, we would like to pick directions $s_0, s_1, \ldots, s_k$, where $s_i^T s_j = 0$ for $i \ne j$. Recall that $x_{k+1} = x_k + \alpha s_k$, which implies (writing $e_k = x_{\mathrm{exact}} - x_k$ for the error, so that $r_k = A e_k$)

    $x_{\mathrm{exact}} - x_{k+1} = x_{\mathrm{exact}} - x_k - \alpha s_k, \qquad\text{i.e.}\qquad e_{k+1} = e_k - \alpha s_k.$  (3)

If we could go in one direction of search until all the remaining error was orthogonal to the search direction, we could easily take large steps in the search space toward the minimum of our function. The problem is that we do not know our error exactly, so we would like instead to eliminate the residual along the search direction. To this end, we make what are known as A-orthogonal steps rather than strictly orthogonal ones (we will see why in a bit). To eliminate the residual along $s_k$, we require

    $0 = r_{k+1}^T s_k = (A e_{k+1})^T s_k = \big(A(e_k - \alpha s_k)\big)^T s_k.$

From this, we can see that

    $\alpha = \frac{s_k^T A e_k}{s_k^T A s_k} = \frac{s_k^T r_k}{s_k^T A s_k}.$

This is called A-orthogonal because we would like to constrain our search so that $s_j^T A s_k = 0$ for all $j \ne k$.

Let us suppose that we would like to write our initial error as the scaled sum of a series of our search directions. That is, $e_0 = \sum_j a_j s_j$ (this sum takes into account all $n$ steps of the search). From this, we can derive some interesting properties of our error. First,

    $s_k^T A e_0 = s_k^T A \sum_j a_j s_j = \sum_j a_j\, s_k^T A s_j = a_k\, s_k^T A s_k + \sum_{j \ne k} a_j\, s_k^T A s_j = a_k\, s_k^T A s_k,$  (4)

where equation (4) follows because we are using A-orthogonal directions, so the summation over $j \ne k$ goes to zero. This tells us that (again using A-orthogonal directions)

    $a_k = \frac{s_k^T A e_0}{s_k^T A s_k} = \frac{s_k^T A \big(e_k + \sum_{j<k} \alpha_j s_j\big)}{s_k^T A s_k} = \frac{s_k^T A e_k}{s_k^T A s_k}.$

Here we used equation (3), $e_{k+1} = e_k - \alpha_k s_k$, from which we can see that

    $e_k = e_{k-1} - \alpha_{k-1} s_{k-1} = \cdots = e_0 - \sum_{j=0}^{k-1} \alpha_j s_j.$

Because of this, and because $r_k = A e_k$, we have

    $a_k = \frac{s_k^T A e_k}{s_k^T A s_k} = \frac{s_k^T r_k}{s_k^T A s_k} = \alpha_k.$

This allows us to consider the error at the $k$th step of our search:

    $e_k = e_0 - \sum_{j=0}^{k-1} \alpha_j s_j = \sum_j a_j s_j - \sum_{j=0}^{k-1} \alpha_j s_j = \sum_{j \ge k} \alpha_j s_j.$

If we pre-multiply $e_k$ by one of our search directions transposed with $A$, in the case when $i \le k - 1$ we have

    $s_i^T A e_k = s_i^T r_k = s_i^T A e_0 - \sum_{j=0}^{k-1} \alpha_j\, s_i^T A s_j = a_i\, s_i^T A s_i - \alpha_i\, s_i^T A s_i = 0.$  (5)

Similarly, for $i \ge k$, we see that

    $s_i^T r_k = s_i^T A \sum_{j \ge k} \alpha_j s_j = \alpha_i\, s_i^T A s_i.$

Note also from the expression for $e_k$ above that if we find a search direction for every dimension of the space we are trying to optimize over, then our error after $n$ steps will go to zero.

We now have a lot of math that is really doing very little for us unless we can find our A-orthogonal directions to optimize along. We can use simple Gram-Schmidt orthogonalization (in the $A$ inner product) to remove the components along search directions already seen and so choose a new search direction. That is, for some $v$ not in the span of the previous directions, to get a new vector $s_k$ which is A-orthogonal to all of the earlier $s_j$, we use the following:

    $s_k = v - \sum_{j<k} \frac{v^T A s_j}{s_j^T A s_j}\, s_j.$

For $i \le k - 1$, we have

    $s_k^T A s_i = v^T A s_i - \sum_{j<k} \frac{v^T A s_j}{s_j^T A s_j}\, s_j^T A s_i = v^T A s_i - \frac{v^T A s_i}{s_i^T A s_i}\, s_i^T A s_i = 0,$

where the last equality follows because our previous search vectors $s_j$ have all been A-orthogonalized, so only the $j = i$ term of the sum survives. In the other case for $i$, that $i \ge k$, we see that

    $s_k^T r_i = v^T r_i - \sum_{j<k} \frac{v^T A s_j}{s_j^T A s_j}\, s_j^T r_i = v^T r_i,$  (6)

because, as we proved above in equation (5), $s_j^T r_i = 0$ for $j < i$. We now have a relatively tedious way to calculate our next search directions; armed with this, we can calculate the distance we should take steps in our search. Recalling that $\alpha_k = s_k^T r_k / (s_k^T A s_k)$, we see that $s_k^T r_k = v^T r_k$ implies that

    $\alpha_k = \frac{v^T r_k}{s_k^T A s_k}.$

Naturally, there is a simpler way to calculate the new search directions using the old ones than an expensive Gram-Schmidt orthogonalization step. Consider that

    $r_{k+1} = b - A x_{k+1} = b - A(x_k + \alpha_k s_k) = r_k - \alpha_k A s_k,$

so that $\alpha_k A s_k = r_k - r_{k+1}$. This tells us that

    $\alpha_k\, r_i^T A s_k = r_i^T r_k - r_i^T r_{k+1}.$

Because $v_k^T r_i = s_k^T r_i = 0$ when $k < i$ (see equations (6) and (5)), we know that if we simply set $v_k$ to be $r_k$, then the residuals become mutually orthogonal: $r_k^T r_i = 0$ for $k < i$. As such, setting $i = k + 1$ in the display above and using $r_{k+1}^T r_k = 0$, we see that

    $\alpha_k\, r_{k+1}^T A s_k = r_{k+1}^T r_k - r_{k+1}^T r_{k+1} = -r_{k+1}^T r_{k+1}, \qquad\text{so}\qquad r_{k+1}^T A s_k = -\frac{r_{k+1}^T r_{k+1}}{\alpha_k}.$

Naturally, when $i > k + 1$, we have $\alpha_k\, r_i^T A s_k = r_i^T r_k - r_i^T r_{k+1} = 0$. Taking the above two paragraphs and combining them into the update rule for our search directions (recalling that we set $v_k = r_k$), we have the following update rule:

    $s_{k+1} = r_{k+1} - \sum_{j \le k} \frac{r_{k+1}^T A s_j}{s_j^T A s_j}\, s_j = r_{k+1} + \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k}\, s_k.$

The sum collapses to the $j = k$ term, since $\alpha_j\, r_{k+1}^T A s_j = 0$ except when $j = k$; for that term,

    $-\frac{r_{k+1}^T A s_k}{s_k^T A s_k} = \frac{r_{k+1}^T r_{k+1}}{\alpha_k\, s_k^T A s_k} = \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k},$

using $\alpha_k\, s_k^T A s_k = s_k^T r_k = r_k^T r_k$. Combining all of the above work for minimizing the function $f(x) = \tfrac{1}{2} x^T A x - b^T x + c$, we have the following set of update rules for the conjugate gradient method:

    $s_0 = r_0 = b - A x_0$
    $\alpha_k = \dfrac{r_k^T r_k}{s_k^T A s_k}$
    $x_{k+1} = x_k + \alpha_k s_k$
    $r_{k+1} = r_k - \alpha_k A s_k$
    $s_{k+1} = r_{k+1} + \dfrac{r_{k+1}^T r_{k+1}}{r_k^T r_k}\, s_k.$

In the more general case, when we are trying to optimize an arbitrary function $f : \mathbb{R}^n \to \mathbb{R}$, we modify the method above a little. First, recall that for the case of the quadratic function with $A$, $b$, and $c$, we have $\nabla f(x) = Ax - b = -r$. As such, for an arbitrary $f$, we can see that $r_k$ corresponds to $-\nabla f(x_k)$, giving us the more general (Fletcher-Reeves) conjugate gradient method:

    $s_0 = -\nabla f(x_0)$
    $\alpha_k = \operatorname*{argmin}_t\, f(x_k + t\, s_k)$
    $x_{k+1} = x_k + \alpha_k s_k$
    $s_{k+1} = -\nabla f(x_{k+1}) + \dfrac{\nabla f(x_{k+1})^T \nabla f(x_{k+1})}{\nabla f(x_k)^T \nabla f(x_k)}\, s_k.$
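Here is a minimal numpy sketch (again my own illustration, not from the notes) implementing exactly the quadratic-case update rules above:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """Minimize f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite),
    i.e. solve Ax = b, using the conjugate gradient update rules above."""
    x = x0.astype(float).copy()
    r = b - A @ x                        # initial residual
    s = r.copy()                         # initial search direction s_0 = r_0
    rr = r @ r
    for _ in range(2 * len(b)):          # n steps suffice in exact arithmetic
        if np.sqrt(rr) < tol:
            break
        As = A @ s
        alpha = rr / (s @ As)            # alpha_k = r^T r / (s^T A s)
        x += alpha * s
        r -= alpha * As                  # r_{k+1} = r_k - alpha_k A s_k
        rr_new = r @ r
        s = r + (rr_new / rr) * s        # s_{k+1} = r_{k+1} + beta_k s_k
        rr = rr_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M = rng.normal(size=(6, 6))
    A = M @ M.T + 6 * np.eye(6)
    b = rng.normal(size=6)
    print(np.allclose(A @ conjugate_gradient(A, b, np.zeros(6)), b))
```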

5 Perceptron Convergence

Foregoing, temporarily, any introduction to the perceptron algorithm, let us denote a hyperplane by $f(x) = \beta_1^T x + \beta_0 = 0$, or simply by $\beta^T x_{\mathrm{ext}} = 0$, where $x_{\mathrm{ext}} = [x,\, 1]$. If we have separability, that is, for some $\beta^*$,

    $y^{(i)} \beta^{*T} x_{\mathrm{ext}}^{(i)} > 0 \quad \forall\, i,$

we can scale $\beta^*$ accordingly to give us normalization for the points closest to the hyperplane. I do not have a great justification for this, but it is relatively (though somewhat unjustifiably) easy to see that if the above condition holds, then for some $C > 0$ and $z^{(i)} = x_{\mathrm{ext}}^{(i)} / \|x_{\mathrm{ext}}^{(i)}\|$, we have $y^{(i)} \beta^{*T} z^{(i)} \ge C$. Given this, we can simply divide both sides of the inequality by $C$, giving us a new $\beta^*$ such that $y^{(i)} \beta^{*T} z^{(i)} \ge 1$.

In the perceptron algorithm, we have a current estimate for $\beta$, let us call it $\beta_k$. The perceptron will identify a point $z^{(i)}$ that is misclassified and produce the update

    $\beta_{k+1} = \beta_k + z^{(i)} y^{(i)}.$

With this, we can bound the distance between $\beta_k$ and $\beta^*$, which implies that the algorithm will converge to a separating hyperplane. Consider the following:

    $\|\beta_{k+1} - \beta^*\|^2 = \|\beta_{k+1}\|^2 - 2\beta^{*T}\beta_{k+1} + \|\beta^*\|^2$
    $= \|\beta_k + z^{(i)} y^{(i)}\|^2 - 2\beta^{*T}\big(\beta_k + z^{(i)} y^{(i)}\big) + \|\beta^*\|^2$
    $= 2 y^{(i)} \beta_k^T z^{(i)} + \beta_k^T \beta_k + \|z^{(i)}\|^2 + \|\beta^*\|^2 - 2\beta^{*T}\beta_k - 2 y^{(i)} \beta^{*T} z^{(i)}$
    $\le 2 y^{(i)} \beta_k^T z^{(i)} + \|\beta_k\|^2 + \|\beta^*\|^2 - 2\beta^{*T}\beta_k + 1 - 2$
    $= \|\beta_k - \beta^*\|^2 - 1 + 2 y^{(i)} \beta_k^T z^{(i)}$
    $< \|\beta_k - \beta^*\|^2 - 1,$  (7)

where the fourth line follows because $\|z^{(i)}\|^2 = 1$ and $y^{(i)} \beta^{*T} z^{(i)} \ge 1$, and the last line, equation (7), follows because $\beta_k$ misclassified $z^{(i)}$, making the $y^{(i)} \beta_k^T z^{(i)}$ term negative. Noting this contraction, we see that the perceptron algorithm will converge to a separating hyperplane (in the case of separable data) in no more than $\|\beta_0 - \beta^*\|^2$ steps. Unfortunately, because the scaling needed to obtain the 1-margin separating hyperplane above can be very bad, this could be a huge number of steps. That, my friends, is why we use support vector machines.
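A minimal numpy sketch of the update analyzed above (the data-generating example and helper name are assumptions of mine, not from the notes):

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Perceptron on normalized, extended inputs, as in the convergence argument:
    z_i = [x_i, 1] / ||[x_i, 1]||, and beta += y_i z_i whenever z_i is misclassified.
    Labels y_i are +/- 1."""
    Z = np.hstack([X, np.ones((X.shape[0], 1))])
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)
    beta = np.zeros(Z.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for z_i, y_i in zip(Z, y):
            if y_i * (beta @ z_i) <= 0:      # misclassified (or on the hyperplane)
                beta += y_i * z_i
                mistakes += 1
        if mistakes == 0:                    # separating hyperplane found
            return beta
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([1.0, 2.0])
    X = rng.normal(size=(100, 2))
    y = np.where(X @ w_true + 0.5 > 0, 1, -1)
    X += 0.2 * y[:, None] * w_true / np.linalg.norm(w_true)   # enforce a margin
    beta = perceptron(X, y)
    Z = np.hstack([X, np.ones((100, 1))])
    print(np.all(y * (Z @ beta) > 0))        # all points correctly classified
```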

6 Duality of SVMs

In this section, we consider the Support Vector Machine and its dual. We will focus on the case of data that is not separable, so that we are trying to minimize a norm of the vector $w$ plus penalty terms for un-separated data points. That is, we are seeking

    $\operatorname*{argmin}_{w, b, \xi}\ \tfrac{1}{2} w^T w + c^T \xi$
    subject to  $y^{(i)}(w^T x^{(i)} + b) \ge 1 - \xi_i, \quad i = 1, \ldots, m$
                $\xi \ge 0.$

The Lagrangian for this is

    $L(w, b, \xi, \lambda, \beta) = \tfrac{1}{2} w^T w + c^T \xi + \sum_{i=1}^m \lambda_i \big(1 - \xi_i - y^{(i)}(w^T x^{(i)} + b)\big) - \beta^T \xi.$

Taking derivatives, we have

    $\nabla_w L = w - \sum_{i=1}^m \lambda_i y^{(i)} x^{(i)}$
    $\frac{\partial L}{\partial b} = -\sum_{i=1}^m \lambda_i y^{(i)}$
    $\nabla_\xi L = c - \lambda - \beta.$

Setting each of the above to zero and solving, we have $\lambda_i = c_i - \beta_i$, which, because $\beta_i \ge 0$ for the optimal solution (by dual feasibility), implies that $0 \le \lambda_i \le c_i$ in the dual. We also have $w = \sum_{i=1}^m \lambda_i y^{(i)} x^{(i)}$ and the constraint that, in the dual, $\sum_{i=1}^m \lambda_i y^{(i)} = 0$. Substituting back into the Lagrangian,

    $L = \tfrac{1}{2}\Big(\sum_{i=1}^m \lambda_i y^{(i)} x^{(i)}\Big)^T \Big(\sum_{j=1}^m \lambda_j y^{(j)} x^{(j)}\Big) + c^T \xi + \sum_{i=1}^m \lambda_i - \lambda^T \xi - \beta^T \xi - \sum_{i=1}^m \lambda_i y^{(i)}\Big(\sum_{j=1}^m \lambda_j y^{(j)} (x^{(j)})^T x^{(i)} + b\Big).$

After some simplification and using the Lagrange condition from the $\xi$ derivative ($c - \lambda - \beta = 0$, so the $\xi$ terms cancel), we have that our dual optimization problem becomes

    $\operatorname*{argmax}_\lambda\ \sum_{i=1}^m \lambda_i - \tfrac{1}{2} \sum_{i=1}^m \sum_{j=1}^m \lambda_i \lambda_j y^{(i)} y^{(j)} (x^{(i)})^T x^{(j)}$
    s.t.  $0 \le \lambda \le c$
          $\sum_{i=1}^m \lambda_i y^{(i)} = 0,$

which can be optimized using pairwise coordinate ascent on the $\lambda_i$'s (as in the SMO algorithm).

7 Projection of a vector onto linear constraints

Let us suppose that we would like to project a vector $z$ onto the set $\{x : Ax = b\}$. Thus, we would like to find the $x$ that minimizes $\|z - x\|^2$ subject to $Ax = b$. First, we get our Lagrangian

    $L(x, \lambda) = z^T z - 2 x^T z + x^T x + \lambda^T (Ax - b),$

whose derivative we take with respect to $x$. This gives us $\nabla_x L(x, \lambda) = -2z + 2x + A^T \lambda$. Setting this to zero, we find that $x = z - A^T \lambda / 2$. Plugging this into the Lagrangian above, we have

    $L(\lambda) = z^T z - 2\big(z - \tfrac{1}{2}A^T\lambda\big)^T z + \big(z - \tfrac{1}{2}A^T\lambda\big)^T\big(z - \tfrac{1}{2}A^T\lambda\big) + \lambda^T\Big(A\big(z - \tfrac{1}{2}A^T\lambda\big) - b\Big)$
    $= z^T z - 2 z^T z + \lambda^T A z + z^T z - \lambda^T A z + \tfrac{1}{4}\lambda^T A A^T \lambda + \lambda^T A z - \tfrac{1}{2}\lambda^T A A^T \lambda - \lambda^T b$
    $= \lambda^T A z - \tfrac{1}{4}\lambda^T A A^T \lambda - \lambda^T b.$

The derivative of the above with respect to $\lambda$ is

    $\nabla_\lambda L(\lambda) = A z - \tfrac{1}{2} A A^T \lambda - b,$

and setting this to zero, we have

    $\lambda = 2 (A A^T)^{-1} (A z - b).$

Finally, plugging this into $x = z - A^T \lambda / 2$, we have

    $x = z - A^T (A A^T)^{-1} A z + A^T (A A^T)^{-1} b.$

Thus, the matrix projecting $z$ onto the null space of $A$ is $I - A^T (A A^T)^{-1} A$, and the projection onto $\{x : Ax = b\}$ is obtained from the null-space projection by simply adding $A^T (A A^T)^{-1} b$, which is a particular solution of $Ax = b$.
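The closed form is easy to implement; here is a small numpy sketch (my own, assuming $A$ has full row rank so that $A A^T$ is invertible):

```python
import numpy as np

def project_onto_affine(z, A, b):
    """Project z onto {x : Ax = b} via the closed form derived above:
    x = z - A^T (A A^T)^{-1} (A z - b).  Assumes A has full row rank."""
    lam = np.linalg.solve(A @ A.T, A @ z - b)   # (A A^T)^{-1} (A z - b)
    return z - A.T @ lam

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.normal(size=(2, 5))
    b = rng.normal(size=2)
    z = rng.normal(size=5)
    x = project_onto_affine(z, A, b)
    w = np.linalg.lstsq(A, b, rcond=None)[0]    # some other feasible point
    print(np.allclose(A @ x, b))                                    # x is feasible
    print(np.linalg.norm(z - x) <= np.linalg.norm(z - w) + 1e-12)   # and at least as close as w
```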

8 Minimization of a linear function subject to elliptical constraints

In this section, we consider the following problem, where $A \in S^n_{++}$ (symmetric positive definite):

    minimize  $c^T x$
    subject to  $x^T A x \le 1.$

First, we form the Lagrangian:

    $L(x, \nu) = c^T x + \nu (x^T A x - 1).$

Taking the gradient with respect to $x$, we have

    $\nabla_x L(x, \nu) = c + 2 \nu A x,$

so that $x = -\frac{1}{2\nu} A^{-1} c$. We can solve assuming the constraint holds with equality because, unless $c = 0$, the optimal value will be on the edge of the ellipsoid $x^T A x \le 1$. Plugging this into the constraint, we have

    $\frac{1}{4\nu^2}\, c^T A^{-1} A A^{-1} c = 1, \qquad\text{so}\qquad \nu = \pm \tfrac{1}{2} \sqrt{c^T A^{-1} c}.$

Plugging all this in, we find that the candidate assignments to $x$ and the corresponding minimum and maximum values are

    $x = \mp \frac{A^{-1} c}{\sqrt{c^T A^{-1} c}}, \qquad c^T x = \mp \frac{c^T A^{-1} c}{\sqrt{c^T A^{-1} c}} = \mp \sqrt{c^T A^{-1} c} = \mp \|A^{-1/2} c\|,$

so the minimum value is $-\sqrt{c^T A^{-1} c}$, attained at $x = -A^{-1} c / \sqrt{c^T A^{-1} c}$.

9 KL Divergence for Gaussian distributions

First, recall that the KL divergence between two distributions $P$ and $Q$ is defined as

    $D_{\mathrm{KL}}(P \,\|\, Q) = \mathbb{E}_P\Big[\log \frac{P}{Q}\Big].$

Also, the density function for a multivariate Gaussian (normal) distribution with mean $\mu$ and covariance matrix $\Sigma$ is

    $p(x) = \frac{1}{(2\pi)^{n/2} \det(\Sigma)^{1/2}} \exp\Big(-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\Big).$

Now, consider two multivariate Gaussians in $\mathbb{R}^n$, $P_1$ and $P_2$. We have

    $D(P_1 \,\|\, P_2) = \mathbb{E}_{P_1}[\log P_1 - \log P_2]$
    $= \tfrac{1}{2} \mathbb{E}_{P_1}\big[-\log \det \Sigma_1 - (x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) + \log \det \Sigma_2 + (x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2)\big]$
    $= \tfrac{1}{2} \log \frac{\det \Sigma_2}{\det \Sigma_1} + \tfrac{1}{2} \mathbb{E}_{P_1}\big[-(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) + (x - \mu_2)^T \Sigma_2^{-1} (x - \mu_2)\big]$
    $= \tfrac{1}{2} \log \frac{\det \Sigma_2}{\det \Sigma_1} + \tfrac{1}{2} \mathbb{E}_{P_1}\big[-\mathrm{tr}\big(\Sigma_1^{-1} (x - \mu_1)(x - \mu_1)^T\big) + \mathrm{tr}\big(\Sigma_2^{-1} (x - \mu_2)(x - \mu_2)^T\big)\big]$
    $= \tfrac{1}{2} \log \frac{\det \Sigma_2}{\det \Sigma_1} + \tfrac{1}{2} \mathbb{E}_{P_1}\big[-\mathrm{tr}\big(\Sigma_1^{-1} \Sigma_1\big) + \mathrm{tr}\big(\Sigma_2^{-1} (x x^T - 2 x \mu_2^T + \mu_2 \mu_2^T)\big)\big]$
    $= \tfrac{1}{2} \log \frac{\det \Sigma_2}{\det \Sigma_1} - \tfrac{1}{2} n + \tfrac{1}{2} \mathrm{tr}\Big(\Sigma_2^{-1} \big(\Sigma_1 + \mu_1 \mu_1^T - 2 \mu_2 \mu_1^T + \mu_2 \mu_2^T\big)\Big)$
    $= \tfrac{1}{2} \Big(\log \frac{\det \Sigma_2}{\det \Sigma_1} - n + \mathrm{tr}\big(\Sigma_2^{-1} \Sigma_1\big) + \mu_1^T \Sigma_2^{-1} \mu_1 - 2 \mu_1^T \Sigma_2^{-1} \mu_2 + \mu_2^T \Sigma_2^{-1} \mu_2\Big)$
    $= \tfrac{1}{2} \Big(\log \frac{\det \Sigma_2}{\det \Sigma_1} - n + \mathrm{tr}\big(\Sigma_2^{-1} \Sigma_1\big) + (\mu_2 - \mu_1)^T \Sigma_2^{-1} (\mu_2 - \mu_1)\Big),$

where we used $\mathbb{E}_{P_1}[(x - \mu_1)(x - \mu_1)^T] = \Sigma_1$, $\mathbb{E}_{P_1}[x x^T] = \Sigma_1 + \mu_1 \mu_1^T$, and the cyclic property of the trace.
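Here is a small numpy/scipy sketch (mine, not from the notes) of the closed form above, with a Monte Carlo estimate of $\mathbb{E}_{P_1}[\log p_1 - \log p_2]$ as a sanity check:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_kl(mu1, S1, mu2, S2):
    """KL( N(mu1, S1) || N(mu2, S2) ) using the closed form derived above."""
    n = len(mu1)
    S2_inv = np.linalg.inv(S2)
    d = mu2 - mu1
    return 0.5 * (np.log(np.linalg.det(S2) / np.linalg.det(S1)) - n
                  + np.trace(S2_inv @ S1) + d @ S2_inv @ d)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    mu1, mu2 = rng.normal(size=3), rng.normal(size=3)
    B1, B2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
    S1, S2 = B1 @ B1.T + np.eye(3), B2 @ B2.T + np.eye(3)

    # Monte Carlo estimate of E_{P1}[log p1(x) - log p2(x)] for comparison.
    x = rng.multivariate_normal(mu1, S1, size=200_000)
    mc = np.mean(multivariate_normal(mu1, S1).logpdf(x)
                 - multivariate_normal(mu2, S2).logpdf(x))
    print(gaussian_kl(mu1, S1, mu2, S2), mc)
```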
