A Hybrid Algorithm for Convex Semidefinite Optimization
Sören Laue
Friedrich Schiller University Jena, Germany

Abstract

We present a hybrid algorithm for optimizing a convex, smooth function over the cone of positive semidefinite matrices. Our algorithm converges to the global optimal solution and can be used to solve general large-scale semidefinite programs, and hence can be readily applied to a variety of machine learning problems. We show experimental results on three machine learning problems, on which our approach outperforms state-of-the-art algorithms.

1. Introduction

We consider the following unconstrained semidefinite optimization problem:

    min f(X)  s.t.  X ⪰ 0,    (1)

where f : R^{n×n} → R is a convex and differentiable function over the cone of positive semidefinite matrices. Many machine learning problems can be cast as semidefinite optimization problems. Prominent examples include sparse PCA (d'Aspremont et al., 2007), distance metric learning (Xing et al., 2002), nonlinear dimensionality reduction (Weinberger et al., 2006), multiple kernel learning (Lanckriet et al., 2004), multitask learning (Obozinski et al., 2010), and matrix completion (Srebro et al., 2004).

We provide an algorithm that solves general large-scale unconstrained semidefinite optimization problems efficiently. The idea behind our algorithm is a hybrid approach: we combine the algorithm of Hazan (2008) with a standard quasi-Newton algorithm. The algorithm achieves convergence to the global optimum with very good running time. It can be readily used for a variety of machine learning problems, and we demonstrate its efficiency on three different tasks: matrix completion, metric learning, and sparse PCA.

Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).
Another advantage of the algorithm is its simplicity: it can be implemented in very few lines of Matlab code.

1.1. Related Work

A constrained version of Problem (1) is called a semidefinite program (SDP) if the function f as well as the constraints are linear. Semidefinite programs have gained a lot of attention in recent years, since many NP-hard problems can be relaxed into SDPs and many machine learning problems can be modeled as SDPs. The most widely known implementations of SDP solvers are interior point methods. They provide high-accuracy solutions in polynomial time. However, since their running time grows as a high-order polynomial in the dimension n, they do not scale well to the medium and large problems that often occur in machine learning. On the other hand, the high accuracy of their solutions is typically not needed, as the input data is often noisy.

Among other methods, proximal methods have been employed to solve SDPs in order to circumvent the large running time of interior point methods. They achieve better running times at the expense of less accurate solutions. Examples include (Nesterov, 2007; Nemirovski, 2004) and (Arora et al., 2005), where the multiplicative weights update rule is employed. The algorithm of (Arora et al., 2005) has been randomized by (Garber & Hazan, 2011), based on the same idea as in (Grigoriadis & Khachiyan, 1995), to achieve sublinear running time. Another randomized algorithm has appeared in (Kleiner et al., 2010). Furthermore, alternating direction methods have been proposed to solve SDPs (Wen et al., 2010).

Another line of algorithms for solving SDPs are Frank-Wolfe type algorithms such as (Hazan, 2008). This approach is also known as sparse greedy approximation, and these algorithms have the advantage that they produce sparse solutions (Clarkson, 2008), which, for SDPs, corresponds to low-rank solutions. Low-rank
solutions are very appealing since they can drastically reduce the computational effort. Instead of storing a low-rank positive semidefinite matrix X ∈ R^{n×n}, one just stores a matrix V ∈ R^{n×k} where X = V V^T, with k being the rank of X. Matrix-vector multiplications, for instance, can then be done in O(nk) instead of O(n²) time.

There have also been nonlinear approaches to linear SDPs, however, without general convergence guarantees (Burer & Monteiro, 2003). In some special cases (matrix completion problems where it is assumed that the data is indeed generated by a low-rank matrix and the restricted isometry property holds), convergence guarantees have been shown. In general, the problem of solving an SDP with a low-rank constraint is NP-hard (Goemans & Williamson, 1995).

Organization of the Paper

Section 2 describes our algorithm, while in Section 3 we provide experimental results on three different machine learning problems. We compare our algorithm against a standard interior point method and against algorithms that are specifically designed for each of the individual problems. Section 4 provides theoretical guarantees for the running time and for the convergence to the global optimal solution.

2. The Hybrid Algorithm

Our algorithm is summarized in pseudo-code in Algorithm 1.

Algorithm 1 Hybrid Algorithm
  Input: smooth, convex function f : R^{n×n} → R
  Output: approximate solution of Problem (1)
  Initialize X_0 = 0, V_0 = [ ]
  repeat
    Increase rank by 1 using the Hazan update:
      Compute v_i = ApproxEV(−∇f(V_i V_i^T), ε′).
      Solve min_{α,β} f(α V_i V_i^T + β v_i v_i^T)  s.t.  α, β ≥ 0.
      Set V_{i+1} = [√α V_i, √β v_i].
    Run the nonlinear update:
      Improve V_{i+1} by finding a local minimum of f(V V^T) w.r.t. V, starting from V_{i+1}.
  until the approximation guarantee has been reached

The notation [V_i, v_i] used in Algorithm 1 stands for the horizontal concatenation of the matrix V_i and the column vector v_i.
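Note that Algorithm 1 only ever maintains the factor V_i rather than the full matrix X_i = V_i V_i^T. The storage and matrix-vector savings mentioned above can be sketched in a few lines of NumPy (Python here rather than the paper's Matlab; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 5

# Factor V of a rank-k PSD matrix X = V V^T.
V = rng.standard_normal((n, k))
x = rng.standard_normal(n)

# O(nk) matrix-vector product: multiply by V^T first, then by V,
# without ever forming the n-by-n matrix X.
y_lowrank = V @ (V.T @ x)

# Reference O(n^2) computation with the explicit matrix.
y_dense = (V @ V.T) @ x

assert np.allclose(y_lowrank, y_dense)
```

The same associativity trick is what makes each iteration of the algorithm cheap when the current rank is small.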
The function ApproxEV returns an approximate eigenvector corresponding to the largest eigenvalue: given a square symmetric matrix M, it returns a vector v_i = ApproxEV(M, ε′) of unit length that satisfies v_i^T M v_i ≥ λ_max(M) − ε′, where λ_max(M) denotes the largest eigenvalue of the matrix M.

Our algorithm runs in iterations. Each iteration consists of two steps: a rank-1 update and a subsequent nonlinear improvement of the current solution. The rank-1 update step follows a Frank-Wolfe type approach. A linear approximation to the function f at the current iterate X_i is minimized over the cone of semidefinite matrices. The minimum is attained at v_i v_i^T, where v_i is the eigenvector corresponding to the largest eigenvalue of −∇f(X_i). Then the next iterate X_{i+1} is a linear combination of the current iterate X_i = V_i V_i^T and v_i v_i^T such that it minimizes f.

In the second step, the nonlinear update step, the current solution X_{i+1} = V_{i+1} V_{i+1}^T is further improved by minimizing the function f(V V^T) with respect to V. Note that this function is no longer convex with respect to V. Hence, we can only expect to find a local minimum. Our analysis, however, shows that this is sufficient to still converge to the global optimal solution of Problem (1). In fact, it is not even necessary to find a local minimum; any improvement will work. In Section 4 we will prove that after at most O(1/ε) iterations, Algorithm 1 returns a solution that is ε-close to the global optimal solution.

3. Applications and Experiments

We have implemented our hybrid algorithm in Matlab exactly as described in Algorithm 1. For the nonlinear update we use minFunc (Schmidt), which implements the limited-memory BFGS algorithm. The two-variable optimization problem in the rank-1 update is also solved using minFunc. We use the default settings of minFunc. The approximate eigenvector computation is done using the Matlab function eigs.
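The paper computes ApproxEV with Matlab's eigs (a Lanczos method). To illustrate what ApproxEV has to deliver, here is a minimal shifted power-iteration sketch in Python/NumPy; the shift, the iteration count, and all names are our own choices, not the authors':

```python
import numpy as np

def approx_ev(M, iters=500, seed=0):
    """Approximate unit-norm eigenvector for the largest eigenvalue of a
    symmetric matrix M, via power iteration on the shifted matrix M + s*I.
    The shift s (Frobenius norm of M) makes all eigenvalues nonnegative
    without changing the eigenvectors."""
    n = M.shape[0]
    s = np.linalg.norm(M, 'fro')
    A = M + s * np.eye(n)
    v = np.random.default_rng(seed).standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A @ v
        v = w / np.linalg.norm(w)
    return v

# Largest eigenvalue of diag(3, 1, -5) is 3, with eigenvector e_1.
M = np.diag([3.0, 1.0, -5.0])
v = approx_ev(M)
```

The Rayleigh quotient v^T M v then approximates λ_max(M) from below, which is exactly the ε′-guarantee the rank-1 update needs.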
We ran all experiments in single-thread mode on a 2.5 GHz CPU.

3.1. Matrix Completion

In this section we consider the matrix completion problem used for collaborative filtering. Given a matrix Y where only a few entries have been observed, the goal is to complete this matrix by finding a low-complexity matrix X which approximates the given entries of Y as well as possible. Low complexity can be achieved, for instance, by a low rank or by a small trace norm. Different error norms can also be considered, depending on the specific application. For instance, the ℓ1-norm in combination with trace-norm regularization leads
to the robust PCA approach for matrix completion with low rank (Candès et al., 2011). Here, we use the ℓ2-norm and a rank constraint as the measure of complexity. Hence, the matrix completion problem becomes the following optimization problem:

    min Σ_{(i,j)∈Ω} (X_ij − Y_ij)²  s.t.  rank(X) ≤ k,    (2)

where Ω is the set of all given entries of Y. Note that Problem (2) is NP-hard (Gillis & Glineur, 2011). However, we can still attempt to find a good solution to it by using our hybrid algorithm. Problem (2) can be transformed into the following equivalent semidefinite optimization problem:

    min Σ_{(i,j)∈Ω̂} (X̂_ij − Ŷ_ij)²  s.t.  rank(X̂) ≤ k,  X̂ ⪰ 0,    (3)

where

    X̂ = ( V    X )         Ŷ = ( 0    Y )
         ( X^T  W ),            ( Y^T  0 ),

X̂ is a positive semidefinite matrix, and V and W are suitable symmetric matrices. Hence, the matrix completion Problem (2) for an input matrix Y ∈ R^{m×n} can be cast as a semidefinite optimization problem over matrices X̂ ∈ R^{(m+n)×(m+n)}.

We compare our algorithm to a state-of-the-art solver, GECO (Shalev-Shwartz et al., 2011), which was specifically designed for solving large-scale matrix minimization problems with a low-rank constraint. We follow the experimental setting of (Shalev-Shwartz et al., 2011). We use three standard matrix completion datasets: MovieLens 100K, MovieLens 1M, and MovieLens 10M. They contain 10^5, 10^6, and 10^7 movie ratings, respectively, on a scale from 1 to 5. The task is to predict the rating for user i and movie j. We used the datasets without any normalization¹ and split them randomly such that, for each user, 80% of the ratings went into the training data and 20% into the test data.

[Figure 1. Matrix completion problem: running time with respect to the rank for the three MovieLens datasets: 100K, 1M, and 10M.]

[Table 1. Matrix completion problem: test error (RMSE), running time, and rank on the three MovieLens datasets, for our algorithm and for GECO; the numeric entries were not recovered.]
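The objective in Problem (2) and its gradient, restricted to the observed entries and evaluated on a low-rank factorization, can be sketched as follows (Python/NumPy rather than the paper's Matlab; the function and variable names are ours):

```python
import numpy as np

def completion_loss_grad(U, W, rows, cols, vals):
    """Loss sum_{(i,j) in Omega} (X_ij - Y_ij)^2 for X = U W^T,
    together with its gradients w.r.t. the factors.
    Only observed entries are touched, so the cost is O(|Omega| k)."""
    pred = np.einsum('ik,ik->i', U[rows], W[cols])  # observed entries of X
    resid = pred - vals
    loss = float(resid @ resid)
    gU = np.zeros_like(U)
    gW = np.zeros_like(W)
    # dL/dU[i] accumulates 2 * resid_ij * W[j] over the observed (i, j).
    np.add.at(gU, rows, 2.0 * resid[:, None] * W[cols])
    np.add.at(gW, cols, 2.0 * resid[:, None] * U[rows])
    return loss, gU, gW

# Sanity check on a fully observed toy matrix.
rng = np.random.default_rng(0)
m, n, k = 4, 3, 2
U, W = rng.standard_normal((m, k)), rng.standard_normal((n, k))
Y = rng.standard_normal((m, n))
rows = np.repeat(np.arange(m), n)
cols = np.tile(np.arange(n), m)
loss, gU, gW = completion_loss_grad(U, W, rows, cols, Y.ravel())
```

With full observation the gradients collapse to the familiar closed forms 2(UW^T − Y)W and 2(UW^T − Y)^T U, which is a convenient correctness check.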
Our algorithm minimizes the training error much faster than GECO and at the same time also needs a much smaller rank. As a result, we also achieve the optimal test error much faster and with a smaller rank than GECO. Table 1 reports the test root-mean-square error (RMSE) as well as the rank at which it was achieved and the running times. Both rows for GECO in Table 1 reflect the same runs. The first row shows the statistics at the point where the test error reaches its minimum. However, since GECO slows down considerably with growing rank, we also added intermediate results from when the test error is approaching the minimum. As can be observed, our algorithm achieves the same or a better test error while requiring only a fraction of the time needed by GECO.

¹ We noticed that normalization had no impact on the results.
Both algorithms need quasi-linear time in the number of ratings and hence can be used to solve large-scale matrix factorization problems. Note, however, that for our algorithm the runtime per iteration scales linearly with the rank k, whereas GECO needs O(k^6). This behavior slows down GECO considerably, as can be seen in Figure 1.

3.2. Metric Learning

The second problem we approach is the metric learning problem. We are given a labeled dataset X = (x_i, y_i)_i. Let S be the set containing all pairs of indices (i, j) whose data points x_i and x_j are similar to each other, i.e., whose labels y_i and y_j are equal; let the set S̄ contain all pairs of indices of data points that are dissimilar to each other. For a given positive semidefinite matrix A, the Mahalanobis distance between x_i and x_j is defined as

    d_A(i, j) = sqrt((x_i − x_j)^T A (x_i − x_j)).

The metric learning problem is that of finding a positive semidefinite matrix A such that, under the induced Mahalanobis distance, points that are similar are close to each other and points that are dissimilar are far apart. This problem can be cast as a semidefinite optimization problem (Xing et al., 2002):

    min Σ_{(i,j)∈S} d_A(i, j)²  s.t.  Σ_{(i,j)∈S̄} d_A(i, j) ≥ 1,  A ⪰ 0.    (4)

Note that the 1 in the inequality constraint in Problem (4) can be changed to any arbitrary positive constant; this constraint just ensures that not all points are mapped onto the same point. Problem (4) does not fit into our framework. However, we can transform it into the following equivalent unconstrained semidefinite problem:

    min Σ_{(i,j)∈S} d_A(i, j)² − λ Σ_{(i,j)∈S̄} d_A(i, j)  s.t.  A ⪰ 0.    (5)

Problem (5) is just the Lagrangian of Problem (4). Since the 1 in the inequality constraint was chosen arbitrarily, we can also choose any positive constant for λ. In our experiments we set it to 1. We follow the experimental setting of (Kleiner et al., 2010).
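The unconstrained objective (5) is straightforward to evaluate; a minimal Python/NumPy sketch (the function and variable names are our own; pairs are given as index tuples):

```python
import numpy as np

def metric_objective(A, X, sim_pairs, dis_pairs, lam=1.0):
    """Objective (5): sum of squared Mahalanobis distances over similar
    pairs minus lam times the sum of Mahalanobis distances over
    dissimilar pairs, for a symmetric PSD matrix A."""
    def d2(i, j):
        diff = X[i] - X[j]
        return float(diff @ A @ diff)  # d_A(i, j)^2
    sim = sum(d2(i, j) for i, j in sim_pairs)
    dis = sum(np.sqrt(max(d2(i, j), 0.0)) for i, j in dis_pairs)
    return sim - lam * dis

# With A = I the Mahalanobis distance is the Euclidean distance.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
obj = metric_objective(np.eye(2), X, sim_pairs=[(0, 1)], dis_pairs=[(0, 2)])
```

In the example, the similar pair contributes 1² = 1 and the dissimilar pair contributes a distance of 2, so the objective is 1 − λ·2 = −1 for λ = 1.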
We compared our approach against an interior point method implemented in SeDuMi (Sturm, 1999) (via CVX (Grant & Boyd, 2011)) and against the algorithm of (Xing et al., 2002), which is a projected gradient approach and was specifically designed to solve the above SDP. We could not directly compare our algorithm to that of (Kleiner et al., 2010), as the code was not available. However, the authors show that it performs similarly to (Xing et al., 2002). As a measure of quality for a given solution A we define

    Q(A) = (1/ξ) Σ_i Σ_{j: (i,j)∈S} Σ_{l: (i,l)∈S̄} 1[d_A(i, j) < d_A(i, l)],

where 1[.] is the indicator function and ξ = Σ_i Σ_{j: (i,j)∈S} Σ_{l: (i,l)∈S̄} 1 is a normalization factor. In essence, Q captures how many points with the same label are mapped closer to each other than points with different labels.

We initially apply metric learning to the UCI ionosphere dataset, which contains 351 labeled data points of dimension 34. The results of this experiment are shown in Figure 2. As can be seen, our hybrid algorithm achieves the optimal value almost instantly, while the projected gradient descent algorithm (PG) needs considerably longer to achieve a solution of comparable quality. Since the interior point method (IP) scales very badly with the number of data points, we only ran it on a sub-sample of size 4 · 34 = 136. On this dataset, our method achieves the same function value as IP, 5.47e-5, while requiring only 0.28 seconds as opposed to the 1513 seconds that IP needs. PG achieves a function value of 5.5e-5 in 88 seconds.

[Figure 2. Metric learning problem: UCI ionosphere data; function value and quality Q over time for our algorithm and PG.]

Following (Kleiner et al., 2010), we ran a second set of experiments on synthetic data in order to measure the dependence on the dimension d. We sampled points from R^d as follows: we define two sets of cluster centers, C_1 = {(−1, −1), (−1, 1)} and C_2 = {(1, −1), (1, 1)}, and apply a random rotation to both sets. We then sample each data point from a spherical Gaussian distribution N(0, I_d).
The first two coordinates of each data point are replaced by one of the cluster centers, and the label of this data point is set accordingly to either 1 or 2. Finally, a small perturbation drawn from N(0, 0.25 · I_2) is added to the first two coordinates. The results are depicted in Table 2, which shows the running times for the various algorithms until a quality measure of Q > 0.99 has been reached. Our algorithm achieves the same optimal function values as the interior point method, while requiring less
time than the PG method. For larger dimensions we plot the results in Figure 3. As can be observed, our algorithm is considerably faster than PG in these larger dimensions. We omit the IP method here, as it scales badly with increasing dimension.

[Table 2. Metric learning problem: synthetic data. Time until Q > 0.99 for different dimensions d, and function values at those times, for our algorithm, PG, and IP; the numeric entries were not recovered.]

[Figure 3. Metric learning problem: synthetic data; time until quality Q > 0.99 versus dimension for our algorithm and PG.]

3.3. Sparse PCA

As a third problem we consider the sparse principal component analysis problem (sparse PCA). For a given covariance matrix A ∈ R^{n×n}, sparse PCA tries to find a sparse vector x that maximizes x^T A x, i.e., a sparse principal component of A. This problem can be relaxed into the following SDP (d'Aspremont et al., 2007):

    min ρ Σ_{(i,j)} |X_ij| − A • X  s.t.  Tr(X) = 1,  X ⪰ 0,    (6)

where A • X denotes Tr(A^T X). In a subsequent rounding step, the eigenvector corresponding to the largest eigenvalue of the solution to Problem (6) is returned as the solution vector x. The parameter ρ controls the tradeoff between the sparsity of x and the explained variance x^T A x.

Problem (6) is not of the form (1). However, one can easily transform it into an unconstrained semidefinite problem by defining the functions g(X) = X / Tr(X) and f(X) = ρ Σ_{(i,j)} |X_ij| − A • X. Hence, Problem (6) is equivalent to

    min f(g(X))  s.t.  X ⪰ 0,  X ≠ 0.    (7)

Note that f(g(X)) is again a convex function over the set of semidefinite matrices without the zero matrix. However, f(g(X)) is not smooth. Smoothness of f(g(X)) can be achieved either by smoothing it implicitly, e.g., using Nesterov's smoothing technique (Nesterov, 2005), or by smoothing it explicitly and replacing the absolute value function |·| with the scaled Huber loss H_M. The Huber loss is defined as

    H_M(x) = x² / (2M)    if |x| ≤ M,
             |x| − M/2    if |x| > M.

By choosing M appropriately small, one can achieve an arbitrarily small difference between |x| and H_M(x).
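A direct Python/NumPy transcription of H_M (the function name is ours) also makes the approximation bound easy to check: |x| and H_M(x) differ by at most M/2 everywhere, with the maximum gap attained at |x| = M:

```python
import numpy as np

def huber(x, M=1e-6):
    """Scaled Huber loss H_M: quadratic for |x| <= M, linear beyond.
    Smooth surrogate for |x| with |abs(x) - H_M(x)| <= M/2."""
    ax = np.abs(x)
    return np.where(ax <= M, x * x / (2.0 * M), ax - M / 2.0)

# Empirical check of the M/2 approximation bound on a grid.
xs = np.linspace(-2.0, 2.0, 1001)
gap = np.max(np.abs(np.abs(xs) - huber(xs, M=0.5)))
```

The two branches agree at |x| = M (both give M/2), so H_M is continuously differentiable, which is what the quasi-Newton update requires.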
We obtain a smooth, convex function f_{H_M} by replacing the absolute value function with the Huber loss in the function f. In our experiments we set M = 10^{-6}, so that the functions f_{H_M} and f differ only marginally from each other.

We again follow the experimental setting of (Kleiner et al., 2010), and we compare to an interior point method and to a state-of-the-art algorithm, DSPCA (d'Aspremont et al., 2007), which is specifically designed to solve Problem (6). We used the colon cancer dataset, which contains 2000 microarray readings from 62 subjects. We randomly sampled readings in order to vary the dimension d. As is standard for this task, we normalized the data to mean 0 and standard deviation 1. We set ρ = 0.2 in Problem (6) to obtain sparse solutions for all d.

[Table 3. Sparse PCA problem: colon cancer data. Function value at convergence, time to convergence, and sparsity and variance of the largest eigenvector x of the solution X, for our algorithm, DSPCA, and IP at increasing dimensions d; the numeric entries were not recovered.]

Table 3 reports the running time, the function value at convergence, the sparsity of the solution, and the
captured variance for these datasets. As mentioned above, we ran our algorithm on the function f_{H_M}; however, we report the function value f(X) for the original formulation (6). The solutions of our algorithm are essentially identical to those of the interior point method, while needing only a fraction of the time spent by the interior point method. The DSPCA algorithm provides accurate solutions within a short time even with increasing dimension; however, our algorithm is still considerably faster, especially in large dimensions.

4. Analysis

4.1. The Duality Gap

In this section we provide a duality gap for Problem (1) and analyze the running time of Algorithm 1. Let Problem (1) have a finite optimal value f* attained at X*. Let t be an upper bound on the trace Tr(X*). Such a trace bound always exists if f* > −∞. Then the optimization Problem (1) is equivalent to:

    min f(X)  s.t.  Tr(X) ≤ t,  X ⪰ 0.    (8)

In order to simplify some technicalities in the proof, we change Problem (8) into:

    min f̂(X̂)  s.t.  Tr(X̂) = t,  X̂ ⪰ 0.    (9)

Problems (8) and (9) are equivalent if we define f̂(X̂) := f(X), where

    X̂ = ( X  0  )
         ( 0  t′ )    and    t′ = t − Tr(X).

Note that X̂ is positive semidefinite whenever X is positive semidefinite. This transformation is only done here to simplify the analysis of Algorithm 1; it does not alter Algorithm 1. We denote by S_t := {X ∈ R^{n×n} | X ⪰ 0, Tr(X) = t} the set of all positive semidefinite matrices with trace t. Hence, Problem (1) is equivalent to

    min f(X)  s.t.  X ∈ S_t.    (10)

By convexity of f, we have the following linearization for any X, Y ∈ S_t:

    f(Y) ≥ f(X) + (Y − X) • ∇f(X).

This allows us to define the Wolfe dual of (10) for any fixed matrix X ∈ S_t as

    ω(X) := min_{Y ∈ S_t} f(X) + (Y − X) • ∇f(X) = f(X) − max_{Y ∈ S_t} (X − Y) • ∇f(X),

and the duality gap as

    g(X) := f(X) − ω(X) = max_{Y ∈ S_t} (X − Y) • ∇f(X).

By the definition of the objective function f, the gradient ∇f(X) is always a symmetric matrix and therefore has real eigenvalues, which will be important in the following.
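Lemma 1 below expresses this gap through a largest eigenvalue, which makes it cheap to evaluate as a stopping criterion. A small Python/NumPy check of that closed form (the function and variable names are our own): the gap vanishes at the minimizer of the linearization, Y* = t·uu^T with u the top eigenvector of −∇f(X), and is positive elsewhere.

```python
import numpy as np

def duality_gap(grad, X, t):
    """Duality gap per Lemma 1, for symmetric grad = grad f(X), X in S_t:
    g(X) = t * lambda_max(-grad) + <grad, X>."""
    lam_max = np.linalg.eigvalsh(-grad)[-1]  # eigvalsh sorts ascending
    return t * lam_max + float(np.sum(grad * X))

G = np.diag([1.0, -2.0, 3.0])  # stands in for grad f(X)
t = 1.0
# Top eigenvector of -G is e_2 (eigenvalue 2); the gap is zero there.
e2 = np.array([0.0, 1.0, 0.0])
gap_opt = duality_gap(G, t * np.outer(e2, e2), t)
# At a suboptimal rank-1 point the gap is strictly positive.
e1 = np.array([1.0, 0.0, 0.0])
gap_sub = duality_gap(G, t * np.outer(e1, e1), t)
```

Here gap_opt is 0 (the linearized problem is minimized) while gap_sub is 3, i.e., an upper bound on how far that iterate can be from optimal.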
Lemma 1. The duality gap can be written as

    g(X) = t · λ_max(−∇f(X)) + ∇f(X) • X.

Proof. We prove the claim by showing that for any symmetric matrix G ∈ R^{n×n}, one can equivalently reformulate the linear optimization problem max_{Y ∈ S_t} G • Y as

    max_{Y ∈ S_t} G • Y = max Σ_{i=1}^n α_i (G • u_i u_i^T),

where the latter maximization is taken over unit vectors u_i ∈ R^n, ||u_i|| = 1, for 1 ≤ i ≤ n, and nonnegative coefficients α_i with Σ_{i=1}^n α_i = t. For Y ∈ S_t, let Y = U^T U be its Cholesky factorization. Let α_i be the squared norms of the rows of U, and let u_i be the row vectors of U, scaled to unit length. From the observation

    Tr(Y) = Tr(U^T U) = Tr(U U^T) = Σ_i α_i = t,

it follows that any Y ∈ S_t can be written as a combination of rank-1 matrices, Y = Σ_{i=1}^n α_i u_i u_i^T, with unit vectors u_i ∈ R^n. It follows that

    max_{Y ∈ S_t} G • Y = max Σ_{i=1}^n α_i (G • u_i u_i^T)
                        = max Σ_{i=1}^n α_i u_i^T G u_i
                        = t · max_{v ∈ R^n, ||v||=1} G • v v^T
                        = t · max_{v ∈ R^n, ||v||=1} v^T G v
                        = t · λ_max(G),
where the last equality is the variational characterization of the largest eigenvalue. The claim then follows by plugging in −∇f(X) for G.

By construction, the duality gap g(X) is always an upper bound on the primal error h(X) = f(X) − f(X*). It can therefore also be used as a stopping criterion in Algorithm 1: if g(X) ≤ ε, then f(X) is an ε-approximation of the optimal solution.

4.2. Runtime and Convergence Analysis

In this section we show that after at most O(1/ε) iterations, Algorithm 1 returns a solution that is an ε-approximation of the global optimal solution. The proof is along the lines of (Clarkson, 2008) and (Hazan, 2008). However, we improve on them by lowering the required accuracy of the eigenvector computation from O(ε²) to O(ε). This in turn lowers the computational effort per eigenvector computation from O(1/ε) to O(1/√ε) per iteration when using the Lanczos method. Let the curvature constant C_f be defined as follows:

    C_f := sup_{X, Z ∈ S_t, α ∈ [0,1], Y = X + α(Z − X)} (1/α²) (f(Y) − f(X) − (Y − X) • ∇f(X)).

The curvature constant is a measure of how much the function f deviates from its linear approximation at X, and hence can be seen as an upper bound on the relative Bregman divergence induced by f. Now we can prove the following theorem.

Theorem 2. For each i ≥ 1, the iterate X_i of Algorithm 1 satisfies f(X_i) − f(X*) ≤ ε, where f(X*) is the optimal value of the minimization Problem (1) and ε = 8C_f / (i + 2).

Proof. We have X_i = V_i V_i^T. Consider the sequence α_i = 2 / (i + 2). For each iteration of Hazan's rank-1 update, we have

    f(X_{i+1}) = min_{α, β ≥ 0} f(α V_i V_i^T + β v_i v_i^T)
               ≤ f((1 − α_i) V_i V_i^T + α_i t v_i v_i^T)
               = f(X_i + α_i (t v_i v_i^T − X_i))
               ≤ f(X_i) + α_i (t v_i v_i^T − X_i) • ∇f(X_i) + α_i² C_f,    (11)

where the first inequality follows from choosing α = 1 − α_i and β = α_i t, and the last inequality follows from the definition of the curvature constant C_f.
Furthermore,

    (t v_i v_i^T − X_i) • ∇f(X_i) = (X_i − t v_i v_i^T) • (−∇f(X_i))
                                  = X_i • (−∇f(X_i)) − t v_i^T (−∇f(X_i)) v_i
                                  ≤ X_i • (−∇f(X_i)) − t (λ_max(−∇f(X_i)) − ε′)
                                  = −g(X_i) + t ε′
                                  ≤ −g(X_i) + α_i C_f.

The last inequality follows from setting the eigenvector accuracy ε′ to a value of at most α_i C_f / t within Algorithm 1. Hence, Inequality (11) evaluates to

    f(X_{i+1}) ≤ f(X_i) − α_i g(X_i) + α_i² C_f + α_i² C_f
               = f(X_i) − α_i g(X_i) + 2 α_i² C_f.    (12)

Subtracting f(X*) on both sides of Inequality (12), and denoting the current primal error by h(X_i) = f(X_i) − f(X*), we get

    h(X_{i+1}) ≤ h(X_i) − α_i g(X_i) + 2 α_i² C_f,    (13)

which, by using the fact that the duality gap g(X_i) is always an upper bound on the primal error h(X_i), gives

    h(X_{i+1}) ≤ h(X_i) − α_i h(X_i) + 2 α_i² C_f.    (14)

The claim of the theorem is that the primal error h(X_i) = f(X_i) − f(X*) is small after a sufficiently large number of iterations. Indeed, we show by induction that h(X_i) ≤ 8C_f / (i + 2). In the first iteration (i = 0), we know from (14) that the claim holds because of the choice α_0 = 1. Assume now that h(X_i) ≤ 8C_f / (i + 2) holds. Using α_i = 2 / (i + 2) in Inequality (14), we can bound h(X_{i+1}) as follows:

    h(X_{i+1}) ≤ (1 − α_i) h(X_i) + 2 α_i² C_f
               ≤ (8C_f / (i + 2)) (1 − 2 / (i + 2)) + 8C_f / (i + 2)²
               = 8C_f / (i + 2) − 8C_f / (i + 2)²
               ≤ 8C_f / (i + 3).

So far we have only considered the progress made by Algorithm 1 through the rank-1 update. Running the nonlinear improvement on f(V_i V_i^T) in each iteration can only further decrease the primal error h(X_i). Hence, the claim of the theorem follows.

We can set ε′ = ε / (4t) throughout Algorithm 1. This ensures ε′ ≤ α_i C_f / t, as needed by the analysis of the algorithm.
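The induction step can also be checked numerically: iterating the worst case of recursion (14) with α_i = 2/(i+2) stays below the claimed 8C_f/(i+2) bound. A quick sanity check, with C_f set arbitrarily to 1:

```python
# Worst-case simulation of recursion (14):
# h_{i+1} = (1 - alpha_i) h_i + 2 alpha_i^2 C_f,  alpha_i = 2/(i+2).
C_f = 1.0
h = 2.0 * C_f                      # h_1 <= 2 C_f from the alpha_0 = 1 step
for i in range(1, 10000):
    alpha = 2.0 / (i + 2)
    h = (1.0 - alpha) * h + 2.0 * alpha * alpha * C_f
    # h now bounds h_{i+1}; the theorem claims 8 C_f / (i + 3).
    assert h <= 8.0 * C_f / (i + 3)

final_h = h
```

This mirrors the O(1/ε) iteration count: after about 10^4 iterations the simulated error bound has shrunk to roughly 8C_f/10^4.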
5. Discussion

We have provided an algorithm that optimizes convex, smooth functions over the cone of positive semidefinite matrices. It can be readily used for a variety of machine learning problems, as many of them fall into this framework or can be equivalently transformed into such a problem. In this paper we have performed experiments on three such problems and have shown that our algorithm significantly outperforms state-of-the-art solvers without the need to tune it to any of the specific tasks. Additionally, the proposed algorithm has the advantage of being very simple, and it comes with the guarantee of always converging to the global optimal solution. In the future, we plan to implement our algorithm in C++ for further speedups.

Acknowledgements

The author would like to thank Georgiana Dinu and Joachim Giesen for useful discussions on the topic. This work was supported by the Deutsche Forschungsgemeinschaft (DFG) under grant GI-711/3-2.

References

Arora, Sanjeev, Hazan, Elad, and Kale, Satyen. Fast algorithms for approximate semidefinite programming using the multiplicative weights update method. In FOCS, 2005.

Burer, Samuel and Monteiro, Renato D. C. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program., 95(2), 2003.

Candès, Emmanuel J., Li, Xiaodong, Ma, Yi, and Wright, John. Robust principal component analysis? J. ACM, 58(3), 2011.

Clarkson, Kenneth L. Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm. In SODA, 2008.

d'Aspremont, Alexandre, El Ghaoui, Laurent, Jordan, Michael I., and Lanckriet, Gert R. G. A direct formulation of sparse PCA using semidefinite programming. SIAM Review, 49(3), 2007.

Garber, Dan and Hazan, Elad. Approximating semidefinite programs in sublinear time. In NIPS, 2011.

Gillis, Nicolas and Glineur, François. Low-rank matrix approximation with weights or missing data is NP-hard. SIAM J. Matrix Analysis Applications, 32(4), 2011.

Goemans, Michel X.
and Williamson, David P. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM, 42(6), 1995.

Grant, Michael and Boyd, Stephen. CVX: Matlab software for disciplined convex programming. http://cvxr.com/cvx, April 2011.

Grigoriadis, Michael D. and Khachiyan, Leonid G. A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters, 18(2):53-58, 1995.

Hazan, Elad. Sparse approximate solutions to semidefinite programs. In LATIN, 2008.

Kleiner, Ariel, Rahimi, Ali, and Jordan, Michael I. Random conic pursuit for semidefinite programming. In NIPS, 2010.

Lanckriet, Gert R. G., Cristianini, Nello, Bartlett, Peter, El Ghaoui, Laurent, and Jordan, Michael I. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27-72, December 2004.

Nemirovski, Arkadi. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15, 2004.

Nesterov, Yurii. Smooth minimization of non-smooth functions. Math. Program., 103, May 2005.

Nesterov, Yurii. Smoothing technique and its applications in semidefinite optimization. Math. Program., 110(2), 2007.

Obozinski, Guillaume, Taskar, Ben, and Jordan, Michael I. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 20, April 2010.

Schmidt, Mark. minFunc. ~mschmidt/software/minfunc.html.

Shalev-Shwartz, Shai, Gonen, Alon, and Shamir, Ohad. Large-scale convex minimization with a low-rank constraint. In ICML, 2011.

Srebro, Nathan, Rennie, Jason D. M., and Jaakkola, Tommi. Maximum-margin matrix factorization. In NIPS, 2004.

Sturm, Jos F. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11-12, 1999.

Weinberger, Kilian Q., Sha, Fei, Zhu, Qihui, and Saul, Lawrence K. Graph Laplacian regularization for large-scale semidefinite programming. In NIPS, 2006.

Wen, Zaiwen, Goldfarb, Donald, and Yin, Wotao. Alternating direction augmented Lagrangian methods for semidefinite programming. Math. Prog. Comp., 2(3-4), 2010.

Xing, Eric P., Ng, Andrew Y., Jordan, Michael I., and Russell, Stuart. Distance metric learning, with application to clustering with side-information. In NIPS, 2002.
Summer course on Convex Optimization. Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.
Summer course on Convex Optimization Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.Minnesota Interior-Point Methods: the rebirth of an old idea Suppose that f is
Inner Product Spaces
Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and
Big Data - Lecture 1 Optimization reminders
Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
Big Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang, Qihang Lin, Rong Jin Tutorial@SIGKDD 2015 Sydney, Australia Department of Computer Science, The University of Iowa, IA, USA Department of
NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing
NMR Measurement of T1-T2 Spectra with Partial Measurements using Compressive Sensing Alex Cloninger Norbert Wiener Center Department of Mathematics University of Maryland, College Park http://www.norbertwiener.umd.edu
Statistical machine learning, high dimension and big data
Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,
10. Proximal point method
L. Vandenberghe EE236C Spring 2013-14) 10. Proximal point method proximal point method augmented Lagrangian method Moreau-Yosida smoothing 10-1 Proximal point method a conceptual algorithm for minimizing
7 Gaussian Elimination and LU Factorization
7 Gaussian Elimination and LU Factorization In this final section on matrix factorization methods for solving Ax = b we want to take a closer look at Gaussian elimination (probably the best known method
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
D-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
GI01/M055 Supervised Learning Proximal Methods
GI01/M055 Supervised Learning Proximal Methods Massimiliano Pontil (based on notes by Luca Baldassarre) (UCL) Proximal Methods 1 / 20 Today s Plan Problem setting Convex analysis concepts Proximal operators
Support Vector Machines Explained
March 1, 2009 Support Vector Machines Explained Tristan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introduction This document has been written in an attempt to make the Support Vector Machines (SVM),
BANACH AND HILBERT SPACE REVIEW
BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but
Duality in General Programs. Ryan Tibshirani Convex Optimization 10-725/36-725
Duality in General Programs Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: duality in linear programs Given c R n, A R m n, b R m, G R r n, h R r : min x R n c T x max u R m, v R r b T
DATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method
Lecture 3 3B1B Optimization Michaelmas 2015 A. Zisserman Linear Programming Extreme solutions Simplex method Interior point method Integer programming and relaxation The Optimization Tree Linear Programming
Principal components analysis
CS229 Lecture notes Andrew Ng Part XI Principal components analysis In our discussion of factor analysis, we gave a way to model data x R n as approximately lying in some k-dimension subspace, where k
1 Solving LPs: The Simplex Algorithm of George Dantzig
Solving LPs: The Simplex Algorithm of George Dantzig. Simplex Pivoting: Dictionary Format We illustrate a general solution procedure, called the simplex algorithm, by implementing it on a very simple example.
Several Views of Support Vector Machines
Several Views of Support Vector Machines Ryan M. Rifkin Honda Research Institute USA, Inc. Human Intention Understanding Group 2007 Tikhonov Regularization We are considering algorithms of the form min
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
Derivative Free Optimization
Department of Mathematics Derivative Free Optimization M.J.D. Powell LiTH-MAT-R--2014/02--SE Department of Mathematics Linköping University S-581 83 Linköping, Sweden. Three lectures 1 on Derivative Free
Date: April 12, 2001. Contents
2 Lagrange Multipliers Date: April 12, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 12 2.3. Informative Lagrange Multipliers...........
Maximum Margin Clustering
Maximum Margin Clustering Linli Xu James Neufeld Bryce Larson Dale Schuurmans University of Waterloo University of Alberta Abstract We propose a new method for clustering based on finding maximum margin
Support Vector Machine (SVM)
Support Vector Machine (SVM) CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
Linear Programming Notes V Problem Transformations
Linear Programming Notes V Problem Transformations 1 Introduction Any linear programming problem can be rewritten in either of two standard forms. In the first form, the objective is to maximize, the material
Nonlinear Optimization: Algorithms 3: Interior-point methods
Nonlinear Optimization: Algorithms 3: Interior-point methods INSEAD, Spring 2006 Jean-Philippe Vert Ecole des Mines de Paris [email protected] Nonlinear optimization c 2006 Jean-Philippe Vert,
Support Vector Machines
Support Vector Machines Charlie Frogner 1 MIT 2011 1 Slides mostly stolen from Ryan Rifkin (Google). Plan Regularization derivation of SVMs. Analyzing the SVM problem: optimization, duality. Geometric
Online Convex Optimization
E0 370 Statistical Learning heory Lecture 19 Oct 22, 2013 Online Convex Optimization Lecturer: Shivani Agarwal Scribe: Aadirupa 1 Introduction In this lecture we shall look at a fairly general setting
Semi-Supervised Support Vector Machines and Application to Spam Filtering
Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery
A NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION
1 A NEW LOOK AT CONVEX ANALYSIS AND OPTIMIZATION Dimitri Bertsekas M.I.T. FEBRUARY 2003 2 OUTLINE Convexity issues in optimization Historical remarks Our treatment of the subject Three unifying lines of
Machine Learning and Pattern Recognition Logistic Regression
Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,
October 3rd, 2012. Linear Algebra & Properties of the Covariance Matrix
Linear Algebra & Properties of the Covariance Matrix October 3rd, 2012 Estimation of r and C Let rn 1, rn, t..., rn T be the historical return rates on the n th asset. rn 1 rṇ 2 r n =. r T n n = 1, 2,...,
Least Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
Practical Guide to the Simplex Method of Linear Programming
Practical Guide to the Simplex Method of Linear Programming Marcel Oliver Revised: April, 0 The basic steps of the simplex algorithm Step : Write the linear programming problem in standard form Linear
a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.
Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given
MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors. Jordan canonical form (continued).
MATH 423 Linear Algebra II Lecture 38: Generalized eigenvectors Jordan canonical form (continued) Jordan canonical form A Jordan block is a square matrix of the form λ 1 0 0 0 0 λ 1 0 0 0 0 λ 0 0 J = 0
17.3.1 Follow the Perturbed Leader
CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.
Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
Support Vector Machines with Clustering for Training with Very Large Datasets
Support Vector Machines with Clustering for Training with Very Large Datasets Theodoros Evgeniou Technology Management INSEAD Bd de Constance, Fontainebleau 77300, France [email protected] Massimiliano
Introduction to Matrix Algebra
Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary
Review Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
Section 6.1 - Inner Products and Norms
Section 6.1 - Inner Products and Norms Definition. Let V be a vector space over F {R, C}. An inner product on V is a function that assigns, to every ordered pair of vectors x and y in V, a scalar in F,
APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
Nonlinear Iterative Partial Least Squares Method
Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written
Numerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen
(für Informatiker) M. Grepl J. Berger & J.T. Frings Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2010/11 Problem Statement Unconstrained Optimality Conditions Constrained
Principles of Scientific Computing Nonlinear Equations and Optimization
Principles of Scientific Computing Nonlinear Equations and Optimization David Bindel and Jonathan Goodman last revised March 6, 2006, printed March 6, 2009 1 1 Introduction This chapter discusses two related
Conic optimization: examples and software
Conic optimization: examples and software Etienne de Klerk Tilburg University, The Netherlands Etienne de Klerk (Tilburg University) Conic optimization: examples and software 1 / 16 Outline Conic optimization
MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets.
MATH 304 Linear Algebra Lecture 20: Inner product spaces. Orthogonal sets. Norm The notion of norm generalizes the notion of length of a vector in R n. Definition. Let V be a vector space. A function α
A Simple Introduction to Support Vector Machines
A Simple Introduction to Support Vector Machines Martin Law Lecture for CSE 802 Department of Computer Science and Engineering Michigan State University Outline A brief history of SVM Large-margin linear
Parallel Data Mining. Team 2 Flash Coders Team Research Investigation Presentation 2. Foundations of Parallel Computing Oct 2014
Parallel Data Mining Team 2 Flash Coders Team Research Investigation Presentation 2 Foundations of Parallel Computing Oct 2014 Agenda Overview of topic Analysis of research papers Software design Overview
Massive Data Classification via Unconstrained Support Vector Machines
Massive Data Classification via Unconstrained Support Vector Machines Olvi L. Mangasarian and Michael E. Thompson Computer Sciences Department University of Wisconsin 1210 West Dayton Street Madison, WI
Numerical Methods I Eigenvalue Problems
Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 [email protected] 1 Course G63.2010.001 / G22.2420-001, Fall 2010 September 30th, 2010 A. Donev (Courant Institute)
Cost Minimization and the Cost Function
Cost Minimization and the Cost Function Juan Manuel Puerta October 5, 2009 So far we focused on profit maximization, we could look at a different problem, that is the cost minimization problem. This is
Orthogonal Diagonalization of Symmetric Matrices
MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let u 1,..., u k be a basis of a subspace V of R n. We begin the process of finding
t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).
1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction
Fast Maximum Margin Matrix Factorization for Collaborative Prediction
for Collaborative Prediction Jason D. M. Rennie [email protected] Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA Nathan Srebro [email protected]
24. The Branch and Bound Method
24. The Branch and Bound Method It has serious practical consequences if it is known that a combinatorial problem is NP-complete. Then one can conclude according to the present state of science that no
5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1
5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1 General Integer Linear Program: (ILP) min c T x Ax b x 0 integer Assumption: A, b integer The integrality condition
Linear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
1. Let P be the space of all polynomials (of one real variable and with real coefficients) with the norm
Uppsala Universitet Matematiska Institutionen Andreas Strömbergsson Prov i matematik Funktionalanalys Kurs: F3B, F4Sy, NVP 005-06-15 Skrivtid: 9 14 Tillåtna hjälpmedel: Manuella skrivdon, Kreyszigs bok
Component Ordering in Independent Component Analysis Based on Data Power
Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals
Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.
Chapter 1 Vocabulary identity - A statement that equates two equivalent expressions. verbal model- A word equation that represents a real-life problem. algebraic expression - An expression with variables.
The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method
The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem
An Introduction to Machine Learning
An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia [email protected] Tata Institute, Pune,
Methods for Finding Bases
Methods for Finding Bases Bases for the subspaces of a matrix Row-reduction methods can be used to find bases. Let us now look at an example illustrating how to obtain bases for the row space, null space,
Linear Programming. March 14, 2014
Linear Programming March 1, 01 Parts of this introduction to linear programming were adapted from Chapter 9 of Introduction to Algorithms, Second Edition, by Cormen, Leiserson, Rivest and Stein [1]. 1
! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.
Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of
Applied Algorithm Design Lecture 5
Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design
Numerical Analysis Lecture Notes
Numerical Analysis Lecture Notes Peter J. Olver 6. Eigenvalues and Singular Values In this section, we collect together the basic facts about eigenvalues and eigenvectors. From a geometrical viewpoint,
Nonlinear Algebraic Equations Example
Nonlinear Algebraic Equations Example Continuous Stirred Tank Reactor (CSTR). Look for steady state concentrations & temperature. s r (in) p,i (in) i In: N spieces with concentrations c, heat capacities
Linear Programming. Widget Factory Example. Linear Programming: Standard Form. Widget Factory Example: Continued.
Linear Programming Widget Factory Example Learning Goals. Introduce Linear Programming Problems. Widget Example, Graphical Solution. Basic Theory:, Vertices, Existence of Solutions. Equivalent formulations.
Metric Spaces. Chapter 7. 7.1. Metrics
Chapter 7 Metric Spaces A metric space is a set X that has a notion of the distance d(x, y) between every pair of points x, y X. The purpose of this chapter is to introduce metric spaces and give some
Adaptive Bound Optimization for Online Convex Optimization
Adaptive Bound Optimization for Online Convex Optimization H. Brendan McMahan Google, Inc. [email protected] Matthew Streeter Google, Inc. [email protected] Abstract We introduce a new online convex
Subspace Analysis and Optimization for AAM Based Face Alignment
Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China [email protected] Stan Z. Li Microsoft
