Coordinate Descent for L1 Minimization
1 Coordinate Descent for L1 Minimization

Yingying Li and Stanley Osher
Department of Mathematics, University of California, Los Angeles
Jan 13, 2010

Li, Osher (UCLA), Coordinate Descent for L1 Minimization, slide 1 / 30
2 Outline

- Coordinate descent and L1 minimization problems
- Application to compressed sensing
- Application to image denoising
3 Basic Idea of Coordinate Descent

Consider solving the optimization problem

  ũ = argmin_{u ∈ R^n} E(u),  u = (u_1, u_2, ..., u_n).

To solve it by coordinate descent, at every step we update one coordinate while fixing all the others, and iterate:

  ũ_j = argmin_{x ∈ R} E(u_1, ..., u_{j-1}, x, u_{j+1}, ..., u_n).

The rule for choosing the coordinate j to update is called the sweep pattern, and it is essential to convergence.
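The update rule above can be sketched in a few lines. This is an illustration outside the slides: cyclic coordinate descent on a smooth quadratic E(u) = 0.5 uᵀQu − bᵀu, where the scalar subproblem has a closed form (the matrix Q and vector b are made up for the example).

```python
import numpy as np

# minimize E(u) = 0.5 * u^T Q u - b^T u by coordinate descent
# with a sequential (cyclic) sweep pattern
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, -2.0])

u = np.zeros(2)
for sweep in range(100):
    for j in range(2):
        # exact minimizer over u_j with the other coordinates fixed:
        # u_j = (b_j - sum_{k != j} Q_jk u_k) / Q_jj
        u[j] = (b[j] - Q[j] @ u + Q[j, j] * u[j]) / Q[j, j]
```

With an exact scalar solve at each step, this is the Gauss-Seidel iteration for Qu = b, and it converges to the global minimizer.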
4 Models Including L1 Functions

We consider energies including an L1 term,

  E(u_1, u_2, ..., u_n) = F(u_1, u_2, ..., u_n) + Σ_{i=1}^n |u_i|,   (1)

and also energies with a total variation (TV) term,

  E(u_1, u_2, ..., u_n) = F(u_1, u_2, ..., u_n) + Σ_{i=1}^n |∇u_i|,   (2)

where F is convex and differentiable and ∇ is a discrete approximation of the gradient.
5 Models Including L1 Functions

Compressed sensing:

  ũ = argmin_u ||u||_1  s.t.  Au = f.   (3)

TV-based image processing, such as ROF (Rudin-Osher-Fatemi) denoising and nonlocal ROF denoising (Guy Gilboa and Stanley Osher):

  ũ = argmin_u ||u||_TV + λ ||u − f||_2²,   (4)
  ũ = argmin_u ||u||_NL-TV + λ ||u − f||_2²,   (5)

where λ is a scalar parameter and f is the input noisy image.
6 Why Coordinate Descent?

Advantages:
- Scalar minimization is easier than multivariable minimization
- Gradient free
- Easy to implement

Disadvantages:
- Hard to take advantage of structured relationships among coordinates (e.g., a problem involving the FFT matrix)
- The convergence rate is not good, and it can get stuck at non-optimal points (for example, the fused lasso E(u) = ||Au − f||_2² + λ_1 ||u||_1 + λ_2 ||u||_TV)
7 Coordinate Descent Sweep Patterns

Sweep patterns:
- Sequential: 1, 2, ..., n, 1, 2, ..., n, ... or 1, 2, ..., n, n−1, n−2, ..., 1, ...
- Random: a permutation of 1, 2, ..., n, repeated

Essentially cyclic rule: suppose the energy is convex and its nondifferentiable part is separable. Then coordinate descent converges if the sweep pattern satisfies the essentially cyclic rule, namely that every coordinate is visited infinitely often. However, there are no theoretical results on the convergence rate.
8 Compressed Sensing (CS)

  min_{u ∈ R^n} E(u) = ||u||_0  subject to  Au = f.

If the matrix A is incoherent, this is equivalent to its convex relaxation:

  min_{u ∈ R^n} ||u||_1  subject to  Au = f.

Restricted Isometry Condition (RIC): a measurement matrix A satisfies the RIC with parameters (s, δ), δ ∈ (0, 1), if

  (1 − δ) ||v||_2 ≤ ||Av||_2 ≤ (1 + δ) ||v||_2

for all s-sparse vectors v. When δ is small, the RIC says that every set of s columns of A is approximately an orthonormal system.
9 Bregman Iterative Method

The constrained problem:

  min_{u ∈ R^n} ||u||_1  subject to  Au = f.

The unconstrained problem:

  min_{u ∈ R^n} E(u) = ||u||_1 + λ ||Au − f||_2².
10 Bregman Iterative Algorithm

1: Initialize: k = 0, u^0 = 0, f^0 = f.
2: while ||Au − f||_2 / ||f||_2 > tolerance do
3:   Solve u^{k+1} ← argmin_u ||u||_1 + λ ||Au − f^k||_2² by coordinate descent
4:   f^{k+1} ← f + (f^k − Au^{k+1})
5:   k ← k + 1
6: end while
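The outer loop can be sketched as follows. This is a NumPy sketch, not code from the talk; the inner solver is abstracted as a callable, and the demo uses A = I, where the unconstrained problem min ||u||_1 + λ||u − g||² has the closed-form solution shrink(g, 1/(2λ)).

```python
import numpy as np

def shrink(x, mu):
    # componentwise soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def bregman(solve_unconstrained, A, f, tol=1e-8, max_iter=50):
    # Bregman iteration: solve the unconstrained problem repeatedly,
    # adding the residual back into the data term each time
    g = f.copy()
    u = np.zeros(A.shape[1])
    for _ in range(max_iter):
        u = solve_unconstrained(A, g)        # step 3 of the algorithm
        if np.linalg.norm(A @ u - f) <= tol * np.linalg.norm(f):
            break
        g = f + (g - A @ u)                  # step 4: f^{k+1} = f + (f^k - A u^{k+1})
    return u

# demo: for A = I the inner problem is solved exactly by shrink(g, 1/(2*lam))
lam = 1.0
u = bregman(lambda A, g: shrink(g, 1.0 / (2.0 * lam)),
            np.eye(3), np.array([3.0, -2.0, 0.0]))
```

In this demo the iterates reach the constrained solution u = f exactly after two outer steps, illustrating how adding back the residual undoes the bias of the shrinkage.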
11 Solving the Coordinate Subproblem

  min E(u) = λ ||Au − f||_2² + Σ_{i=1}^n |u_i|
           = λ Σ_{i=1}^m ( Σ_{j=1}^n a_ij u_j − f_i )² + Σ_{i=1}^n |u_i|.

As a function of the single coordinate u_j,

  E = λ ( ||a_j||_2² u_j² − 2 β_j u_j ) + |u_j| + (terms independent of u_j),

where β_j = Σ_i a_ij ( f_i − Σ_{k≠j} a_ik u_k ). The optimal value of u_j is

  ũ_j = (1/||a_j||_2²) shrink(β_j, 1/(2λ)).

Updating the jth component of u by this formula strictly decreases the energy E(u) whenever ũ_j ≠ u_j.
12 Pathwise Coordinate Descent

Algorithm (pathwise coordinate descent):

While not converged:
  For j = 1, 2, ..., n:
    β_j ← Σ_i a_ij ( f_i − Σ_{k≠j} a_ik u_k )
    u_j ← (1/||a_j||_2²) shrink(β_j, 1/(2λ))

where the shrink operator is

  shrink(f, μ) = f − μ, if f > μ;
                 0,     if −μ ≤ f ≤ μ;
                 f + μ, if f < −μ.

Figure: graph of shrink(f, μ), which is zero on [−μ, μ] and linear with slope 1 outside.
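As a concrete sketch of this algorithm for the unconstrained energy min ||u||_1 + λ||Au − f||_2² from the previous slides (variable names are mine):

```python
import numpy as np

def shrink(x, mu):
    # shrink(f, mu) = f - mu if f > mu; 0 if |f| <= mu; f + mu if f < -mu
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def pathwise_cd(A, f, lam, sweeps=100):
    # sequential coordinate descent for min_u ||u||_1 + lam * ||A u - f||_2^2
    m, n = A.shape
    u = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)              # ||a_j||_2^2
    for _ in range(sweeps):
        for j in range(n):
            r = f - A @ u + A[:, j] * u[j]     # residual with coordinate j removed
            beta_j = A[:, j] @ r               # beta_j = sum_i a_ij (f_i - sum_{k!=j} a_ik u_k)
            u[j] = shrink(beta_j, 1.0 / (2.0 * lam)) / col_sq[j]
    return u
```

For a matrix with orthogonal columns the coordinates decouple, so a single sweep already gives the exact solution.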
13 Adaptive Greedy Sweep

Consider the energy decrease obtained by updating only the jth coordinate:

  ΔE_j = E(u_1, ..., u_j, ..., u_n) − E(u_1, ..., ũ_j, ..., u_n)
       = λ ||a_j||_2² [ ( u_j − β_j/||a_j||_2² )² − ( ũ_j − β_j/||a_j||_2² )² ] + |u_j| − |ũ_j|
       = λ ||a_j||_2² ( u_j − ũ_j ) ( u_j + ũ_j − 2β_j/||a_j||_2² ) + |u_j| − |ũ_j|.

We select the coordinate to update so as to maximize the decrease in energy:

  j* = argmax_j ΔE_j.
14 Greedy is Better

Figure: three panels. Left: the result after one sequential sweep (512 iterations); middle: a zoom-in of the left panel; right: the result using an adaptive sweep (512 iterations). The dots represent the exact solution of the unconstrained problem at λ = 1000 and the circles are the computed results.
15 Gradient-based Adaptive Sweep

For the same unconstrained problem, Tongtong Wu and Kenneth Lange update the u_j giving the most negative directional derivative along the forward or backward coordinate directions. If e_k is the coordinate direction along which u_k varies, the objective function E(u) has one-sided directional derivatives

  d_{e_k} E = lim_{σ→0+} ( E(u + σ e_k) − E(u) ) / σ
            = 2λ ( ||a_k||_2² u_k − β_k ) + { 1, u_k ≥ 0; −1, u_k < 0 },
  d_{−e_k} E = lim_{σ→0+} ( E(u − σ e_k) − E(u) ) / σ
             = −2λ ( ||a_k||_2² u_k − β_k ) + { 1, u_k ≤ 0; −1, u_k > 0 }.

We choose the updating coordinate

  j = argmin_j { d_{e_j} E, d_{−e_j} E }.
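A sketch of this selection rule, in my own NumPy rendering with the sign conventions as reconstructed above:

```python
import numpy as np

def select_gradient_based(A, f, u, lam):
    # pick the coordinate whose forward or backward one-sided directional
    # derivative of E(u) = ||u||_1 + lam * ||A u - f||_2^2 is most negative
    col_sq = (A ** 2).sum(axis=0)
    beta = A.T @ f - A.T @ (A @ u) + col_sq * u
    g = 2.0 * lam * (col_sq * u - beta)        # smooth part along +e_k
    d_fwd = g + np.where(u >= 0, 1.0, -1.0)    # d_{e_k} E
    d_bwd = -g + np.where(u <= 0, 1.0, -1.0)   # d_{-e_k} E
    d = np.minimum(d_fwd, d_bwd)
    return int(np.argmin(d)), float(d.min())
```

At u = 0 this rule picks the coordinate with the largest |a_k^T f|, which matches intuition: the column most correlated with the data is the best first move.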
16 Update Difference Based

We choose the updating coordinate based on the update difference:

  j = argmax_j |u_j − ũ_j|.
17 Greedy Selection Choices

- Energy function based: j = argmax_i ΔE_i
- Gradient based: j = argmin_i { d_{e_i} E, d_{−e_i} E }
- Update difference based: j = argmax_i |u_i − ũ_i|
18 Computational Tricks

Our algorithm involves a vector β with

  β_j = (A^T f)_j − a_j^T Au + ||a_j||_2² u_j.

Since u changes in only one coordinate per iteration, there is a computationally efficient way to update β without entirely recomputing it: if coordinate p changes,

  β^{k+1} − β^k = ( u_p^{k+1} − u_p^k ) ( ||a_p||_2² I − A^T A ) e_p.
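A quick numerical check of this rank-one update (an illustrative sketch; the random problem sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.standard_normal((m, n))
f = rng.standard_normal(m)
u = rng.standard_normal(n)
col_sq = (A ** 2).sum(axis=0)          # ||a_j||_2^2

def beta_full(u):
    # beta_j = (A^T f)_j - a_j^T A u + ||a_j||^2 u_j, recomputed from scratch
    return A.T @ f - A.T @ (A @ u) + col_sq * u

beta = beta_full(u)

# change one coordinate p and apply the rank-one correction
# beta^{k+1} - beta^k = (u_p^{k+1} - u_p^k) (||a_p||^2 I - A^T A) e_p
p, new_up = 3, 2.0
delta = new_up - u[p]
beta -= delta * (A.T @ A[:, p])        # O(n) if A^T A is precomputed
beta[p] += delta * col_sq[p]           # the net change at index p is zero
u[p] = new_up
```

With the Gram matrix A^T A cached, the correction costs O(n) per iteration instead of the O(mn) full recomputation.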
19 Algorithm: Coordinate Descent with a Refined Sweep

Precompute: w_j = ||a_j||_2².
Normalization: A(:, j) ← A(:, j)/w_j.
Initialization: u = 0, β = A^T f.
Iterate until convergence:
  ũ = shrink(β, 1/(2λ));
  j = argmax_i |u_i − ũ_i|;  u_j^{k+1} = ũ_j;
  β^{k+1} = β^k − (ũ_j − u_j^k)(A^T A) e_j, keeping the jth component: β_j^{k+1} = β_j^k.
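Putting the pieces together, here is a sketch of the greedy method with the update-difference selection rule and the incremental β update. This is my own implementation, not the slide's exact code: it skips the column-normalization step and instead divides by w_j explicitly.

```python
import numpy as np

def greedy_cd(A, f, lam, max_iter=10000, tol=1e-12):
    # greedy coordinate descent for min_u ||u||_1 + lam * ||A u - f||_2^2,
    # selecting the coordinate with the largest update |u_j - u~_j|
    n = A.shape[1]
    G = A.T @ A                        # precomputed Gram matrix
    w = np.diag(G).copy()              # w_j = ||a_j||_2^2
    u = np.zeros(n)
    beta = A.T @ f                     # beta for u = 0
    mu = 1.0 / (2.0 * lam)
    for _ in range(max_iter):
        u_tilde = np.sign(beta) * np.maximum(np.abs(beta) - mu, 0.0) / w
        j = int(np.argmax(np.abs(u - u_tilde)))
        d = u_tilde[j] - u[j]
        if abs(d) < tol:
            break                      # no coordinate improves: converged
        beta -= d * G[:, j]            # rank-one beta update, with the
        beta[j] += d * w[j]            # jth component left unchanged
        u[j] = u_tilde[j]
    return u
```

On a matrix with orthogonal columns the method needs exactly one pass over the nonzero coordinates, since updating one coordinate leaves the others' β values unchanged.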
20 Convergence of Greedy Coordinate Descent

Consider minimizing functions of the form

  min_{u ∈ R^n} H(u_1, u_2, ..., u_n) = F(u_1, u_2, ..., u_n) + Σ_{i=1}^n g_i(u_i),

where F is differentiable and convex, and each g_i is convex. Tseng (1988) proved that pathwise coordinate descent converges to a minimizer of H.

Theorem. If lim_{|u_i|→∞} H(u_1, u_2, ..., u_n) = ∞ for every i, and |∂²F/(∂u_i ∂u_j)| ≤ M, then the greedy coordinate descent method based on the selection rule j = argmax_i ΔH_i converges to an optimal solution.
21 Convergence of Greedy Coordinate Descent

Lemma. Consider the function H(x) = (ax − b)² + |x|, where a > 0. It is easy to check that x̃ = argmin_x H(x) = shrink(b/a, 1/(2a²)). We claim that for any x,

  |x − x̃| ≤ (1/a) ( H(x) − H(x̃) )^{1/2}.

Theorem. The greedy coordinate descent method with the selection rule j = argmax_i |u_i − ũ_i| converges to an optimal solution of the problem

  min_{u ∈ R^n} H(u) = ||u||_1 + λ ||Au − f||_2².
22 Comparison of Numerical Speed

  min_{u ∈ R^n} ||u||_1 + λ ||Au − f||_2².   (6)

Table: runtime in seconds of pathwise coordinate descent and the adaptive sweep methods (1), (2), (3) for solving (6) at the same accuracy, for several values of λ. (The numerical entries of the table did not survive the transcription.)
23 Another Application to Image Denoising

Figure: three panels showing the exact image, the noisy image, and the denoised image.
24 Solving ROF

The ROF (Rudin-Osher-Fatemi) model for image denoising: find u satisfying

  ũ = argmin_u ∫_Ω |∇u| dx + λ ||u − f||_2²,

where f is the given noisy image and λ > 0 is a parameter.

Methods for solving ROF:
- Gradient descent (slow; time-step restriction)
- Graph cuts (fast, but does not parallelize)
- Coordinate descent
25 Solving in Parallel by Coordinate Descent

At a pixel with value u and neighbors u_l, u_r, u_u, u_d (left, right, up, down), the scalar subproblem is

  min_{u ∈ R} E(u) = |u − u_l| + |u − u_r| + |u − u_u| + |u − u_d| + λ (u − f)²,   (7)

with the closed-form solution

  u_opt = median{ u_l, u_r, u_u, u_d, f + 2/λ, f + 1/λ, f, f − 1/λ, f − 2/λ }.   (8)

While not converged: apply (8) to the pixels in the black checkerboard pattern in parallel; then apply (8) to the pixels in the white pattern in parallel.
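The per-pixel update (8) and the red-black sweep can be sketched directly. This is a plain NumPy sketch assuming fixed boundary pixels; the function names are mine.

```python
import numpy as np

def pixel_update(ul, ur, uu, ud, f, lam):
    # closed-form minimizer of
    # |u-ul| + |u-ur| + |u-uu| + |u-ud| + lam*(u-f)^2  (formula (8))
    vals = np.array([ul, ur, uu, ud,
                     f + 2.0/lam, f + 1.0/lam, f, f - 1.0/lam, f - 2.0/lam])
    return np.median(vals)             # median of 9 values

def rof_red_black(f, lam, sweeps=50):
    # anisotropic ROF denoising by checkerboard (red-black) coordinate descent;
    # pixels of one color have no neighbors of the same color, so each
    # half-sweep could be executed fully in parallel
    u = f.astype(float).copy()
    H, W = u.shape
    for _ in range(sweeps):
        for parity in (0, 1):
            for i in range(1, H - 1):
                for j in range(1, W - 1):
                    if (i + j) % 2 == parity:
                        u[i, j] = pixel_update(u[i, j-1], u[i, j+1],
                                               u[i-1, j], u[i+1, j],
                                               f[i, j], lam)
    return u
```

The checkerboard split is what makes the scheme parallel: within one color class the updates touch disjoint neighborhoods, so they can all run simultaneously without races.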
26 Numerical Results

First example: input SNR 8.8; denoised results SNR 13.2 (1.10 s) and SNR 13.5 (1.15 s).
Second example: input SNR 3.1; denoised results SNR 9.5 (1.30 s) and SNR 10.3 (1.27 s).
27 Nonlocal Means (Buades, Coll and Morel)

Every pair of pixels x and y has a weight w(x, y) used to evaluate the similarity of their patches:

  w(x, y) = exp( −(1/h²) ∫_Ω G_a(t) |f(x + t) − f(y + t)|² dt ),   (9)

where G_a is a Gaussian with standard deviation a. For computational efficiency, the support of w(x, y) is often restricted to a search window |x − y| ≤ R and set to zero outside. Here, we consider the nonlocal ROF model,

  argmin_u E(u, f) = ∫∫ w(x, y) |u(x) − u(y)| dx dy + λ ∫ (u − f)² dx.   (10)
28 Generalization of the Median Formula

  argmin_x Σ_{i=1}^n w_i |x − u_i| + λ |x − f|^α
    = median{ u_1, u_2, ..., u_n,
              f + (w_n + ... + w_1)^p μ,
              f + sign(w_n + ... + w_2 − w_1) |w_n + ... + w_2 − w_1|^p μ,
              f + sign(w_n + ... + w_3 − w_2 − w_1) |w_n + ... + w_3 − w_2 − w_1|^p μ,
              ...,
              f + sign(w_n − w_{n−1} − ... − w_1) |w_n − w_{n−1} − ... − w_1|^p μ,
              f − (w_n + w_{n−1} + ... + w_1)^p μ },   (11)

where p = 1/(α − 1) and μ = (1/(αλ))^p. A computational saving is that the powered weight sums do not depend on the u_i, so with all else fixed they only need to be computed once.
29 Numerical Results

Search windows: 5×5 (SNR 10.4), 7×7 (SNR 10.8), 11×11 (SNR 10.9). The runtimes are 14 s with the 5×5 search window, 20 s with the 7×7, and 34 s with the 11×11. (Parameters: a = 1.25, h = 10.2; the value of λ was lost in the transcription.)
30 Thanks for your attention!
10-725: Convex Optimization Fall 2013 Lecture 2: August 29 Lecturer: Barnabás Póczos Scribes: Samrachana Adhikari, Mattia Ciollaro, Fabrizio Lecci Note: LaTeX template courtesy of UC Berkeley EECS dept.
More informationAdaBoost. Jiri Matas and Jan Šochman. Centre for Machine Perception Czech Technical University, Prague http://cmp.felk.cvut.cz
AdaBoost Jiri Matas and Jan Šochman Centre for Machine Perception Czech Technical University, Prague http://cmp.felk.cvut.cz Presentation Outline: AdaBoost algorithm Why is of interest? How it works? Why
More informationNumerical methods for American options
Lecture 9 Numerical methods for American options Lecture Notes by Andrzej Palczewski Computational Finance p. 1 American options The holder of an American option has the right to exercise it at any moment
More informationSOLUTIONS. f x = 6x 2 6xy 24x, f y = 3x 2 6y. To find the critical points, we solve
SOLUTIONS Problem. Find the critical points of the function f(x, y = 2x 3 3x 2 y 2x 2 3y 2 and determine their type i.e. local min/local max/saddle point. Are there any global min/max? Partial derivatives
More information5.1 Bipartite Matching
CS787: Advanced Algorithms Lecture 5: Applications of Network Flow In the last lecture, we looked at the problem of finding the maximum flow in a graph, and how it can be efficiently solved using the Ford-Fulkerson
More informationChapter 20. Vector Spaces and Bases
Chapter 20. Vector Spaces and Bases In this course, we have proceeded step-by-step through low-dimensional Linear Algebra. We have looked at lines, planes, hyperplanes, and have seen that there is no limit
More informationGI01/M055 Supervised Learning Proximal Methods
GI01/M055 Supervised Learning Proximal Methods Massimiliano Pontil (based on notes by Luca Baldassarre) (UCL) Proximal Methods 1 / 20 Today s Plan Problem setting Convex analysis concepts Proximal operators
More informationThe Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy
BMI Paper The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy Faculty of Sciences VU University Amsterdam De Boelelaan 1081 1081 HV Amsterdam Netherlands Author: R.D.R.
More informationPrinciples of demand management Airline yield management Determining the booking limits. » A simple problem» Stochastic gradients for general problems
Demand Management Principles of demand management Airline yield management Determining the booking limits» A simple problem» Stochastic gradients for general problems Principles of demand management Issues:»
More informationMAT 200, Midterm Exam Solution. a. (5 points) Compute the determinant of the matrix A =
MAT 200, Midterm Exam Solution. (0 points total) a. (5 points) Compute the determinant of the matrix 2 2 0 A = 0 3 0 3 0 Answer: det A = 3. The most efficient way is to develop the determinant along the
More informationSystems of Linear Equations
Systems of Linear Equations Beifang Chen Systems of linear equations Linear systems A linear equation in variables x, x,, x n is an equation of the form a x + a x + + a n x n = b, where a, a,, a n and
More informationA central problem in network revenue management
A Randomized Linear Programming Method for Computing Network Bid Prices KALYAN TALLURI Universitat Pompeu Fabra, Barcelona, Spain GARRETT VAN RYZIN Columbia University, New York, New York We analyze a
More informationLectures notes on orthogonal matrices (with exercises) 92.222 - Linear Algebra II - Spring 2004 by D. Klain
Lectures notes on orthogonal matrices (with exercises) 92.222 - Linear Algebra II - Spring 2004 by D. Klain 1. Orthogonal matrices and orthonormal sets An n n real-valued matrix A is said to be an orthogonal
More informationMATH1231 Algebra, 2015 Chapter 7: Linear maps
MATH1231 Algebra, 2015 Chapter 7: Linear maps A/Prof. Daniel Chan School of Mathematics and Statistics University of New South Wales danielc@unsw.edu.au Daniel Chan (UNSW) MATH1231 Algebra 1 / 43 Chapter
More informationParallel Selective Algorithms for Nonconvex Big Data Optimization
1874 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 63, NO. 7, APRIL 1, 2015 Parallel Selective Algorithms for Nonconvex Big Data Optimization Francisco Facchinei, Gesualdo Scutari, Senior Member, IEEE,
More informationMulti-variable Calculus and Optimization
Multi-variable Calculus and Optimization Dudley Cooke Trinity College Dublin Dudley Cooke (Trinity College Dublin) Multi-variable Calculus and Optimization 1 / 51 EC2040 Topic 3 - Multi-variable Calculus
More informationPrinciples of Scientific Computing Nonlinear Equations and Optimization
Principles of Scientific Computing Nonlinear Equations and Optimization David Bindel and Jonathan Goodman last revised March 6, 2006, printed March 6, 2009 1 1 Introduction This chapter discusses two related
More informationApplications to Data Smoothing and Image Processing I
Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is
More informationA linear algebraic method for pricing temporary life annuities
A linear algebraic method for pricing temporary life annuities P. Date (joint work with R. Mamon, L. Jalen and I.C. Wang) Department of Mathematical Sciences, Brunel University, London Outline Introduction
More information