Coordinate Descent for L1 Minimization


1 Coordinate Descent for L1 Minimization
Yingying Li and Stanley Osher
Department of Mathematics, University of California, Los Angeles
Jan 13, 2010

2 Outline
Coordinate descent and L1 minimization problems
Application to compressed sensing
Application to image denoising

3 Basic Idea of Coordinate Descent
Consider solving the optimization problem
ũ = arg min_{u ∈ R^n} E(u),  u = (u_1, u_2, ..., u_n).
To solve it by coordinate descent, we update one coordinate per step while fixing all the others, and iterate:
ũ_j = arg min_{x ∈ R} E(u_1, ..., u_{j-1}, x, u_{j+1}, ..., u_n).
The way we choose the updated coordinate j is called the sweep pattern, and it is essential to convergence.
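A minimal Python sketch of this generic scheme (not from the slides; the helper minimize_coord and the least-squares example are illustrative assumptions):

```python
import numpy as np

def coordinate_descent(minimize_coord, u0, sweeps=200):
    """Generic coordinate descent with a sequential sweep pattern 1, 2, ..., n, repeat.

    minimize_coord(u, j) must return the scalar x minimizing
    E(u_1, ..., u_{j-1}, x, u_{j+1}, ..., u_n) with the other coordinates fixed.
    """
    u = np.array(u0, dtype=float)
    for _ in range(sweeps):
        for j in range(u.size):
            u[j] = minimize_coord(u, j)     # one-dimensional minimization in coordinate j
    return u

# Illustrative smooth example: E(u) = ||Au - f||_2^2 has a closed-form coordinate minimizer.
def least_squares_coord(A, f):
    def minimize_j(u, j):
        a_j = A[:, j]
        r = f - A @ u + a_j * u[j]          # residual with coordinate j's contribution removed
        return (a_j @ r) / (a_j @ a_j)
    return minimize_j

A = np.random.randn(20, 5)
f = np.random.randn(20)
u = coordinate_descent(least_squares_coord(A, f), np.zeros(5))
print(np.linalg.norm(u - np.linalg.lstsq(A, f, rcond=None)[0]))   # small: agrees with least squares
```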

4 Models Including L1 Functions
We consider energies including an L1 term,
E(u_1, u_2, ..., u_n) = F(u_1, u_2, ..., u_n) + Σ_{i=1}^n |u_i|,  (1)
and also energies with a total variation (TV) term,
E(u_1, u_2, ..., u_n) = F(u_1, u_2, ..., u_n) + Σ_{i=1}^n |∇u_i|,  (2)
where F is convex and differentiable and ∇ is a discrete approximation of the gradient.

5 Models Including L1 Functions
Compressed sensing:
ũ = arg min ||u||_1  s.t.  Au = f.  (3)
TV-based image processing, like ROF (Rudin-Osher-Fatemi) denoising and nonlocal ROF denoising (Guy Gilboa and Stanley Osher):
ũ = arg min ||u||_TV + λ||u - f||_2^2,  (4)
ũ = arg min ||u||_NL-TV + λ||u - f||_2^2,  (5)
where λ is a scalar parameter and f is the input noisy image.

6 Why Coordinate Descent?
Advantages: scalar minimization is easier than multivariable minimization; gradient free; easy to implement.
Disadvantages: hard to take advantage of structured relationships among coordinates (like a problem involving the FFT matrix); the convergence rate is not good, and it can get stuck at non-optimal points (for example, the fused lasso E(u) = ||Au - f||_2^2 + λ_1 ||u||_1 + λ_2 ||u||_TV).

7 Coordinate Descent Sweep Patterns
Sweep patterns:
Sequential: 1, 2, ..., n, 1, 2, ..., n, ...  or  1, 2, ..., n, n-1, n-2, ..., 1, ...
Random: a permutation of 1, 2, ..., n, repeated.
Essentially cyclic rule: suppose the energy is convex and its nondifferentiable parts are separable. Then coordinate descent converges if the sweep pattern satisfies the essentially cyclic rule, i.e., every coordinate is visited infinitely often. However, there are no theoretical results on the convergence rate.

8 Compressed Sensing (CS)
min_{u ∈ R^n} E(u) = ||u||_0, subject to Au = f.
If the matrix A is incoherent, this is equivalent to its convex relaxation:
min_{u ∈ R^n} ||u||_1, subject to Au = f.
Restricted Isometry Condition (RIC): a measurement matrix A satisfies the RIC with parameters (s, δ), δ ∈ (0, 1), if
(1 - δ)||v||_2 ≤ ||Av||_2 ≤ (1 + δ)||v||_2 for all s-sparse vectors v.
When δ is small, the RIC says that every set of s columns of A is approximately an orthonormal system.

9 Bregman Iterative Method
The constrained problem: min_{u ∈ R^n} ||u||_1 subject to Au = f.
The unconstrained problem: min_{u ∈ R^n} E(u) = ||u||_1 + λ||Au - f||_2^2.

10 Bregman Iterative Algorithm
1: Initialize: k = 0, u^0 = 0, f^0 = f.
2: while ||Au - f||_2 / ||f||_2 > tolerance do
3:   Solve u^{k+1} ← arg min_u ||u||_1 + λ||Au - f^k||_2^2 by coordinate descent
4:   f^{k+1} ← f + (f^k - Au^{k+1})
5:   k ← k + 1
6: end while
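A hedged Python sketch of this outer loop, assuming an inner solver solve_l1_ls(A, b, lam) for the unconstrained subproblem (for instance, the coordinate descent routines on the following slides); the function name, tolerance, and iteration cap are assumptions:

```python
import numpy as np

def bregman_iteration(A, f, lam, solve_l1_ls, tol=1e-6, max_outer=50):
    """Bregman iterative method for min ||u||_1 subject to Au = f.

    solve_l1_ls(A, b, lam) should return an (approximate) minimizer of
    ||u||_1 + lam * ||A u - b||_2^2.
    """
    u = np.zeros(A.shape[1])
    fk = f.astype(float).copy()
    for _ in range(max_outer):
        if np.linalg.norm(A @ u - f) <= tol * np.linalg.norm(f):
            break
        u = solve_l1_ls(A, fk, lam)      # step 3: inner unconstrained problem
        fk = f + (fk - A @ u)            # step 4: add the residual back to the data
    return u
```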

11 Solving the Coordinate Subproblem
Minimizing E(u) = λ||Au - f||_2^2 + ||u||_1 over the single coordinate u_j, with all other coordinates fixed:
E(u) = λ Σ_{i=1}^m ( Σ_{k=1}^n a_{ik} u_k - f_i )^2 + Σ_{i=1}^n |u_i|
     = λ ( Σ_i a_{ij}^2 u_j^2 - 2 Σ_i a_{ij}(f_i - Σ_{k≠j} a_{ik} u_k) u_j ) + |u_j| + (terms independent of u_j)
     = λ ( ||a_j||_2^2 u_j^2 - 2 β_j u_j ) + |u_j| + (terms independent of u_j),
where β_j = Σ_i a_{ij}(f_i - Σ_{k≠j} a_{ik} u_k). The optimal value for u_j is
ũ_j = (1/||a_j||_2^2) shrink(β_j, 1/(2λ)).
So this formula corrects the jth component of u and strictly decreases the energy function E(u).

12 Pathwise Coordinate Descent
Algorithm (Pathwise Coordinate Descent):
While not converged,
for j = 1, 2, ..., n:
  β_j ← Σ_i a_{ij}(f_i - Σ_{k≠j} a_{ik} u_k)
  u_j ← (1/||a_j||_2^2) shrink(β_j, 1/(2λ)),
where the shrink operator is
shrink(f, μ) = f - μ if f > μ;  0 if -μ ≤ f ≤ μ;  f + μ if f < -μ.
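As a sketch, the sweep above in NumPy; this is a naive version that recomputes the residual for each coordinate (the efficient β update appears on slide 18), and the function names and sweep count are assumptions:

```python
import numpy as np

def shrink(x, mu):
    """Soft thresholding: shrink(x, mu) = sign(x) * max(|x| - mu, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def pathwise_cd(A, f, lam, sweeps=100):
    """Pathwise (sequential-sweep) coordinate descent for
    min_u ||u||_1 + lam * ||A u - f||_2^2."""
    n = A.shape[1]
    u = np.zeros(n)
    col_sq = np.sum(A * A, axis=0)                # ||a_j||_2^2 for every column
    for _ in range(sweeps):
        for j in range(n):
            r_j = f - A @ u + A[:, j] * u[j]      # residual without coordinate j
            beta_j = A[:, j] @ r_j
            u[j] = shrink(beta_j, 1.0 / (2.0 * lam)) / col_sq[j]
    return u
```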

13 Adaptive Greedy Sweep
Consider the energy decrease obtained by updating only the jth coordinate:
ΔE_j = E(u_1, ..., u_j, ..., u_n) - E(u_1, ..., ũ_j, ..., u_n)
     = λ||a_j||_2^2 (u_j - β_j/||a_j||_2^2)^2 + |u_j| - λ||a_j||_2^2 (ũ_j - β_j/||a_j||_2^2)^2 - |ũ_j|
     = λ||a_j||_2^2 (u_j - ũ_j)(u_j + ũ_j - 2β_j/||a_j||_2^2) + |u_j| - |ũ_j|.
We select the coordinate to update so as to maximize the decrease in energy, j = arg max_j ΔE_j.
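A small vectorized sketch of this selection rule (the array names are assumptions; ũ and β are as computed on the previous slides):

```python
import numpy as np

def greedy_pick_by_energy(u, u_tilde, beta, col_sq, lam):
    """Pick the coordinate whose single update decreases the energy the most.

    u_tilde[j] = shrink(beta[j], 1/(2*lam)) / col_sq[j] is the 1-D minimizer and
    col_sq[j] = ||a_j||_2^2.  Returns the index maximizing the energy decrease.
    """
    delta_E = (lam * col_sq * (u - u_tilde) * (u + u_tilde - 2.0 * beta / col_sq)
               + np.abs(u) - np.abs(u_tilde))
    return int(np.argmax(delta_E))
```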

14 Greedy is Better
Figure: the dots represent the exact solution of the unconstrained problem at λ = 1000 and the circles are the results after 512 iterations. The left panel shows the result after one sequential sweep (512 iterations), the middle panel is its zoom-in, and the right panel shows the result using an adaptive sweep.

15 Gradient-Based Adaptive Sweep
For the same unconstrained problem, Tongtong Wu and Kenneth Lange update the coordinate u_j giving the most negative directional derivative along the forward or backward coordinate directions. If e_k is the coordinate direction along which u_k varies, the objective E(u) has one-sided directional derivatives
d_{e_k} E = lim_{σ→0+} [E(u + σ e_k) - E(u)]/σ = 2λ(||a_k||_2^2 u_k - β_k) + { 1, u_k ≥ 0; -1, u_k < 0 },
d_{-e_k} E = lim_{σ→0+} [E(u - σ e_k) - E(u)]/σ = -2λ(||a_k||_2^2 u_k - β_k) + { -1, u_k > 0; 1, u_k ≤ 0 }.
We choose the updating coordinate j = arg min_j { d_{e_j} E, d_{-e_j} E }.

16 Update Difference Based
We choose the updating coordinate based on the update difference: j = arg max_j |u_j - ũ_j|.

17 Greedy Selection Choices
Energy function based: j = arg max_i ΔE_i;
Gradient based: j = arg min_i { d_{e_i} E, d_{-e_i} E };
Update difference based: j = arg max_i |u_i - ũ_i|.

18 Computational Tricks
Our algorithm maintains a vector β with components β_j = (A^T f)_j - a_j^T A u + ||a_j||_2^2 u_j. Since u changes in only one coordinate per iteration, β can be updated efficiently instead of being recomputed from scratch:
β^{k+1} - β^k = (u_p^{k+1} - u_p^k)(||a_p||_2^2 I - A^T A) e_p,
where p is the coordinate updated at step k.
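A self-contained sanity check of this rank-one update in NumPy (the matrix sizes and the test values are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
f = rng.standard_normal(30)
u = rng.standard_normal(10)
col_sq = np.sum(A * A, axis=0)                       # ||a_j||_2^2

def beta_full(u):
    """Full recomputation: beta_j = (A^T f)_j - a_j^T A u + ||a_j||^2 u_j."""
    return A.T @ f - A.T @ (A @ u) + col_sq * u

beta = beta_full(u)
p, new_value = 3, 1.7                                # change a single coordinate
delta = new_value - u[p]
e_p = np.zeros(u.size)
e_p[p] = 1.0
beta += delta * (col_sq[p] * e_p - A.T @ A[:, p])    # rank-one update from slide 18
u[p] = new_value
print(np.allclose(beta, beta_full(u)))               # True: the cheap update matches
```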

19 Algorithm: Coordinate Descent with a Refined Sweep
Precompute: w_j = ||a_j||_2^2.
Normalization: A(:, i) = A(:, i) / w_i.
Initialization: u = 0, β = A^T f.
Iterate until convergence:
  ũ = shrink(β, 1/(2λ));
  j = arg max_i |u_i - ũ_i|, u_j^{k+1} = ũ_j;
  β^{k+1} = β^k - (u_j^k - ũ_j)(A^T A) e_j,  β_j^{k+1} = β_j^k.
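Putting the pieces together, a hedged NumPy sketch of a greedy coordinate descent solver; unlike the slide it skips the column normalization and instead divides by ||a_j||_2^2 explicitly, which matches the coordinate update on slide 11. The function name, tolerance, and iteration cap are assumptions:

```python
import numpy as np

def greedy_cd_l1(A, f, lam, max_iter=100000, tol=1e-10):
    """Greedy coordinate descent for min_u ||u||_1 + lam * ||A u - f||_2^2,
    using the update-difference selection rule j = argmax_i |u_i - u~_i|
    and the rank-one beta update from slide 18."""
    n = A.shape[1]
    gram = A.T @ A                                  # n x n Gram matrix (fine for moderate n)
    col_sq = np.diag(gram).copy()                   # ||a_j||_2^2
    u = np.zeros(n)
    beta = A.T @ f
    for _ in range(max_iter):
        u_tilde = np.sign(beta) * np.maximum(np.abs(beta) - 1.0 / (2.0 * lam), 0.0) / col_sq
        j = int(np.argmax(np.abs(u - u_tilde)))
        delta = u_tilde[j] - u[j]
        if abs(delta) < tol:                        # no coordinate wants to move: converged
            break
        beta -= delta * gram[:, j]                  # update every component of beta ...
        beta[j] += delta * col_sq[j]                # ... except beta_j, which stays unchanged
        u[j] = u_tilde[j]
    return u
```

With the Bregman sketch above, this routine could serve as the inner solver, e.g. bregman_iteration(A, f, lam, greedy_cd_l1).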

20 Convergence of Greedy Coordinate Descent
Consider minimizing functions of the form
min_{u ∈ R^n} H(u_1, u_2, ..., u_n) = F(u_1, u_2, ..., u_n) + Σ_{i=1}^n g_i(u_i),
where F is differentiable and convex, and each g_i is convex. Tseng (1988) proved that pathwise coordinate descent converges to a minimizer of H.
Theorem. If lim_{|u_i|→∞} H(u_1, u_2, ..., u_n) = ∞ for every i, and |∂²F/∂u_i∂u_j| ≤ M, then the greedy coordinate descent method based on the selection rule j = arg max_i ΔH_i converges to an optimal solution.

21 Convergence of Greedy Coordinate Descent
Lemma. Consider H(x) = (ax - b)^2 + |x| with a > 0. It is easy to see that x̃ = arg min_x H(x) = shrink(b/a, 1/(2a^2)). We claim that for any x,
|x - x̃| ≤ (1/a) (H(x) - H(x̃))^{1/2}.
Theorem. The greedy coordinate descent method with the selection rule j = arg max_i |u_i - ũ_i| converges to an optimal solution of
min_{u ∈ R^n} H(u) = ||u||_1 + λ||Au - f||_2^2.

22 Comparison of Numerical Speed
min_{u ∈ R^n} ||u||_1 + λ||Au - f||_2^2.  (6)
Table: runtime in seconds of pathwise coordinate descent and the adaptive sweep methods (1), (2), (3) for solving (6) when they achieve the same accuracy, for several values of λ (columns: λ, Pathwise, (1), (2), (3)).

23 Another Application to Image Denoising
Figure: exact, noisy, and denoised images.

24 Solving ROF
The ROF (Rudin-Osher-Fatemi) model for image denoising: find u satisfying
u = arg min_u ∫_Ω |∇u| dx + λ||u - f||_2^2,
where f is the given noisy image and λ > 0 is a parameter.
Methods for solving ROF: gradient descent (slow, time-step restriction); graph cuts (fast, but does not parallelize); coordinate descent.

25 Solving in Parallel by Coordinate Descent
At a pixel with value u and neighbors u_l, u_r, u_u, u_d (left, right, up, down), the coordinate subproblem is
min_{u ∈ R} E(u) = |u - u_l| + |u - u_r| + |u - u_u| + |u - u_d| + λ(u - f)^2,  (7)
whose minimizer has the closed form
u_opt = median{ u_l, u_r, u_u, u_d, f + 2/λ, f + 1/λ, f, f - 1/λ, f - 2/λ }.  (8)
While not converged: apply (8) to the pixels in the black checkerboard pattern in parallel; then apply (8) to the pixels in the white pattern in parallel.
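A hedged NumPy sketch of this red-black sweep, assuming f is a 2-D grayscale array and that boundary pixels replicate their nearest neighbor (the boundary handling and sweep count are assumptions):

```python
import numpy as np

def rof_coordinate_descent(f, lam, sweeps=100):
    """Red-black (checkerboard) coordinate descent for the anisotropic ROF subproblem.

    Each pixel is updated by the median formula (8); pixels of one color share no
    neighbors, so each half-sweep can be done in parallel (vectorized here).
    """
    u = f.copy().astype(float)
    H, W = u.shape
    ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    for _ in range(sweeps):
        for color in (0, 1):                                   # black pattern, then white
            mask = (ii + jj) % 2 == color
            up    = np.vstack([u[:1, :],  u[:-1, :]])          # replicate-padded neighbors
            down  = np.vstack([u[1:, :],  u[-1:, :]])
            left  = np.hstack([u[:, :1],  u[:, :-1]])
            right = np.hstack([u[:, 1:],  u[:, -1:]])
            cands = np.stack([left, right, up, down,
                              f + 2.0 / lam, f + 1.0 / lam, f,
                              f - 1.0 / lam, f - 2.0 / lam])   # 9 candidates per pixel
            u[mask] = np.median(cands, axis=0)[mask]
    return u
```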

26 Numerical Results
First example: input SNR 8.8; results SNR 13.2 (1.10 s) and SNR 13.5 (1.15 s).
Second example: input SNR 3.1; results SNR 9.5 (1.30 s) and SNR 10.3 (1.27 s).

27 Nonlocal Means (Buades, Coll and Morel)
Every pair of pixels x and y has a weight w(x, y) measuring the similarity of their patches. Define
w(x, y) = exp( -(1/h^2) ∫_Ω G_a(t) |f(x + t) - f(y + t)|^2 dt ),  (9)
where G_a is a Gaussian with standard deviation a. For computational efficiency, the support of w(x, y) is often restricted to a search window |x - y| ≤ R and set to zero outside.
Here we consider the Nonlocal-ROF model
arg min_u E(u, f) = ∫∫ w(x, y) |u(x) - u(y)| dx dy + λ ∫ (u - f)^2 dx.  (10)
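A discrete sketch of the patch weights (9) for a single pixel over a small search window, in NumPy; the patch and window radii are assumptions, and the pixel is assumed far enough from the border for full patches:

```python
import numpy as np

def nl_weights_for_pixel(f, x, h, a, patch_radius=2, search_radius=3):
    """Discrete version of the patch-similarity weights (9) for one pixel x = (i, j).

    Returns a (2*search_radius+1) x (2*search_radius+1) array of weights w(x, y)
    over the search window around x; G_a is a Gaussian of standard deviation a.
    """
    i0, j0 = x
    t = np.arange(-patch_radius, patch_radius + 1)
    ti, tj = np.meshgrid(t, t, indexing="ij")
    G = np.exp(-(ti ** 2 + tj ** 2) / (2.0 * a ** 2))
    G /= G.sum()
    patch_x = f[i0 - patch_radius:i0 + patch_radius + 1,
                j0 - patch_radius:j0 + patch_radius + 1]
    w = np.zeros((2 * search_radius + 1, 2 * search_radius + 1))
    for di in range(-search_radius, search_radius + 1):
        for dj in range(-search_radius, search_radius + 1):
            i1, j1 = i0 + di, j0 + dj
            patch_y = f[i1 - patch_radius:i1 + patch_radius + 1,
                        j1 - patch_radius:j1 + patch_radius + 1]
            d2 = np.sum(G * (patch_x - patch_y) ** 2)   # Gaussian-weighted patch distance
            w[di + search_radius, dj + search_radius] = np.exp(-d2 / h ** 2)
    return w
```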

28 Generalization of the Median Formula
arg min_x Σ_{i=1}^n w_i |x - u_i| + λ|x - f|^α
= median{ u_1, u_2, ..., u_n,
  f + (w_n + ... + w_1)^p μ,
  f + sign(w_n + ... + w_2 - w_1) |w_n + ... + w_2 - w_1|^p μ,
  f + sign(w_n + ... + w_3 - w_2 - w_1) |w_n + ... + w_3 - w_2 - w_1|^p μ,
  ...,
  f + sign(w_n - w_{n-1} - ... - w_1) |w_n - w_{n-1} - ... - w_1|^p μ,
  f - (w_n + w_{n-1} + ... + w_1)^p μ },  (11)
where p = 1/(α - 1) and μ = (1/(αλ))^p. A computational savings is that these offset terms p_i do not depend on the u_i, so with all else fixed they only need to be computed once.
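A hedged Python transcription of (11), with a sanity check against the ROF pixel formula (8) for equal weights and α = 2 (the function name and test values are assumptions):

```python
import numpy as np

def weighted_median_formula(u, w, f, lam, alpha=2.0):
    """Generalized median formula (11) for
    argmin_x  sum_i w_i |x - u_i| + lam * |x - f|**alpha   (alpha > 1).

    The offsets depend only on the weights, so for repeated solves with the
    same weights they could be precomputed once.
    """
    u = np.asarray(u, float)
    w = np.asarray(w, float)
    n = len(u)
    p = 1.0 / (alpha - 1.0)
    mu = (1.0 / (alpha * lam)) ** p
    # Signed partial sums W_k = w_n + ... + w_{k+1} - w_k - ... - w_1, k = 0, ..., n.
    offsets = []
    for k in range(n + 1):
        W = w[k:].sum() - w[:k].sum()
        offsets.append(np.sign(W) * np.abs(W) ** p * mu)
    candidates = np.concatenate([u, f + np.array(offsets)])
    return np.median(candidates)

# Sanity check against formula (8): equal weights, alpha = 2.
u_nb = [0.3, 0.9, 0.1, 0.5]          # four neighbor values (illustrative)
lam, f = 2.0, 0.4
print(weighted_median_formula(u_nb, [1, 1, 1, 1], f, lam))
print(np.median(u_nb + [f + 2/lam, f + 1/lam, f, f - 1/lam, f - 2/lam]))   # same value
```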

29 Numerical Results
5×5 search window (SNR 10.4), 7×7 (SNR 10.8), 11×11 (SNR 10.9).
The runtimes are 14 s with the 5×5 search window, 20 s with the 7×7 search window, and 34 s with the 11×11 search window. (Parameters: λ = , a = 1.25, h = 10.2.)

30 Thanks for your attention!
