BIG DATA PROBLEMS AND LARGE-SCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION

1 BIG DATA PROBLEMS AND LARGE-SCALE OPTIMIZATION: A DISTRIBUTED ALGORITHM FOR MATRIX FACTORIZATION
Ş. İlker Birbil, Sabancı University
Ali Taylan Cemgil (1), Hazal Koptagel (1), Figen Öztoprak (2), Umut Şimşekli (1)
(1) Boğaziçi University, (2) Bilgi University
Nottingham University, March 2015
Ş. İlker Birbil (Sabancı University) Big Data Optimization 1 / 22

2 LARGE-SCALE OPTIMIZATION AND MACHINE LEARNING
- Introduction
- Exploiting the Structure
- Need for Parallel Algorithms
(F. Öztoprak)

3 DATA SCIENCE

4 GRADUATE COURSES

5 NONLINEAR OPTIMIZATION
Typically, a nonlinear optimization problem is defined as
$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to} \quad c_i(x) = 0,\ i \in \mathcal{E}, \qquad c_i(x) \geq 0,\ i \in \mathcal{I}, \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is the objective function and $c_i : \mathbb{R}^n \to \mathbb{R}$ for $i \in \mathcal{E} \cup \mathcal{I}$ are the constraint functions. At least one of these functions is nonlinear.
Nonlinear programming (NLP) problem: covers optimization problems
$$\min_{x \in X} f(x), \qquad X = \{x \in \mathbb{R}^n : g(x) \leq 0\},$$
where the functions $g : \mathbb{R}^n \to \mathbb{R}^m$ and $f : \mathbb{R}^n \to \mathbb{R}$ are continuous and not necessarily linear.

6 ROLE OF NONLINEAR OPTIMIZATION
Application areas:
- Molecular Biology (Protein Folding)
- Engineering Design (Machining)
- Finance (Risk Management)
- Machine Learning (Image Recovery)
- Production (Chemical Complex Design)
- Health (Cancer Treatment)
Optimization topics:
- Core NLP
- Large Scale
- Global Optimization
- Derivative Free Optimization
- Nonlinear Stochastic Prog.
- Convex Optimization
- Mixed Integer NLP
- PDE Constrained Optimization
Related disciplines:
- Statistics
- Computer Science
- Applied Mathematics
- Operations Research
(F. Öztoprak)

7 OUR RESEARCH GROUP
- Three faculty members, four PhD students, three MSc students
- (Coupled) tensor or matrix factorization
- Distributed and parallel algorithms:
  - Bayesian inference
  - Nonlinear optimization
[Figure: three processors (Processor 1 to 3), each with four cores (Core 1 to 4) and its own memory (Memory 1 to 3)]

9 LINK PREDICTION VIA TENSOR FACTORIZATION
- $X_1(i, j, k)$: whether user $i$ visits location $j$ and performs activity $k$
- $X_2(i, m)$: frequency of user $i$ visiting location $m$
- $X_3(j, n)$: points of interest for location $j$

10 TENSOR FACTORIZATION
- Tensor: a multidimensional array $X(i, j, k, \dots)$
- Extension of matrix factorizations to higher-order tensors
- Tensor factorizations are used to extract the underlying factors in higher-order data sets
$$X(i, j, k) \approx \sum_r Z_1(i, r)\, Z_2(j, r)\, Z_3(k, r)$$
(Cemgil, Probabilistic Latent Tensor Factorisation)
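
The three-factor model above can be evaluated in one line with NumPy. The following is only a minimal sketch: the function name `cp_reconstruct`, the sizes, and the rank are hypothetical choices made for illustration and do not come from the slides.

```python
import numpy as np

def cp_reconstruct(Z1, Z2, Z3):
    """Return X_hat with X_hat[i, j, k] = sum_r Z1[i, r] * Z2[j, r] * Z3[k, r]."""
    return np.einsum("ir,jr,kr->ijk", Z1, Z2, Z3)

# Hypothetical sizes: 4 users, 3 locations, 2 activities, rank 2.
rng = np.random.default_rng(0)
Z1 = rng.random((4, 2))
Z2 = rng.random((3, 2))
Z3 = rng.random((2, 2))
X_hat = cp_reconstruct(Z1, Z2, Z3)
```

Each entry of `X_hat` is the sum over the shared index `r`, so the factor matrices must all have the same number of columns.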

11 MATRIX FACTORIZATION
$$X(a, b) \approx \sum_i Z_1(a, i)\, Z_2(i, b)$$
An inverse problem: estimate $Z_1$ and $Z_2$ given the data matrix $X$, assuming $X \approx Z_1 Z_2$, by minimizing an error function subject to constraints (e.g., nonnegativity):
$$(Z_1, Z_2)^* = \arg\min_{Z_1, Z_2} D(X \,\|\, Z_1 Z_2) + R(Z_1, Z_2)$$
Overall optimization problem:
$$\text{minimize } \|X - Z_1 Z_2\|_F^2 \quad \text{subject to } Z_1, Z_2 \in \mathcal{Z},$$
where $\mathcal{Z}$ is the feasible region. When $\mathcal{Z}$ is the first orthant, we have the nonnegative matrix factorization problem.
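
To make the nonnegative case concrete, here is a small sketch of the objective $\|X - Z_1 Z_2\|_F^2$ together with one projected-gradient step onto the first orthant. Projected gradient with step halving is a generic textbook technique used here only for illustration; it is not the algorithm proposed in the talk, and the function names and sizes are assumptions.

```python
import numpy as np

def objective(X, Z1, Z2):
    """Squared Frobenius error ||X - Z1 Z2||_F^2."""
    return float(np.linalg.norm(X - Z1 @ Z2, "fro") ** 2)

def projected_gradient_step(X, Z1, Z2, step=1e-2):
    """One gradient step projected onto Z1 >= 0, Z2 >= 0; halves the step until no increase."""
    R = X - Z1 @ Z2
    G1 = -2.0 * R @ Z2.T      # gradient with respect to Z1
    G2 = -2.0 * Z1.T @ R      # gradient with respect to Z2
    f_old = objective(X, Z1, Z2)
    while step > 1e-12:
        N1 = np.maximum(Z1 - step * G1, 0.0)   # projection onto the first orthant
        N2 = np.maximum(Z2 - step * G2, 0.0)
        if objective(X, N1, N2) <= f_old:
            return N1, N2
        step *= 0.5
    return Z1, Z2             # no acceptable step found: keep the current factors

# Hypothetical data: a 6 x 5 nonnegative matrix and rank-2 factors.
rng = np.random.default_rng(0)
X = rng.random((6, 5))
Z1, Z2 = rng.random((6, 2)), rng.random((2, 5))
f_start = objective(X, Z1, Z2)
for _ in range(50):
    Z1, Z2 = projected_gradient_step(X, Z1, Z2)
f_end = objective(X, Z1, Z2)
```

Because a step is accepted only when the objective does not increase, `f_end` is never larger than `f_start`, and both factors stay nonnegative.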

12 MOVIE RECOMMENDATION
$$\text{minimize } \|X - Z_1 Z_2\|_F^2 \quad \text{subject to } Z_1 \geq 0,\ Z_2 \geq 0$$

13 DISTRIBUTED IMPLEMENTATION
[Figure: X is split into 3 x 3 blocks $X_{pq}$, with row blocks $Z_1(1,:), Z_1(2,:), Z_1(3,:)$ and column blocks $Z_2(:,1), Z_2(:,2), Z_2(:,3)$]
Time Slot 1: blocks $X_{12}$, $X_{23}$, $X_{31}$. Perform
- $X_{12} = Z_1(1,:)\, Z_2(:,2)$ on P1
- $X_{23} = Z_1(2,:)\, Z_2(:,3)$ on P2
- $X_{31} = Z_1(3,:)\, Z_2(:,1)$ on P3
by employing IPA.
Time Slot 2: blocks $X_{11}$, $X_{22}$, $X_{33}$
Time Slot 3: blocks $X_{13}$, $X_{21}$, $X_{32}$
Time Slot 4: ...

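14 REFORMULATION
$$\text{minimize } \|X - Z_1 Z_2\|_F^2 \quad \text{subject to } Z_1, Z_2 \in \mathcal{Z}$$
[Figure: the blocks of $Z_1$ (numbered 1 to 3) and of $Z_2$ (numbered 4 to 6) are stacked into a single vector $z$]
GENERIC PROBLEM
$$\text{minimize } \sum_{i \in \{1, \dots, m\}} f_i(z) \quad \text{subject to } z \in \zeta$$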

15 DISTRIBUTED OPTIMIZATION
Time Slot 1: blocks $X_{12}$, $X_{23}$, $X_{31}$. Perform $X_{12} = Z_1(1,:)\, Z_2(:,2)$ on P1, $X_{23} = Z_1(2,:)\, Z_2(:,3)$ on P2, and $X_{31} = Z_1(3,:)\, Z_2(:,1)$ on P3 by employing IPA.
Time Slot 2: blocks $X_{11}$, $X_{22}$, $X_{33}$
Time Slot 3: blocks $X_{13}$, $X_{21}$, $X_{32}$
Time Slot 4: ...
$$\text{minimize } \sum_{i \in \{1, \dots, m\}} f_i(z) \quad \text{subject to } z \in \zeta$$
- At each time slot $k$, we solve a subset $S_k$ of the component functions $f_i$, $i \in \{1, 2, \dots, m\}$
- We make sure that each data block is visited after $c$ passes ($c = 3$ in the figure)
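
The block assignment in the figure can be reproduced with a cyclic shift. The sketch below is an illustration under assumptions: the function name `block_schedule` and the shift formula are hypothetical, chosen so that the three slots match the order shown in the figure for $c = 3$. In every slot, no two processors share a row block of $Z_1$ or a column block of $Z_2$.

```python
def block_schedule(c):
    """Return, for each time slot, a list of (processor, row_block, col_block) triples.

    Processor p keeps row block p of Z_1; the column block of Z_2 it works on
    is shifted cyclically from one slot to the next (all indices are 0-based).
    """
    return [[(p, p, (p + 1 - s) % c) for p in range(c)] for s in range(c)]

schedule = block_schedule(3)
for s, slot in enumerate(schedule, start=1):
    blocks = ", ".join(f"X_{r + 1}{q + 1} on P{p + 1}" for p, r, q in slot)
    print(f"Time slot {s}: {blocks}")
```

After `c` slots every one of the `c * c` data blocks has been visited exactly once, which corresponds to the statement that each block is visited after $c$ passes.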

16 INCREMENTAL QUASI-NEWTON ALGORITHM
- Unlike gradient-based methods, the proposed algorithm uses second-order information through a Hessian approximation (L-BFGS quasi-Newton method)
- The proposed algorithm visits each subset of component functions in the same order (incremental and deterministic)
- We do not assume convexity of the function (so matrix factorization can be solved)
CORE STEP
Solve a quadratic approximation of the (partial) objective function:
$$Q_k^t(z) = (z - z_k)^\top \nabla f_{S_k}(z_k) + \tfrac{1}{2} (z - z_k)^\top H_t (z - z_k) + \tfrac{1}{2} \beta_t \|z - z_k\|^2.$$

17 INCREMENTAL QUASI-NEWTON ALGORITHM (CONT'D)
$$Q_k^t(z) = (z - z_k)^\top \nabla f_{S_k}(z_k) + \tfrac{1}{2} (z - z_k)^\top H_t (z - z_k) + \tfrac{1}{2} \beta_t \|z - z_k\|^2.$$
Algorithm 1: HAMSI
input: $y_0$, $\beta_1 \geq 1$
for $t = 0, 1, 2, \dots$ do
    $z_1 = y_t$
    Compute $H_t$
    for $k = 1, 2, \dots, c$ do
        Choose a subset $S_k \subseteq \{1, \dots, m\}$
        Compute $\nabla f_{S_k}(z_k)$
        $z_{k+1} = \arg\min_{z \in \zeta} Q_k^t(z)$
    end
    $y_{t+1} = z_{c+1}$
    Set $\beta_{t+1} \geq \beta_t$
end
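
The loop structure of Algorithm 1 can be sketched on a toy problem. Everything below is an assumption made for illustration rather than the authors' implementation: the components are least-squares terms $f_i(z) = \tfrac{1}{2}(a_i^\top z - b_i)^2$, the feasible set is all of $\mathbb{R}^n$, the exact Hessian $A^\top A$ stands in for the L-BFGS matrix $H_t$, and $\beta_t$ grows by one per outer iteration. The quadratic model is assumed to carry factors of $\tfrac{1}{2}$ on its curvature and proximal terms, so its unconstrained minimizer is $z_k - (H_t + \beta_t I)^{-1} \nabla f_{S_k}(z_k)$.

```python
import numpy as np

def hamsi_sketch(A, b, subsets, y0, beta=1.0, outer_iters=30):
    """Incremental quadratic-model steps for f(z) = sum_i 0.5 * (A[i] @ z - b[i])**2."""
    y = y0.copy()
    n = y.size
    for _ in range(outer_iters):
        z = y.copy()                          # z_1 = y_t
        H = A.T @ A                           # stand-in for the L-BFGS matrix H_t (assumption)
        P = H + beta * np.eye(n)              # H_t + beta_t * I
        for S in subsets:                     # fixed, deterministic order of subsets
            g = A[S].T @ (A[S] @ z - b[S])    # gradient of the components in S_k at z_k
            z = z - np.linalg.solve(P, g)     # unconstrained minimizer of the quadratic model
        y = z                                 # y_{t+1} = z_{c+1}
        beta += 1.0                           # one simple way to keep beta_{t+1} >= beta_t
    return y

# Hypothetical toy data: 60 components split into c = 3 subsets of 20.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5))
z_true = rng.standard_normal(5)
b = A @ z_true
subsets = [np.arange(0, 20), np.arange(20, 40), np.arange(40, 60)]

def f(z):
    return 0.5 * float(np.sum((A @ z - b) ** 2))

y = hamsi_sketch(A, b, subsets, np.zeros(5))
```

On this consistent least-squares problem all component gradients vanish at `z_true`, so the incremental steps drive `f(y)` far below its starting value `f(np.zeros(5))`.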

18 CONVERGENCE ANALYSIS ($\zeta = \mathbb{R}^n$)
ASSUMPTIONS
1. Hessians of the component functions and $(H_t + \beta_t I)$ are uniformly bounded:
$$\Big\| \sum_{i \in S_k} \nabla^2 f_i(y_t) \Big\| \leq L_t \leq L \quad \forall S_k, y_t.$$
2. The smallest eigenvalue of $(H_t + \beta_t I)$ is bounded away from zero:
$$U_t \leq \|(H_t + \beta_t I)^{-1}\| \leq M_t \quad \forall t.$$
3. The gradient norms are uniformly bounded:
$$\|\nabla f_{S_k}(y_t)\| \leq C \quad \forall S_k, y_t.$$
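
When $\zeta = \mathbb{R}^n$, the core step has a closed form, which is one way to see why $(H_t + \beta_t I)^{-1}$ appears in the assumptions. The short derivation below is a sketch that assumes the quadratic model carries factors of $\tfrac{1}{2}$ on its curvature and proximal terms, and that $H_t$ is symmetric with $H_t + \beta_t I$ positive definite:

```latex
\begin{aligned}
Q_k^t(z) &= (z - z_k)^\top \nabla f_{S_k}(z_k)
          + \tfrac{1}{2} (z - z_k)^\top H_t (z - z_k)
          + \tfrac{1}{2} \beta_t \|z - z_k\|^2, \\
\nabla Q_k^t(z) &= \nabla f_{S_k}(z_k) + (H_t + \beta_t I)(z - z_k) = 0 \\
\Longrightarrow \quad z_{k+1} &= z_k - (H_t + \beta_t I)^{-1} \nabla f_{S_k}(z_k).
\end{aligned}
```

Under these assumptions, the length of each inner step is at most $M_t \|\nabla f_{S_k}(z_k)\|$ by the second assumption.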

20 CONVERGENCE ANALYSIS (CONT'D)
LEMMA
At each outer iteration $t$ of Algorithm 1 and for $k = 1, \dots, c$, we have
$$\delta_k = \|\nabla f_{S_k}(z_k) - \nabla f_{S_k}(y_t)\| \leq L_t M_t \sum_{j=1}^{k-1} (1 + L_t M_t)^{k-1-j} \|\nabla f_{S_j}(y_t)\|.$$
THEOREM
Consider the iterates $y_t$ produced by Algorithm 1. Then, all accumulation points of $\{y_t\}$ are stationary points of the generic problem.
COROLLARY
Algorithm 1 solves the matrix factorization problem.

21 PRELIMINARY EXPERIMENTS - SETUP
- Linux cluster with 15 nodes
- Each node has 8 Intel Xeon 2.50 GHz processors and 16 GB of RAM
- This setting allows the execution of 120 tasks in parallel
- MovieLens data (1M) is used for our preliminary experiments

22 PRELIMINARY EXPERIMENTS
FIGURE: Objective function values

23 PRELIMINARY EXPERIMENTS (CONT'D)
FIGURE: Root mean square error

30 CONCLUDING REMARKS
SUMMARY
- A promising research path at the intersection of operations research and computer science
- A new distributed and parallel implementation for matrix factorization
- A generic analysis that could be used for showing convergence of other algorithms
FUTURE RESEARCH
- Extensive computational study
- Stochastic version of the proposed algorithm
- Quasi-Newton-based Bayesian inference

2.3 Convex Constrained Optimization Problems

2.3 Convex Constrained Optimization Problems 42 CHAPTER 2. FUNDAMENTAL CONCEPTS IN CONVEX OPTIMIZATION Theorem 15 Let f : R n R and h : R R. Consider g(x) = h(f(x)) for all x R n. The function g is convex if either of the following two conditions

More information

(Quasi-)Newton methods

(Quasi-)Newton methods (Quasi-)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable non-linear function g, x such that g(x) = 0, where g : R n R n. Given a starting

More information

Numerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl J. Berger & J.T. Frings. Institut für Geometrie und Praktische Mathematik RWTH Aachen (für Informatiker) M. Grepl J. Berger & J.T. Frings Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2010/11 Problem Statement Unconstrained Optimality Conditions Constrained

More information

Chapter 4 Sequential Quadratic Programming

Chapter 4 Sequential Quadratic Programming Optimization I; Chapter 4 77 Chapter 4 Sequential Quadratic Programming 4.1 The Basic SQP Method 4.1.1 Introductory Definitions and Assumptions Sequential Quadratic Programming (SQP) is one of the most

More information

Parameter Estimation: A Deterministic Approach using the Levenburg-Marquardt Algorithm

Parameter Estimation: A Deterministic Approach using the Levenburg-Marquardt Algorithm Parameter Estimation: A Deterministic Approach using the Levenburg-Marquardt Algorithm John Bardsley Department of Mathematical Sciences University of Montana Applied Math Seminar-Feb. 2005 p.1/14 Outline

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information

Adaptive Online Gradient Descent

Adaptive Online Gradient Descent Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

More information

Computing a Nearest Correlation Matrix with Factor Structure

Computing a Nearest Correlation Matrix with Factor Structure Computing a Nearest Correlation Matrix with Factor Structure Nick Higham School of Mathematics The University of Manchester higham@ma.man.ac.uk http://www.ma.man.ac.uk/~higham/ Joint work with Rüdiger

More information

Optimal Scheduling for Dependent Details Processing Using MS Excel Solver

Optimal Scheduling for Dependent Details Processing Using MS Excel Solver BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 8, No 2 Sofia 2008 Optimal Scheduling for Dependent Details Processing Using MS Excel Solver Daniela Borissova Institute of

More information

Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows

Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows TECHNISCHE UNIVERSITEIT EINDHOVEN Branch-and-Price Approach to the Vehicle Routing Problem with Time Windows Lloyd A. Fasting May 2014 Supervisors: dr. M. Firat dr.ir. M.A.A. Boon J. van Twist MSc. Contents

More information

10. Proximal point method

10. Proximal point method L. Vandenberghe EE236C Spring 2013-14) 10. Proximal point method proximal point method augmented Lagrangian method Moreau-Yosida smoothing 10-1 Proximal point method a conceptual algorithm for minimizing

More information

Solutions of Equations in One Variable. Fixed-Point Iteration II

Solutions of Equations in One Variable. Fixed-Point Iteration II Solutions of Equations in One Variable Fixed-Point Iteration II Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011

More information

Nonlinear Optimization: Algorithms 3: Interior-point methods

Nonlinear Optimization: Algorithms 3: Interior-point methods Nonlinear Optimization: Algorithms 3: Interior-point methods INSEAD, Spring 2006 Jean-Philippe Vert Ecole des Mines de Paris Jean-Philippe.Vert@mines.org Nonlinear optimization c 2006 Jean-Philippe Vert,

More information

Inner Product Spaces

Inner Product Spaces Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

More information

be a nested sequence of closed nonempty connected subsets of a compact metric space X. Prove that

be a nested sequence of closed nonempty connected subsets of a compact metric space X. Prove that Problem 1A. Let... X 2 X 1 be a nested sequence of closed nonempty connected subsets of a compact metric space X. Prove that i=1 X i is nonempty and connected. Since X i is closed in X, it is compact.

More information

Exam in SF1811/SF1831/SF1841 Optimization. Monday June 11, 2012, time:

Exam in SF1811/SF1831/SF1841 Optimization. Monday June 11, 2012, time: Examiner: Per Enqvist, tel. 790 6 98 Exam in SF8/SF8/SF8 Optimization. Monday June, 0, time:.00 9.00 Allowed utensils: Pen, paper, eraser and ruler. No calculator! A formula-sheet is handed out. Solution

More information

AM 221: Advanced Optimization Spring Prof. Yaron Singer Lecture 8 February 24th, 2014

AM 221: Advanced Optimization Spring Prof. Yaron Singer Lecture 8 February 24th, 2014 AM 221: Advanced Optimization Spring 2014 Prof. Yaron Singer Lecture 8 February 24th, 2014 1 Overview Last week we talked about the Simplex algorithm. Today we ll broaden the scope of the objectives we

More information

Introduction to Convex Optimization for Machine Learning

Introduction to Convex Optimization for Machine Learning Introduction to Convex Optimization for Machine Learning John Duchi University of California, Berkeley Practical Machine Learning, Fall 2009 Duchi (UC Berkeley) Convex Optimization for Machine Learning

More information

t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).

t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d). 1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction

More information

Absolute Value Programming

Absolute Value Programming Computational Optimization and Aplications,, 1 11 (2006) c 2006 Springer Verlag, Boston. Manufactured in The Netherlands. Absolute Value Programming O. L. MANGASARIAN olvi@cs.wisc.edu Computer Sciences

More information

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014 Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

More information

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University

More information

Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1

Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1 Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1 J. Zhang Institute of Applied Mathematics, Chongqing University of Posts and Telecommunications, Chongqing

More information

Exact shape-reconstruction by one-step linearization in electrical impedance tomography

Exact shape-reconstruction by one-step linearization in electrical impedance tomography Exact shape-reconstruction by one-step linearization in electrical impedance tomography Bastian von Harrach harrach@math.uni-mainz.de Institut für Mathematik, Joh. Gutenberg-Universität Mainz, Germany

More information

A characterization of trace zero symmetric nonnegative 5x5 matrices

A characterization of trace zero symmetric nonnegative 5x5 matrices A characterization of trace zero symmetric nonnegative 5x5 matrices Oren Spector June 1, 009 Abstract The problem of determining necessary and sufficient conditions for a set of real numbers to be the

More information

Advanced Topics in Machine Learning (Part II)

Advanced Topics in Machine Learning (Part II) Advanced Topics in Machine Learning (Part II) 3. Convexity and Optimisation February 6, 2009 Andreas Argyriou 1 Today s Plan Convex sets and functions Types of convex programs Algorithms Convex learning

More information

Big Data Techniques Applied to Very Short-term Wind Power Forecasting

Big Data Techniques Applied to Very Short-term Wind Power Forecasting Big Data Techniques Applied to Very Short-term Wind Power Forecasting Ricardo Bessa Senior Researcher (ricardo.j.bessa@inesctec.pt) Center for Power and Energy Systems, INESC TEC, Portugal Joint work with

More information

Introduction and message of the book

Introduction and message of the book 1 Introduction and message of the book 1.1 Why polynomial optimization? Consider the global optimization problem: P : for some feasible set f := inf x { f(x) : x K } (1.1) K := { x R n : g j (x) 0, j =

More information

Date: April 12, 2001. Contents

Date: April 12, 2001. Contents 2 Lagrange Multipliers Date: April 12, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 12 2.3. Informative Lagrange Multipliers...........

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

ME128 Computer-Aided Mechanical Design Course Notes Introduction to Design Optimization

ME128 Computer-Aided Mechanical Design Course Notes Introduction to Design Optimization ME128 Computer-ided Mechanical Design Course Notes Introduction to Design Optimization 2. OPTIMIZTION Design optimization is rooted as a basic problem for design engineers. It is, of course, a rare situation

More information

Introduction to Algebraic Geometry. Bézout s Theorem and Inflection Points

Introduction to Algebraic Geometry. Bézout s Theorem and Inflection Points Introduction to Algebraic Geometry Bézout s Theorem and Inflection Points 1. The resultant. Let K be a field. Then the polynomial ring K[x] is a unique factorisation domain (UFD). Another example of a

More information

Multi-Objective Optimization

Multi-Objective Optimization Multi-Objective Optimization A quick introduction Giuseppe Narzisi Courant Institute of Mathematical Sciences New York University 24 January 2008 Outline 1 Introduction Motivations Definition Notion of

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method Lecture 3 3B1B Optimization Michaelmas 2015 A. Zisserman Linear Programming Extreme solutions Simplex method Interior point method Integer programming and relaxation The Optimization Tree Linear Programming

More information

17.3.1 Follow the Perturbed Leader

17.3.1 Follow the Perturbed Leader CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.

More information

Optimization of the HOTS score of a website s pages

Optimization of the HOTS score of a website s pages Optimization of the HOTS score of a website s pages Olivier Fercoq and Stéphane Gaubert INRIA Saclay and CMAP Ecole Polytechnique June 21st, 2012 Toy example with 21 pages 3 5 6 2 4 20 1 12 9 11 15 16

More information

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

More information

Zeros of Polynomial Functions

Zeros of Polynomial Functions Zeros of Polynomial Functions The Rational Zero Theorem If f (x) = a n x n + a n-1 x n-1 + + a 1 x + a 0 has integer coefficients and p/q (where p/q is reduced) is a rational zero, then p is a factor of

More information

Parallel Computing for Option Pricing Based on the Backward Stochastic Differential Equation

Parallel Computing for Option Pricing Based on the Backward Stochastic Differential Equation Parallel Computing for Option Pricing Based on the Backward Stochastic Differential Equation Ying Peng, Bin Gong, Hui Liu, and Yanxin Zhang School of Computer Science and Technology, Shandong University,

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Regression Using Support Vector Machines: Basic Foundations

Regression Using Support Vector Machines: Basic Foundations Regression Using Support Vector Machines: Basic Foundations Technical Report December 2004 Aly Farag and Refaat M Mohamed Computer Vision and Image Processing Laboratory Electrical and Computer Engineering

More information

MATHEMATICAL BACKGROUND

MATHEMATICAL BACKGROUND Chapter 1 MATHEMATICAL BACKGROUND This chapter discusses the mathematics that is necessary for the development of the theory of linear programming. We are particularly interested in the solutions of a

More information

Pricing and calibration in local volatility models via fast quantization

Pricing and calibration in local volatility models via fast quantization Pricing and calibration in local volatility models via fast quantization Parma, 29 th January 2015. Joint work with Giorgia Callegaro and Martino Grasselli Quantization: a brief history Birth: back to

More information

Approximation Algorithms: LP Relaxation, Rounding, and Randomized Rounding Techniques. My T. Thai

Approximation Algorithms: LP Relaxation, Rounding, and Randomized Rounding Techniques. My T. Thai Approximation Algorithms: LP Relaxation, Rounding, and Randomized Rounding Techniques My T. Thai 1 Overview An overview of LP relaxation and rounding method is as follows: 1. Formulate an optimization

More information

Cyber-Security Analysis of State Estimators in Power Systems

Cyber-Security Analysis of State Estimators in Power Systems Cyber-Security Analysis of State Estimators in Electric Power Systems André Teixeira 1, Saurabh Amin 2, Henrik Sandberg 1, Karl H. Johansson 1, and Shankar Sastry 2 ACCESS Linnaeus Centre, KTH-Royal Institute

More information

2.2 Creaseness operator

2.2 Creaseness operator 2.2. Creaseness operator 31 2.2 Creaseness operator Antonio López, a member of our group, has studied for his PhD dissertation the differential operators described in this section [72]. He has compared

More information

Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

More information

Department of Industrial Engineering

Department of Industrial Engineering Department of Industrial Engineering Master of Engineering Program in Industrial Engineering (International Program) M.Eng. (Industrial Engineering) Plan A Option 2: Total credits required: minimum 39

More information

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method

The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method The Steepest Descent Algorithm for Unconstrained Optimization and a Bisection Line-search Method Robert M. Freund February, 004 004 Massachusetts Institute of Technology. 1 1 The Algorithm The problem

More information

v 1. v n R n we have for each 1 j n that v j v n max 1 j n v j. i=1

v 1. v n R n we have for each 1 j n that v j v n max 1 j n v j. i=1 1. Limits and Continuity It is often the case that a non-linear function of n-variables x = (x 1,..., x n ) is not really defined on all of R n. For instance f(x 1, x 2 ) = x 1x 2 is not defined when x

More information

LINEAR PROGRAMMING WITH ONLINE LEARNING

LINEAR PROGRAMMING WITH ONLINE LEARNING LINEAR PROGRAMMING WITH ONLINE LEARNING TATSIANA LEVINA, YURI LEVIN, JEFF MCGILL, AND MIKHAIL NEDIAK SCHOOL OF BUSINESS, QUEEN S UNIVERSITY, 143 UNION ST., KINGSTON, ON, K7L 3N6, CANADA E-MAIL:{TLEVIN,YLEVIN,JMCGILL,MNEDIAK}@BUSINESS.QUEENSU.CA

More information

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

More information

Duality in General Programs. Ryan Tibshirani Convex Optimization 10-725/36-725

Duality in General Programs. Ryan Tibshirani Convex Optimization 10-725/36-725 Duality in General Programs Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: duality in linear programs Given c R n, A R m n, b R m, G R r n, h R r : min x R n c T x max u R m, v R r b T
