A Stochastic 3MG Algorithm with Application to 2D Filter Identification


1 A Stochastic 3MG Algorithm with Application to 2D Filter Identification
Emilie Chouzenoux 1, Jean-Christophe Pesquet 1, and Anisia Florescu 2
1 Laboratoire d'Informatique Gaspard Monge, CNRS, Univ. Paris-Est, France
2 Dunărea de Jos University, Electronics and Telecommunications Dept., Galaţi, România
Sestri Levante, 10 September

2 Problem
We want to minimize $F(h)$ over $h \in \mathbb{R}^N$, where
$$(\forall h \in \mathbb{R}^N)\quad F(h) = \frac{1}{2}\,\mathbb{E}\big(\|y_n - X_n^\top h\|^2\big) + \Psi(h),$$
with $(X_n)_{n\ge 1}$ random matrices in $\mathbb{R}^{N\times Q}$, $(y_n)_{n\ge 1}$ random vectors in $\mathbb{R}^{Q}$ satisfying
$$(\forall n \in \mathbb{N}^*)\quad \mathbb{E}(\|y_n\|^2) = \rho,\qquad \mathbb{E}(X_n y_n) = r,\qquad \mathbb{E}(X_n X_n^\top) = R,$$
and $\Psi\colon \mathbb{R}^N \to \mathbb{R}$ a regularization function.

3 Problem
We want to minimize $F(h)$ over $h \in \mathbb{R}^N$,
$$(\forall h \in \mathbb{R}^N)\quad F(h) = \frac{1}{2}\,\mathbb{E}\big(\|y_n - X_n^\top h\|^2\big) + \Psi(h).$$
Numerous applications:
- supervised classification
- inverse problems
- system identification, channel equalization
- linear prediction/interpolation
- echo cancellation, interference removal
- ...

4 Objective
Solve the problem when the second-order statistics of $(X_n, y_n)_{n\ge 1}$ are estimated on-line.
- Huge literature on the topic: stochastic approximation algorithms in machine learning; sparse adaptive filtering methods.
- Employ a Majorize-Minimize strategy allowing the use of a wide range of regularization functions $\Psi$; few works address the stochastic case [Mairal, 2013].
- Develop fast algorithms by resorting to subspace acceleration: 3MG algorithm for batch optimization [Chouzenoux et al., 2011]; connections with stochastic extensions of BFGS [Byrd et al., 2014].

5 Stochastic approximation of the criterion
At iteration $n \in \mathbb{N}^*$,
$$(\forall h \in \mathbb{R}^N)\quad F_n(h) = \frac{1}{2n}\sum_{k=1}^n \|y_k - X_k^\top h\|^2 + \Psi(h) = \frac{1}{2}\rho_n - r_n^\top h + \frac{1}{2} h^\top R_n h + \Psi(h),$$
where
$$\rho_n = \frac{1}{n}\sum_{k=1}^n \|y_k\|^2,\qquad r_n = \frac{1}{n}\sum_{k=1}^n X_k y_k,\qquad R_n = \frac{1}{n}\sum_{k=1}^n X_k X_k^\top.$$
- Possibility to introduce a forgetting factor.
- Related approaches when $\Psi \propto \|\cdot\|_1$: online time-weighted LASSO [Angelosante et al., 2010]; sparse RLS [Babadi et al., 2010].
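To make this concrete, here is a minimal NumPy sketch (hypothetical helper names, not the authors' code) of how the sufficient statistics $\rho_n$, $r_n$, $R_n$ can be accumulated recursively, after which $F_n$ is evaluated for any $h$ without revisiting past samples; `psi` stands for whatever regularization term $\Psi$ is used:

```python
import numpy as np

def update_stats(rho, r, R, X, y, n):
    """Recursive update of the running statistics after observing (X_n, y_n):
    rho ~ (1/n) sum ||y_k||^2,  r ~ (1/n) sum X_k y_k,  R ~ (1/n) sum X_k X_k^T."""
    rho = rho + (np.dot(y, y) - rho) / n
    r = r + (X @ y - r) / n
    R = R + (X @ X.T - R) / n
    return rho, r, R

def F_n(h, rho, r, R, psi):
    """Evaluate F_n(h) = rho_n/2 - r_n^T h + h^T R_n h / 2 + Psi(h)."""
    return 0.5 * rho - r @ h + 0.5 * h @ R @ h + psi(h)
```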

6 Form of the regularization function
$$(\forall h \in \mathbb{R}^N)\quad \Psi(h) = \underbrace{\tfrac{1}{2}\, h^\top V_0 h - v_0^\top h}_{\text{elastic net penalization}} + \sum_{s=1}^{S} \psi_s\big(\|V_s h - v_s\|\big),$$
where $v_0 \in \mathbb{R}^N$, $V_0 \in \mathbb{R}^{N\times N}$ symmetric positive semi-definite, and, for every $s \in \{1,\dots,S\}$, $v_s \in \mathbb{R}^{P_s}$, $V_s \in \mathbb{R}^{P_s\times N}$, $\psi_s\colon \mathbb{R} \to \mathbb{R}$.
→ ability to take into account linear operators.

7 Form of the regularization function (continued)
Assumptions on $(\psi_s)_{1\le s\le S}$:
1. For every $s \in \{1,\dots,S\}$, $\psi_s$ is an even lower-bounded function, continuously differentiable, with $\lim_{t\to 0,\, t\neq 0} \dot\psi_s(t)/t \in \mathbb{R}$, where $\dot\psi_s$ denotes the derivative of $\psi_s$.
2. For every $s \in \{1,\dots,S\}$, $\psi_s(\sqrt{\cdot})$ is concave on $[0,+\infty[$.
3. There exists $\nu \in [0,+\infty[$ such that $(\forall s \in \{1,\dots,S\})\,(\forall t \in [0,+\infty[)\ \ 0 \le \nu_s(t) \le \nu$, where $\nu_s(t) = \dot\psi_s(t)/t$.

8 Form of the regularization function (continued)
Examples of functions $(\psi_s)_{1\le s\le S}$, with $(\lambda_s,\delta_s) \in\ ]0,+\infty[^2$ and $\kappa_s \in [1,2]$:

$\lambda_s^{-1}\psi_s(t)$ | Type | Name
$|t| - \delta_s \log(|t|/\delta_s + 1)$ | $\ell_2-\ell_1$ |
$t^2$ if $|t| < \delta_s$; $2\delta_s |t| - \delta_s^2$ otherwise | $\ell_2-\ell_1$ | Huber
$\log(\cosh t)$ | $\ell_2-\ell_1$ | Green
$(1 + t^2/\delta_s^2)^{\kappa_s/2} - 1$ | $\ell_2-\ell_{\kappa_s}$ |
$1 - \exp(-t^2/(2\delta_s^2))$ | $\ell_2-\ell_0$ | Welsch
$t^2/(2\delta_s^2 + t^2)$ | $\ell_2-\ell_0$ | Geman-McClure
$1 - (1 - t^2/(6\delta_s^2))^3$ if $|t| \le \sqrt{6}\,\delta_s$; $1$ otherwise | $\ell_2-\ell_0$ | Tukey biweight
$\tanh(t^2/(2\delta_s^2))$ | $\ell_2-\ell_0$ | Hyperbolic tangent
$\log(1 + t^2/\delta_s^2)$ | $\ell_2-\log$ | Cauchy
$1 - \exp\big(1 - (1 + t^2/(2\delta_s^2))^{\kappa_s/2}\big)$ | $\ell_2-\ell_{\kappa_s}-\ell_0$ | Chouzenoux
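As an illustration, a sketch (with the $\lambda_s$ factor included; hypothetical names) of two of these potentials together with the weight $\nu_s(t) = \dot\psi_s(t)/t$ that the majorization step relies on. The first is the $\ell_2-\ell_1$ row $(1+t^2/\delta^2)^{\kappa/2}-1$ with $\kappa = 1$, the second is the Welsch row:

```python
import numpy as np

def psi_l2l1(t, lam, delta):
    """Smooth l2-l1 potential: lam * (sqrt(1 + t^2/delta^2) - 1)."""
    return lam * (np.sqrt(1.0 + (t / delta) ** 2) - 1.0)

def nu_l2l1(t, lam, delta):
    """Weight psi'(t)/t = lam / (delta^2 sqrt(1 + t^2/delta^2));
    finite at t = 0 and bounded by lam/delta^2, as Assumption 3 requires."""
    return lam / (delta ** 2 * np.sqrt(1.0 + (t / delta) ** 2))

def psi_welsch(t, lam, delta):
    """Welsch (l2-l0) potential: lam * (1 - exp(-t^2/(2 delta^2)))."""
    return lam * (1.0 - np.exp(-t ** 2 / (2.0 * delta ** 2)))

def nu_welsch(t, lam, delta):
    """Weight psi'(t)/t = (lam/delta^2) exp(-t^2/(2 delta^2))."""
    return (lam / delta ** 2) * np.exp(-t ** 2 / (2.0 * delta ** 2))
```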

9 Optimization recipe

10 Optimization recipe
1 Find a tractable surrogate for $F_n$: quadratic tangent majorant
$$(\forall h \in \mathbb{R}^N)\quad \Theta_n(h,h_n) = F_n(h_n) + \nabla F_n(h_n)^\top (h - h_n) + \frac{1}{2}(h - h_n)^\top A_n(h_n)(h - h_n),$$
where
$$A_n(h) = R_n + V_0 + V^\top \mathrm{Diag}\big(b(h)\big)\, V \in \mathbb{R}^{N\times N},$$
$V = [V_1^\top \dots V_S^\top]^\top \in \mathbb{R}^{P\times N}$ with $P = P_1 + \dots + P_S$, and $b(h) = \big(b_i(h)\big)_{1\le i\le P} \in \mathbb{R}^P$ is such that
$$(\forall s \in \{1,\dots,S\})\,(\forall p \in \{1,\dots,P_s\})\quad b_{P_1+\dots+P_{s-1}+p}(h) = \nu_s\big(\|V_s h - v_s\|\big).$$
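A direct, unoptimized transcription of this curvature matrix (hypothetical names; the blocks $V_s$ are assumed stacked into `V` as in the slide):

```python
import numpy as np

def curvature(R_n, V0, V, v, nu_list, sizes, h):
    """Form A_n(h) = R_n + V0 + V^T Diag(b(h)) V.

    V is the P x N stack of the blocks V_s, v the stacked v_s,
    nu_list[s] the weight function nu_s, sizes[s] = P_s.
    """
    b = np.empty(V.shape[0])
    offset = 0
    for s, Ps in enumerate(sizes):
        res = V[offset:offset + Ps] @ h - v[offset:offset + Ps]
        b[offset:offset + Ps] = nu_list[s](np.linalg.norm(res))
        offset += Ps
    # b[:, None] * V scales row i of V by b_i, so this is V^T Diag(b) V
    return R_n + V0 + V.T @ (b[:, None] * V)
```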

11 Optimization recipe
1 Find a tractable surrogate for $F_n$
2 Minimize in a subspace:
$$(\forall n \in \mathbb{N}^*)\quad h_{n+1} \in \operatorname*{argmin}_{h \in \operatorname{span} D_n} \Theta_n(h,h_n),\qquad D_n \in \mathbb{R}^{N\times M_n}.$$
- $\operatorname{rank}(D_n) = N$ → half-quadratic algorithm
- $M_n$ small → low complexity per iteration
Typical choice (see the sketch below):
$$D_n = \begin{cases} \big[-\nabla F_n(h_n),\ h_n,\ h_n - h_{n-1}\big] & \text{if } n > 1,\\[2pt] \big[-\nabla F_n(h_1),\ h_1\big] & \text{if } n = 1\end{cases}$$
→ 3MG algorithm; similar ideas in TwIST, FISTA, ...
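In code, the memory-gradient choice of $D_n$ might look as follows (a sketch; `grad` is assumed to hold $\nabla F_n(h_n)$):

```python
import numpy as np

def memory_gradient_subspace(grad, h, h_prev=None):
    """Columns of D_n: [-grad F_n(h_n), h_n, h_n - h_{n-1}];
    the last column exists only once a previous iterate is available."""
    if h_prev is None:
        return np.column_stack([-grad, h])
    return np.column_stack([-grad, h, h - h_prev])
```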

12 Optimization recipe
1 Find a tractable surrogate for $F_n$
2 Minimize in a subspace
3 Perform recursive updates of the second-order statistics:
$$(\forall n \in \mathbb{N}^*)\quad r_n = r_{n-1} + \frac{1}{n}\big(X_n y_n - r_{n-1}\big),\qquad R_n = R_{n-1} + \frac{1}{n}\big(X_n X_n^\top - R_{n-1}\big).$$

13 Stochastic MM subspace algorithm
$r_0 = 0$, $R_0 = 0$. Initialize $D_0$, $u_0$; set $h_1 = D_0 u_0$, $D_0^{R} = 0$, $D_0^{V_0} = V_0 D_0$, $D_0^{V} = V D_0$.
For all $n = 1, 2, \dots$:
$$r_n = r_{n-1} + \tfrac{1}{n}\big(X_n y_n - r_{n-1}\big)$$
$$c_n(h_n) = r_n + v_0 + V^\top \mathrm{Diag}\big(b(h_n)\big)\, v$$
$$D_{n-1}^{A_n} = \big(1 - \tfrac{1}{n}\big) D_{n-1}^{R} + \tfrac{1}{n}\, X_n \big(X_n^\top D_{n-1}\big) + D_{n-1}^{V_0} + V^\top \mathrm{Diag}\big(b(h_n)\big)\, D_{n-1}^{V}$$
$$\nabla F_n(h_n) = D_{n-1}^{A_n} u_{n-1} - c_n(h_n)$$
$$R_n = R_{n-1} + \tfrac{1}{n}\big(X_n X_n^\top - R_{n-1}\big)$$
Set $D_n$ using $\nabla F_n(h_n)$
$$D_n^{R} = R_n D_n,\qquad D_n^{V_0} = V_0 D_n,\qquad D_n^{V} = V D_n$$
$$B_n = D_n^\top \big(D_n^{R} + D_n^{V_0}\big) + \big(D_n^{V}\big)^\top \mathrm{Diag}\big(b(h_n)\big)\, D_n^{V}$$
$$u_n = B_n^\dagger D_n^\top c_n(h_n)$$
$$h_{n+1} = D_n u_n$$
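Putting the pieces together, a simplified end-to-end sketch for the special case $V = I_N$, $v = 0$ (so the recursive products $D^R$, $D^{V_0}$, $D^V$ of the slide are not needed and $A_n D_n$ is recomputed directly). All names are hypothetical, `stream` yields $(X_n, y_n)$ pairs, and `nu` is assumed vectorized; this is not the authors' implementation:

```python
import numpy as np

def stochastic_3mg(stream, N, nu, v0=None, V0=None, n_iter=100):
    """Stochastic MM subspace iteration, simplified to V = I_N and v = 0,
    so Psi(h) = h^T V0 h / 2 - v0^T h + sum_i psi(|h_i|)."""
    r = np.zeros(N)
    R = np.zeros((N, N))
    v0 = np.zeros(N) if v0 is None else v0
    V0 = np.zeros((N, N)) if V0 is None else V0
    h, h_prev = np.zeros(N), None
    for n, (X, y) in enumerate(stream, start=1):
        r = r + (X @ y - r) / n                  # recursive statistics
        R = R + (X @ X.T - R) / n
        b = nu(np.abs(h))                        # b(h_n) for V = I_N, v = 0
        A = R + V0 + np.diag(b)                  # majorant curvature A_n(h_n)
        c = r + v0                               # then grad F_n(h_n) = A h - c
        g = A @ h - c
        D = (np.column_stack([-g, h]) if h_prev is None
             else np.column_stack([-g, h, h - h_prev]))
        B = D.T @ A @ D                          # B_n = D_n^T A_n(h_n) D_n
        u = np.linalg.pinv(B) @ (D.T @ c)        # minimize majorant on span(D_n)
        h_prev, h = h, D @ u
        if n >= n_iter:
            break
    return h
```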

14 Computational complexity
If $V = I_N$, number of multiplications per iteration: $N^2(4M_n + Q)/2$, where $N$ is the length of $h$, $Q$ the column dimension of $X_n$, $M_n$ the subspace dimension, and $N \gg \max\{M_n, M_{n-1}, Q\}$.
If $V \neq I_N$, number of multiplications per iteration:
$$N\Big(\underbrace{P\,(M_n + M_{n-1} + 1)}_{\text{upper bound on the complexity induced by linear operators}} + N(4M_n + Q)/2\Big),$$
where $P$ is the row dimension of $V$.
Further complexity reduction is possible by taking into account the structure of $D_n$: e.g. 3MG with $V = I_N$ and $Q = 1$ has the complexity of RLS.
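For a rough sense of scale (illustrative only, plugging in the dimensions of the experiment below: $N = 21^2$, $Q = 64^2$, $M_n = 3$), the two counts can be evaluated directly:

```python
def mults_identity(N, M, Q):
    """V = I_N case: N^2 (4 M_n + Q) / 2 multiplications per iteration."""
    return N ** 2 * (4 * M + Q) // 2

def mults_general(N, P, M, M_prev, Q):
    """General V: N * (P (M_n + M_{n-1} + 1) + N (4 M_n + Q) / 2)."""
    return N * (P * (M + M_prev + 1) + N * (4 * M + Q) // 2)

# e.g. N = 441 (21 x 21 kernel), Q = 4096, M_n = 3
print(mults_identity(441, 3, 4096))
```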

17 Convergence results
Assumptions. Let $(\Omega,\mathcal{F},\mathsf{P})$ denote the underlying probability space. For every $n \in \mathbb{N}^*$, let $\mathcal{X}_n = \sigma\big((X_k,y_k)_{1\le k\le n}\big)$ be the sub-sigma-algebra of $\mathcal{F}$ generated by $(X_k,y_k)_{1\le k\le n}$. We make the following assumptions:
1. $R + V_0$ is a positive definite matrix.
2. $\big((X_n,y_n)\big)_{n\ge 1}$ is a stationary ergodic sequence and, for every $n \in \mathbb{N}^*$, the elements of $X_n$ and the components of $y_n$ have finite fourth-order moments.
3. For every $n \in \mathbb{N}^*$,
$$\mathbb{E}\big(\|y_{n+1}\|^2 \mid \mathcal{X}_n\big) = \rho,\qquad \mathbb{E}\big(X_{n+1} y_{n+1} \mid \mathcal{X}_n\big) = r,\qquad \mathbb{E}\big(X_{n+1} X_{n+1}^\top \mid \mathcal{X}_n\big) = R.$$
4. For every $n \in \mathbb{N}^*$, $\{-\nabla F_n(h_n),\, h_n\} \subset \operatorname{span} D_n$.
5. $h_1$ is $\mathcal{X}_1$-measurable and, for every $n \in \mathbb{N}^*$, $D_n$ is $\mathcal{X}_n$-measurable.

18 Convergence results
Theorem.
1. The set of cluster points of $(h_n)_{n\ge 1}$ is almost surely a nonempty compact connected set.
2. Any element of this set is almost surely a critical point of $F$.
3. If the functions $(\psi_s)_{1\le s\le S}$ are convex, then the sequence $(h_n)_{n\ge 1}$ converges $\mathsf{P}$-a.s. to the unique (global) minimizer of $F$.
Remark: in the deterministic case, the convergence of 3MG to a critical point of the cost function is ensured under the assumption that $F$ satisfies the Kurdyka-Łojasiewicz inequality [Chouzenoux et al., 2013].

20 Application to 2D filter identification
Observation model: $y = S(h)x + w$, with $x \in \mathbb{R}^L$ the original image, $y \in \mathbb{R}^L$ the degraded image, and $h \in \mathbb{R}^N$ the unknown two-dimensional blur kernel ($N = 21^2$).
[Figures: original image (left); blurred and noisy image (right).]

21 Application to 2D filter identification
Observation model $y = S(h)x + w$:
- $x \in \mathbb{R}^L$ original image, $y \in \mathbb{R}^L$ degraded image
- $h \in \mathbb{R}^N$ unknown two-dimensional blur kernel ($N = 21^2$)
- $S(h)$ Hankel-block Hankel matrix such that $S(h)x = Xh$
- $w \in \mathbb{R}^L$ realization of zero-mean white Gaussian noise (BSNR = 24.8 dB).

22 Application to 2D filter identification
Observation model $y = S(h)x + w$ as on the previous slide.
Associated penalized MSE problem:
- $y_n \in \mathbb{R}^Q$ and $X_n \in \mathbb{R}^{Q\times N}$: $Q = 64^2$ rows of $y$ and $X$
- isotropic penalization on the gradient of the blur kernel: $S = N$, $(\forall s \in \{1,\dots,S\})\ P_s = 2$, with
$$\psi_s\colon u \mapsto \lambda\sqrt{1 + u^2/\delta^2},\qquad (\lambda,\delta) \in\ ]0,+\infty[^2.$$
A sketch of this penalty construction is given below.
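A possible construction of these penalty blocks, assuming each $V_s$ stacks the horizontal and vertical first differences of the $21\times 21$ kernel at pixel $s$ (a hypothetical sketch, since the slides do not spell out the discretization):

```python
import numpy as np

def gradient_blocks(K=21):
    """For each kernel pixel s, build V_s in R^{2 x K^2} extracting the
    horizontal and vertical first differences at s, so that ||V_s h|| is the
    isotropic gradient magnitude penalized by psi_s(u) = lam*sqrt(1+u^2/d^2)."""
    N = K * K
    blocks = []
    for i in range(K):
        for j in range(K):
            Vs = np.zeros((2, N))
            s = i * K + j
            if j + 1 < K:                    # horizontal difference
                Vs[0, s], Vs[0, s + 1] = -1.0, 1.0
            if i + 1 < K:                    # vertical difference
                Vs[1, s], Vs[1, s + K] = -1.0, 1.0
            blocks.append(Vs)
    return blocks  # stack with np.vstack(blocks) to form V in R^{2N x N}
```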

23 Application to 2D filter identification
[Figures: original blur kernel; estimated blur kernel, with its relative estimation error.]

24 Application to 2D filter identification
[Figure: relative error versus time (s), logarithmic scale.]
Comparison of the stochastic 3MG algorithm, the SGD algorithm with decreasing stepsize $\propto n^{-1/2}$, and the SAG/RDA algorithms with constant stepsizes.

25 Open problems
- Kurdyka-Łojasiewicz inequality for a sequence of functions $(F_n)_{n\ge 1}$ converging to a function $F$?
- Quantification of the convergence rate?
- Extension to a non-quadratic cost?
- Extension to nonsmooth minimization? (connections with Silvia's work?)
