# On the k-support and Related Norms

## Transcription

Massimiliano Pontil

Department of Computer Science, Centre for Computational Statistics and Machine Learning, University College London

(Joint work with Andrew McDonald and Dimitris Stamos)

Sestri Levante, Sept / 14

### Plan

- Problem
- Spectral regularization
- k-support norm
- Box norm
- Link to cluster norm

### Problem

Learn a matrix from a set of linear measurements:

$$y_i = \langle W, X_i \rangle + \text{noise}_i, \qquad i = 1, \dots, n$$

Method:

$$\min_{W \in \mathbb{R}^{d \times m}} \sum_{i=1}^{n} \left( y_i - \langle W, X_i \rangle \right)^2 + \lambda\, \Omega(W)$$

- Matrix completion: $X_i = e_r e_c^\top$
- Multitask learning: $X_i = e_r x_i^\top$
- The regularizer $\Omega$ encourages matrix structure
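For matrix completion the inner product $\langle W, e_r e_c^\top \rangle$ is simply the entry $W_{rc}$, so the objective reduces to a squared loss over observed entries plus the penalty. The following is a minimal sketch under that reading; the names `completion_objective` and `frobenius_sq` are hypothetical, and the squared Frobenius norm is only a stand-in for $\Omega$, not the regularizer studied in the talk.

```python
def frobenius_sq(W):
    """Squared Frobenius norm; a placeholder regularizer (assumption)."""
    return sum(x * x for row in W for x in row)


def completion_objective(W, observations, lam, omega=frobenius_sq):
    """Squared loss over observed entries plus lam * omega(W).

    W            : matrix as a list of rows
    observations : iterable of (r, c, y) triples with 0-based indices,
                   encoding the measurement y = W[r][c] + noise
    lam          : regularization parameter (lambda in the slides)
    omega        : callable regularizer, W -> float
    """
    loss = sum((y - W[r][c]) ** 2 for r, c, y in observations)
    return loss + lam * omega(W)


W = [[1.0, 2.0], [3.0, 4.0]]
obs = [(0, 0, 1.5), (1, 1, 3.0)]
value = completion_objective(W, obs, lam=0.1)
```

Passing the regularizer as a callable keeps the loss fixed while different choices of $\Omega$ are swapped in.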

### Spectral Regularization

$$\min_{W \in \mathbb{R}^{d \times m}} \sum_{i=1}^{n} \left( y_i - \langle W, X_i \rangle \right)^2 + \lambda\, \Omega(W)$$

- $\Omega$ favors matrix structure (low rank, low variance, clustering, etc.)
- Choose an orthogonally invariant (OI) norm: $\Omega(W) = \|W\|$ with $\|W\| = \|UWV\|$ for all orthogonal $U$, $V$
- von Neumann (1937): $\|W\| = g(\sigma(W))$, where $g$ is a symmetric gauge (SG) function
- A well-studied example is the trace norm: $g(\cdot) = \|\cdot\|_1$
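To make the correspondence between OI-norms and SG-functions concrete, here are three standard instances. Only the trace norm appears on the slide; the Frobenius and operator norms are added for comparison.

$$\|W\|_{\mathrm{tr}} = \|\sigma(W)\|_1, \qquad \|W\|_F = \|\sigma(W)\|_2, \qquad \|W\|_{\mathrm{op}} = \|\sigma(W)\|_\infty$$

Each right-hand side depends on $W$ only through its singular values, which are unchanged when $W$ is multiplied by orthogonal matrices on either side.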

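### k-support Norm [Argyriou et al. 2012]

Special case of group lasso with overlap [Jacob et al., 2009]:

$$\|w\|_{(k)} = \inf \left\{ \sum_{J \in \mathcal{J}_k} \|v_J\|_2 \;:\; \sum_{J \in \mathcal{J}_k} v_J = w,\ \operatorname{supp}(v_J) \subseteq J \right\}$$

where $\mathcal{J}_k$ is the collection of subsets of $\{1, \dots, d\}$ with at most $k$ elements.

- Includes the $\ell_1$-norm ($k = 1$) and the $\ell_2$-norm ($k = d$)
- The unit ball of $\|\cdot\|_{(k)}$ is the convex hull of $\{ w : \operatorname{card}(w) \le k,\ \|w\|_2 \le 1 \}$
- Dual norm: $\|u\|_{*,(k)} = \left( \sum_{i=1}^{k} \left( |u|^{\downarrow}_i \right)^2 \right)^{1/2}$, where $|u|^{\downarrow}$ denotes the absolute values of $u$ sorted in non-increasing order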
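The dual norm is the $\ell_2$ norm of the $k$ largest entries in absolute value, which is straightforward to compute. A minimal sketch (the function name `dual_k_support_norm` is hypothetical):

```python
import math


def dual_k_support_norm(u, k):
    """l2 norm of the k largest absolute entries of u (1 <= k <= len(u))."""
    if not 1 <= k <= len(u):
        raise ValueError("k must satisfy 1 <= k <= len(u)")
    top = sorted((abs(x) for x in u), reverse=True)[:k]
    return math.sqrt(sum(x * x for x in top))


u = [3.0, -4.0, 1.0]
linf = dual_k_support_norm(u, 1)   # k = 1: max absolute entry, dual of the l1 norm
l2 = dual_k_support_norm(u, 3)     # k = d: the l2 norm, which is its own dual
```

The two extreme cases match the primal statement that $k = 1$ gives the $\ell_1$-norm and $k = d$ the $\ell_2$-norm.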

### Spectral k-support Norm

The k-support norm is an SG-function, inducing the OI-norm

$$\|W\|_{(k)} := \|\sigma(W)\|_{(k)}$$

Proposition. The unit ball of $\|\sigma(\cdot)\|_{(k)}$ is the convex hull of $\{ W : \operatorname{rank}(W) \le k,\ \|W\|_F \le 1 \}$.

- Includes the trace norm ($k = 1$) and the Frobenius norm ($k = d$)
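A small worked example of the two extreme cases, using an illustrative matrix that is not taken from the slides: for $W = \operatorname{diag}(3, 4)$ the singular values are $\sigma(W) = (4, 3)$, so

$$\|W\|_{(1)} = \|\sigma(W)\|_1 = 4 + 3 = 7, \qquad \|W\|_{(2)} = \|\sigma(W)\|_2 = \sqrt{16 + 9} = 5,$$

which are the trace norm and the Frobenius norm of $W$ respectively.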

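### Matrix Completion Experiment

| dataset | norm | test error | r | k | a |
|---|---|---|---|---|---|
| ML 100k, ρ = 50% | tr | | | | |
| | en | | | | |
| | ks | | | | |
| | box | | | | e-5 |
| ML 1M, ρ = 50% | tr | | | | |
| | en | | | | |
| | ks | | | | |
| | box | | | | e-6 |
| Jester1, per line | tr | | | | |
| | en | | | | |
| | ks | | | | |
| | box | | | | e-5 |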

### MTL Experiment

Table: Multitask learning clustering on Lenk dataset, with simple thresholding.

| dataset | norm | test error | k | a |
|---|---|---|---|---|
| Lenk, per task | fr | (0.07) | | |
| | tr | (0.04) | - | - |
| | en | (0.04) | - | - |
| | ks | (0.04) | | |
| | box | (0.04) | | e-3 |
| | c-fr | (0.08) | - | - |
| | c-tr | (0.03) | - | - |
| | c-en | (0.03) | - | - |
| | c-ks | (0.03) | | |
| | c-box | (0.03) | | e-3 |

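### Box Norm

Let $\Theta \subset \mathbb{R}^d_{++}$ be bounded and convex, and consider the norm and its dual:

$$\|w\|_\Theta^2 = \inf_{\theta \in \Theta} \sum_{i=1}^{d} \frac{w_i^2}{\theta_i}, \qquad \|u\|_{*,\Theta}^2 = \sup_{\theta \in \Theta} \sum_{i=1}^{d} \theta_i u_i^2$$

Box norm:

$$\Theta = \left\{ \theta : a < \theta_i \le b,\ \sum_{i=1}^{d} \theta_i \le c \right\}$$

- Includes the k-support norm for $a = 0$, $b = 1$, $c = k$
- The unit ball is the convex hull of

$$\bigcup_{J \in \mathcal{J}_k} \left\{ w \in \mathbb{R}^d : \sum_{i \in J} \frac{w_i^2}{b} + \sum_{i \notin J} \frac{w_i^2}{a} \le 1 \right\}$$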
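The dual box norm is a linear maximization over $\theta$, so it can be solved greedily: start every $\theta_i$ at $a$ and spend the remaining budget $c - da$ on the largest $u_i^2$ first, with at most $b - a$ per coordinate. The sketch below assumes $0 \le a < b$ and $da < c$, and takes the supremum over the closure of $\Theta$ (allowing $\theta_i = a$); the function name `dual_box_norm_sq` is hypothetical.

```python
def dual_box_norm_sq(u, a, b, c):
    """Squared dual box norm: sup of sum(theta_i * u_i**2) over the closure of Theta."""
    d = len(u)
    if not (0 <= a < b and d * a < c):
        raise ValueError("need 0 <= a < b and d * a < c")
    sq = sorted((x * x for x in u), reverse=True)
    budget = c - d * a          # mass left once every theta_i is set to a
    total = a * sum(sq)
    for s in sq:                # raise theta on the largest u_i^2 first
        step = min(b - a, budget)
        if step <= 0:
            break
        total += step * s
        budget -= step
    return total


# With a = 0, b = 1, c = k this is the squared l2 norm of the k largest entries.
value = dual_box_norm_sq([3.0, -4.0, 1.0], a=0.0, b=1.0, c=2.0)
```

After the sort, the loop is linear in $d$, so the whole computation costs $O(d \log d)$.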

### Unit Balls

Figure: Unit balls of the box norm in $\mathbb{R}^2$ for $k = 1$, $a \in \{0.01, 0.25, 0.50\}$.

Figure: Unit balls of the dual box norm in $\mathbb{R}^2$ for $k = 1$, $a \in \{0.01, 0.25, 0.50\}$.

### Cluster Norm

The box norm is an SG-function, inducing the OI-norm

$$\|W\|_\Theta^2 = \|\sigma(W)\|_\Theta^2 = \inf \left\{ \sum_{i=1}^{d} \frac{\sigma_i(W)^2}{\theta_i} \;:\; \theta \in (a, b]^d,\ \sum_{i=1}^{d} \theta_i \le c \right\}$$

The associated OI-norm has been used to favour task clustering [Jacob et al. 2008]. It can be written as

$$\|W\|_\Theta^2 = \inf \left\{ \operatorname{tr}\left( W \Sigma^{-1} W^\top \right) \;:\; aI \preceq \Sigma \preceq bI,\ \operatorname{tr} \Sigma \le c \right\}$$

- Includes the spectral k-support norm for $a = 0$, $b = 1$, $c = k$

### Interpretation of $a$

Proposition. If $c = da + k(b - a)$, the solution of the regularization problem is given by $\hat{W} = \hat{V} + \hat{Z}$, where

$$(\hat{V}, \hat{Z}) = \arg\min_{V, Z} \sum_{i=1}^{n} \left( y_i - \langle V + Z, X_i \rangle \right)^2 + \lambda \left( \frac{1}{a} \|V\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2 \right)$$

- The parameter $a$ balances the relative importance of the two components
- The cluster norm is the Moreau envelope of the spectral k-support norm:

$$\|W\|_\Theta^2 = \min_{Z \in \mathbb{R}^{d \times m}} \left\{ \frac{1}{a} \|W - Z\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2 \right\}$$
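As a sanity check on the role of $a$ (a reading of the formulas, not a statement from the slides), take $b = 1$ and let $a \to 0^+$. The condition on $c$ and the envelope then behave as

$$c = da + k(b - a) \;\to\; k, \qquad \min_{Z} \left\{ \frac{1}{a} \|W - Z\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2 \right\} \;\to\; \|W\|_{(k)}^2,$$

because the weight $1/a$ forces $Z = W$ in the limit, and correspondingly $\hat{V} \to 0$ in the decomposition. This agrees with the earlier statement that the cluster norm includes the spectral k-support norm for $a = 0$, $b = 1$, $c = k$.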

### Computation of the $\Theta$ norm

Assume w.l.o.g. $w \ge 0$ with non-increasing components. Then

$$\|w\|_\Theta^2 = \frac{1}{b} \left\| w_{[1:q]} \right\|_2^2 + \frac{1}{c - qb - \ell a} \left\| w_{[q+1:d-\ell]} \right\|_1^2 + \frac{1}{a} \left\| w_{[d-\ell+1:d]} \right\|_2^2,$$

where $q, \ell \in \{0, \dots, d\}$ are uniquely determined.

In particular:

$$\|w\|_{(k)}^2 = \left\| w_{[1:q]} \right\|_2^2 + \frac{1}{k - q} \left\| w_{[q+1:d]} \right\|_1^2,$$

where $q \in \{0, \dots, k - 1\}$ is determined by

$$w_q \ge \frac{1}{k - q} \sum_{j=q+1}^{d} w_j > w_{q+1}$$

- Computation of the norm is $O(d \log d)$
- For the k-support norm this improves on the previous $O(kd)$ method
- Efficient optimization using proximal-gradient methods
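The closed form for the k-support norm can be sketched in pure Python: sort the absolute values, then scan $q = 0, \dots, k - 1$ for the split between the squared $\ell_2$ head and the scaled squared $\ell_1$ tail. The function name `k_support_norm` is hypothetical. One assumption is labeled here: when components are tied, this sketch takes the smallest $q$ whose tail average is at least the next component, which leaves the value of the norm unchanged.

```python
import math


def k_support_norm(w, k):
    """k-support norm via the sorted closed form (1 <= k <= len(w))."""
    z = sorted((abs(x) for x in w), reverse=True)   # O(d log d)
    d = len(z)
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= len(w)")
    # suffix[q] = z[q] + ... + z[d-1], i.e. the l1 norm of w_[q+1:d] in 1-based notation
    suffix = [0.0] * (d + 1)
    for i in range(d - 1, -1, -1):
        suffix[i] = z[i] + suffix[i + 1]
    head_sq = 0.0                                    # squared l2 norm of w_[1:q]
    for q in range(k):
        tail = suffix[q]
        if tail / (k - q) >= z[q]:                   # tail average >= w_{q+1}
            return math.sqrt(head_sq + tail * tail / (k - q))
        head_sq += z[q] * z[q]
    raise AssertionError("unreachable: the test always holds at q = k - 1")


l1 = k_support_norm([3.0, -4.0], 1)   # k = 1 recovers the l1 norm
l2 = k_support_norm([3.0, -4.0], 2)   # k = d recovers the l2 norm
```

The scan stops at the first admissible $q$; the test is guaranteed to pass at $q = k - 1$ because the tail sum then contains the next component itself.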

### Extensions/Open Problems

- Other sets $\Theta$ allow for an exact prox, e.g. $\Theta = \{ \theta_1 \ge \dots \ge \theta_d > 0 \}$. Can we give a general characterization?
- Online learning / stochastic optimization
- Kernel extensions
