On the k-support and Related Norms


1 On the k-support and Related Norms

Massimiliano Pontil
Department of Computer Science
Centre for Computational Statistics and Machine Learning
University College London

(Joint work with Andrew McDonald and Dimitris Stamos)

Massimiliano Pontil (UCL), On the k-support and Related Norms, Sestri Levante, Sept
2 Plan

- Problem
- Spectral regularization
- k-support norm
- Box norm
- Link to cluster norm
3 Problem

Learn a matrix from a set of linear measurements:

    y_i = \langle W, X_i \rangle + \mathrm{noise}_i, \quad i = 1, \dots, n

Method:

    \min_{W \in \mathbb{R}^{d \times m}} \sum_{i=1}^n (y_i - \langle W, X_i \rangle)^2 + \lambda \, \Omega(W)

- Matrix completion: X_i = e_r e_c^T
- Multitask learning: X_i = e_r x_i^T
- The regularizer \Omega encourages matrix structure
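To make the measurement model concrete, here is a minimal pure-Python sketch of the regularized objective in the matrix-completion case, where X_i = e_r e_c^T so that the inner product \langle W, X_i \rangle simply picks out the entry W[r][c]. The function name and the (r, c, y) encoding of the measurements are illustrative choices, not from the talk:

```python
def completion_objective(W, obs, lam, omega):
    """Regularized objective sum_i (y_i - <W, X_i>)^2 + lam * Omega(W)
    for matrix completion, where each measurement X_i = e_r e_c^T picks
    out the single entry W[r][c]."""
    fit = sum((y - W[r][c]) ** 2 for r, c, y in obs)
    return fit + lam * omega(W)

# Example: one observed entry, squared-Frobenius regularizer.
W = [[1.0, 2.0], [3.0, 4.0]]
obs = [(0, 1, 2.5)]  # y_1 = 2.5 observed at entry (0, 1)
frob_sq = lambda M: sum(x * x for row in M for x in row)
val = completion_objective(W, obs, lam=0.0, omega=frob_sq)  # -> 0.25
```

With lam = 0 only the data-fit term remains; the choice of Omega (trace, k-support, box, cluster norm) is exactly what the rest of the talk varies.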
4 Spectral Regularization

    \min_{W \in \mathbb{R}^{d \times m}} \sum_{i=1}^n (y_i - \langle W, X_i \rangle)^2 + \lambda \, \Omega(W)

- \Omega favors matrix structure (low rank, low variance, clustering, etc.)
- Choose an orthogonally invariant (OI) norm: \Omega(W) = \Omega(U W V) for all orthogonal U, V
- von Neumann (1937): \|W\| = g(\sigma(W)), where g is a symmetric gauge (SG) function
- A well-studied example is the trace norm: g(\cdot) = \|\cdot\|_1
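For reference, a short sketch of the definitions this slide relies on, stated in the standard form of von Neumann's characterization:

```latex
% A symmetric gauge (SG) function g : R^d -> R_+ is a norm that is
% invariant under permutations and sign changes of the coordinates:
%   g(w) = g(Pw)  for every permutation matrix P,  and
%   g(w) = g(|w|) componentwise.
% von Neumann (1937): a matrix norm is orthogonally invariant iff
%   \|W\| = g(\sigma(W))
% for some SG-function g, where \sigma(W) are the singular values.
% The trace norm is the case g = \| \cdot \|_1:
\|W\|_{\mathrm{tr}} \;=\; \sum_{i=1}^{\min(d,m)} \sigma_i(W) \;=\; \|\sigma(W)\|_1 .
```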
5 k-support Norm [Argyriou et al. 2012]

Special case of the group lasso with overlap [Jacob et al., 2009]:

    \|w\|_{(k)} = \inf\Big\{ \sum_{J \in \mathcal{G}_k} \|v_J\|_2 \ : \ \sum_{J \in \mathcal{G}_k} v_J = w, \ \mathrm{supp}(v_J) \subseteq J \Big\}

where \mathcal{G}_k is the collection of subsets of \{1, \dots, d\} of cardinality at most k.

- Includes the \ell_1 norm (k = 1) and the \ell_2 norm (k = d)
- The unit ball of \|\cdot\|_{(k)} is the convex hull of \{ w : \mathrm{card}(w) \le k, \ \|w\|_2 \le 1 \}
- Dual norm: \|u\|_{*,(k)} = \sqrt{ \sum_{i=1}^k (|u|^\downarrow_i)^2 }, the \ell_2 norm of the k largest components of u in absolute value
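The dual norm translates directly into code: sort the absolute values and take the l2 norm of the k largest. A minimal sketch (the function name is mine), with the two extreme cases as sanity checks, since k = 1 should give the l-infinity norm (the dual of l1) and k = d the l2 norm (self-dual):

```python
import math

def k_support_dual(u, k):
    """Dual k-support norm: the l2 norm of the k largest |u_i|."""
    v = sorted((abs(x) for x in u), reverse=True)
    return math.sqrt(sum(x * x for x in v[:k]))

print(k_support_dual([3.0, -4.0, 1.0], 1))  # 4.0, the l-infinity norm
print(k_support_dual([3.0, -4.0], 2))       # 5.0, the l2 norm
```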
6 Spectral k-support Norm

The k-support norm is a symmetric gauge function, so it induces the OI-norm

    \|W\|_{(k)} := \|\sigma(W)\|_{(k)}

Proposition. The unit ball of \|\sigma(\cdot)\|_{(k)} is the convex hull of \{ W : \mathrm{rank}(W) \le k, \ \|W\|_F \le 1 \}.

Includes the trace norm (k = 1, since \|\sigma(W)\|_1 = \|W\|_{\mathrm{tr}}) and the Frobenius norm (k = d, since \|\sigma(W)\|_2 = \|W\|_F).
7 Matrix Completion Experiment

[Table: matrix completion results on ML 100k (rho = 50%), ML 1M (rho = 50%), and Jester1 (per-line split), comparing the trace (tr), elastic net (en), k-support (ks), and box norms; columns report test error and the parameters r, k, a. The numerical entries were lost in transcription.]
8 MTL Experiment

Table: Multitask learning clustering on the Lenk dataset, with simple thresholding.

[Flattened table: per-task test error (standard deviation in parentheses) and parameters k, a for the Frobenius (fr), trace (tr), elastic net (en), k-support (ks), and box norms and their clustered variants (cfr, ctr, cen, cks, cbox). The mean test errors were lost in transcription; the surviving standard deviations range from 0.03 to 0.08.]
9 Box Norm

Let \Theta \subseteq \mathbb{R}^d_{++} be bounded and convex, and consider the norm and its dual

    \|w\|_\Theta^2 = \inf_{\theta \in \Theta} \sum_{i=1}^d \frac{w_i^2}{\theta_i},
    \qquad
    \|u\|_{*,\Theta}^2 = \sup_{\theta \in \Theta} \sum_{i=1}^d \theta_i u_i^2

Box norm: \Theta = \{ \theta : a < \theta_i \le b, \ \sum_{i=1}^d \theta_i \le c \}

- Includes the k-support norm for a = 0, b = 1, c = k
- The unit ball is the convex hull of \{ w \in \mathbb{R}^d : \sum_{i \in J} \frac{w_i^2}{b} + \sum_{i \notin J} \frac{w_i^2}{a} \le 1, \ |J| \le k \}
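The box-norm variational formula can be checked numerically by brute force in R^2: discretize theta over the box and minimize. The sketch below (grid resolution, tolerance, and the tiny a standing in for a = 0 are my choices) recovers the two k-support special cases, l1 for (b = 1, c = 1) and l2 for (b = 1, c = 2):

```python
def box_norm_2d(w, a, b, c, n=400):
    """Brute-force the box norm ||w||_Theta in R^2: minimize
    w1^2/t1 + w2^2/t2 over a <= t_i <= b with t1 + t2 <= c."""
    best = float("inf")
    for i in range(1, n + 1):
        t1 = a + (b - a) * i / n
        for j in range(1, n + 1):
            t2 = a + (b - a) * j / n
            if t1 + t2 <= c + 1e-12:  # small slack for float rounding
                best = min(best, w[0] ** 2 / t1 + w[1] ** 2 / t2)
    return best ** 0.5

w = [3.0, 4.0]
# As a -> 0 with b = 1, c = k, the box norm approaches the k-support norm.
print(box_norm_2d(w, 1e-6, 1.0, 1.0))  # close to ||w||_1 = 7
print(box_norm_2d(w, 1e-6, 1.0, 2.0))  # close to ||w||_2 = 5
```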
10 Unit Balls

Figure: Unit balls of the box norm in \mathbb{R}^2 for k = 1, a \in \{0.01, 0.25, 0.50\}.

Figure: Unit balls of the dual box norm in \mathbb{R}^2 for k = 1, a \in \{0.01, 0.25, 0.50\}.
11 Cluster Norm

The box norm is a symmetric gauge function, inducing the OI-norm

    \|W\|_\Theta^2 = \|\sigma(W)\|_\Theta^2 = \inf\Big\{ \sum_{i=1}^d \frac{\sigma_i(W)^2}{\theta_i} \ : \ \theta \in (a, b]^d, \ \sum_{i=1}^d \theta_i \le c \Big\}

The associated OI-norm has been used to favour task clustering [Jacob et al. 2008]. It can be written as

    \|W\|_\Theta^2 = \inf\Big\{ \mathrm{tr}(W \Sigma^{-1} W^T) \ : \ aI \preceq \Sigma \preceq bI, \ \mathrm{tr}\,\Sigma \le c \Big\}

Includes the spectral k-support norm for a = 0, b = 1, c = k.
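A sketch of why the trace formulation reduces to the spectral expression: writing W = U \mathrm{diag}(\sigma(W)) V^T and restricting \Sigma to share W's right singular vectors yields the sum over singular values, and von Neumann's trace inequality shows this aligned choice is optimal. This is an outline, not the full proof from the talk:

```latex
% Restrict \Sigma = V \mathrm{diag}(\theta) V^{\mathsf T}:
\mathrm{tr}\big(W \Sigma^{-1} W^{\mathsf T}\big)
  = \mathrm{tr}\big(\mathrm{diag}(\sigma(W)) \, \mathrm{diag}(\theta)^{-1} \, \mathrm{diag}(\sigma(W))^{\mathsf T}\big)
  = \sum_{i} \frac{\sigma_i(W)^2}{\theta_i},
% while the constraints  aI \preceq \Sigma \preceq bI  and  tr(\Sigma) <= c
% become  a <= \theta_i <= b  and  \sum_i \theta_i <= c.  Von Neumann's
% trace inequality implies no other eigenbasis for \Sigma does better,
% so the two infima coincide.
```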
12 Interpretation of a

Proposition. If c = da + k(b - a), the solution of the cluster-norm regularization problem is given by \hat W = \hat V + \hat Z, where

    (\hat V, \hat Z) = \arg\min_{V, Z} \sum_{i=1}^n (y_i - \langle V + Z, X_i \rangle)^2 + \lambda \Big( \frac{1}{a} \|V\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2 \Big)

The parameter a balances the relative importance of the two components.

The cluster norm is the Moreau envelope of the spectral k-support norm:

    \|W\|_\Theta^2 = \min_{Z \in \mathbb{R}^{d \times m}} \Big\{ \frac{1}{a} \|W - Z\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2 \Big\}
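The Moreau-envelope identity can be checked numerically in the scalar case d = m = k = 1, where \|z\|_{(1)} = |z| and, with c = da + k(b - a) = b, the cluster norm squared is w^2 / b. A grid-search sketch (the grid range, resolution, and test values are my choices):

```python
def cluster_norm_sq_1d(w, a, b, n=10001):
    """Scalar Moreau envelope min_z (w - z)^2 / a + z^2 / (b - a);
    with c = b (i.e. c = d*a + k*(b - a) for d = k = 1) this should
    equal the box/cluster norm squared, w**2 / b."""
    lo, hi = -2.0 * abs(w), 2.0 * abs(w)  # the minimizer lies well inside
    best = float("inf")
    for i in range(n):
        z = lo + (hi - lo) * i / (n - 1)
        best = min(best, (w - z) ** 2 / a + z ** 2 / (b - a))
    return best

print(cluster_norm_sq_1d(3.0, 0.5, 2.0))  # approximately 3**2 / 2 = 4.5
```

Working the scalar calculus by hand gives the minimizer z* = w(b - a)/b and the value w^2 / b, matching the grid search.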
13 Computation of the Theta Norm

Assume w.l.o.g. that w \ge 0 with nonincreasing components. Then

    \|w\|_\Theta^2 = \frac{1}{b} \|w_{[1:q]}\|_2^2 + \frac{1}{c - qb - \ell a} \|w_{[q+1:d-\ell]}\|_1^2 + \frac{1}{a} \|w_{[d-\ell+1:d]}\|_2^2,

where q, \ell \in \{0, \dots, d\} are uniquely determined.

In particular, for the k-support norm:

    \|w\|_{(k)}^2 = \|w_{[1:q]}\|_2^2 + \frac{1}{k - q} \|w_{[q+1:d]}\|_1^2,

where q \in \{0, \dots, k-1\} is determined by

    w_q > \frac{1}{k - q} \sum_{j=q+1}^d w_j \ge w_{q+1}.

- Computation of the norm is O(d \log d)
- For the k-support norm this improves on the previous O(kd) method
- Efficient optimization using proximal gradient methods
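The k-support case translates directly into code: sort |w| in nonincreasing order, find the q satisfying the stated condition (with the convention w_0 = +inf), and evaluate the closed form. A sketch (the function name is mine), checked against the l1 (k = 1) and l2 (k = d) endpoints:

```python
import math

def k_support_norm(w, k):
    """||w||_(k) via the closed form: with v = |w| sorted nonincreasing
    and v_0 = +inf, find q in {0, ..., k-1} such that
        v_q > (1/(k-q)) * sum_{j>q} v_j >= v_{q+1},
    then ||w||_(k)^2 = sum_{i<=q} v_i^2 + (1/(k-q)) (sum_{j>q} v_j)^2."""
    v = sorted((abs(x) for x in w), reverse=True)
    d = len(v)
    for q in range(k):
        tail = sum(v[q:])                              # sum_{j=q+1}^d v_j
        mean = tail / (k - q)
        upper = v[q - 1] if q >= 1 else float("inf")   # v_q (v_0 = +inf)
        lower = v[q] if q < d else 0.0                 # v_{q+1}
        if upper > mean >= lower:
            head = sum(x * x for x in v[:q])
            return math.sqrt(head + tail * tail / (k - q))
    return math.sqrt(sum(x * x for x in v))  # fallback, not expected to trigger

print(k_support_norm([3.0, -4.0], 1))  # 7.0, the l1 norm
print(k_support_norm([3.0, -4.0], 2))  # 5.0, the l2 norm
```

The sort dominates the cost, giving the O(d log d) complexity quoted above.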
14 Extensions / Open Problems

- Other sets \Theta allow for an exact prox computation, e.g. \Theta = \{ \theta_1 \ge \dots \ge \theta_d > 0 \}. Can we give a general characterization?
- Online learning / stochastic optimization
- Kernel extensions