On the k-support and Related Norms

On the k-support and Related Norms
Massimiliano Pontil
Department of Computer Science, Centre for Computational Statistics and Machine Learning, University College London
(Joint work with Andrew McDonald and Dimitris Stamos)
Sestri Levante, Sept 2014

Plan
- Problem
- Spectral regularization
- k-support norm
- Box norm
- Link to cluster norm

Problem
Learn a matrix from a set of linear measurements:
$y_i = \langle W, X_i \rangle + \mathrm{noise}_i, \quad i = 1, \dots, n$
Method:
$\min_{W \in \mathbb{R}^{d \times m}} \sum_{i=1}^n (y_i - \langle W, X_i \rangle)^2 + \lambda \Omega(W)$
- Matrix completion: $X_i = e_r e_c^\top$
- Multitask learning: $X_i = e_r x_i^\top$
The regularizer $\Omega$ encourages matrix structure.
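The two measurement operators can be illustrated in a few lines (a minimal NumPy sketch, not the experimental code from the talk; the convention that rows of $W$ index tasks is an assumption made here):

```python
import numpy as np

d, m = 3, 4
W = np.arange(12.0).reshape(d, m)  # the matrix to be learned

# Matrix completion: X_i = e_r e_c^T, so <W, X_i> picks out the entry W[r, c]
r, c = 1, 2
X = np.outer(np.eye(d)[r], np.eye(m)[c])
assert np.isclose((W * X).sum(), W[r, c])

# Multitask learning: X_i = e_r x_i^T, so <W, X_i> evaluates row r of W on x_i
x = np.array([1.0, -1.0, 2.0, 0.0])
X = np.outer(np.eye(d)[r], x)
assert np.isclose((W * X).sum(), W[r] @ x)
```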

Spectral Regularization
$\min_{W \in \mathbb{R}^{d \times m}} \sum_{i=1}^n (y_i - \langle W, X_i \rangle)^2 + \lambda \Omega(W)$
- $\Omega$ favors matrix structure (low rank, low variance, clustering, etc.)
- Choose an orthogonally invariant (OI) norm: $\|W\| = \|UWV\|$ for all orthogonal $U$, $V$
- von Neumann (1937): $\|W\| = g(\sigma(W))$, where $g$ is a symmetric gauge (SG) function
- A well-studied example is the trace norm: $g(\cdot) = \|\cdot\|_1$
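The orthogonal-invariance property is easy to sanity-check numerically; the sketch below (an illustration, not part of the talk) verifies it for the trace norm using random orthogonal factors obtained from a QR decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))

# random orthogonal U (4x4) and V (3x3) via QR
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))

# trace norm: g = l1 norm applied to the singular values
trace_norm = lambda A: np.linalg.svd(A, compute_uv=False).sum()

# ||U W V|| = ||W|| for any orthogonal U, V
assert np.isclose(trace_norm(U @ W @ V), trace_norm(W))
```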

k-support Norm [Argyriou et al. 2012]
Special case of the group lasso with overlap [Jacob et al. 2009]:
$\|w\|_{(k)} = \inf \Big\{ \sum_{J \in \mathcal{G}_k} \|v_J\|_2 : \sum_{J \in \mathcal{G}_k} v_J = w, \ \mathrm{supp}(v_J) \subseteq J \Big\}$
where $\mathcal{G}_k$ is the collection of subsets $J \subseteq \{1, \dots, d\}$ with $|J| \le k$.
- Includes the $\ell_1$-norm ($k = 1$) and the $\ell_2$-norm ($k = d$)
- The unit ball of $\|\cdot\|_{(k)}$ is the convex hull of $\{w : \mathrm{card}(w) \le k, \ \|w\|_2 \le 1\}$
- Dual norm: $\|u\|_{*,(k)} = \big( \sum_{i=1}^k (|u|^\downarrow_i)^2 \big)^{1/2}$, the $\ell_2$-norm of the $k$ largest-magnitude components of $u$
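The dual norm is easy to evaluate directly, which gives a quick check of the two limiting cases (a hedged sketch; `ksupport_dual` is a name introduced here for illustration):

```python
import numpy as np

def ksupport_dual(u, k):
    """l2 norm of the k largest-magnitude entries of u."""
    top = np.sort(np.abs(np.asarray(u, float)))[::-1][:k]
    return float(np.linalg.norm(top))

u = [3.0, -4.0, 0.0]
assert ksupport_dual(u, 1) == 4.0   # k = 1: l_inf norm (dual of l1)
assert ksupport_dual(u, 2) == 5.0   # top two entries: sqrt(16 + 9)
assert ksupport_dual(u, 3) == 5.0   # k = d: l2 norm (self-dual)
```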

Spectral k-support Norm
The k-support norm is a symmetric gauge function, so it induces the OI-norm
$\|W\|_{(k)} := \|\sigma(W)\|_{(k)}$
Proposition. The unit ball of $\|\sigma(\cdot)\|_{(k)}$ is the convex hull of $\{W : \mathrm{rank}(W) \le k, \ \|W\|_F \le 1\}$.
Includes the trace norm ($k = 1$) and the Frobenius norm ($k = d$).
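The two extreme cases can be checked directly on the singular values (a minimal sketch; the 2x2 matrix is an arbitrary example):

```python
import numpy as np

W = np.array([[1.0, 2.0], [3.0, 4.0]])
s = np.linalg.svd(W, compute_uv=False)

# k = 1: spectral k-support norm = trace norm = sum of singular values.
# Here sigma1*sigma2 = |det W| = 2 and sigma1^2 + sigma2^2 = ||W||_F^2 = 30,
# so (sigma1 + sigma2)^2 = 30 + 2*2 = 34.
assert np.isclose(s.sum(), np.sqrt(34))

# k = d: spectral k-support norm = Frobenius norm
assert np.isclose(np.linalg.norm(s), np.linalg.norm(W, "fro"))
```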

Matrix Completion Experiment

dataset               | norm | test error | r  | k    | a
----------------------|------|------------|----|------|------
ML 100k (ρ = 50%)     | tr   | 0.2017     | 13 | -    | -
                      | en   | 0.2017     | 13 | -    | -
                      | ks   | 0.1990     | 9  | 1.87 | -
                      | box  | 0.1989     | 10 | 2.00 | 1e-5
ML 1M (ρ = 50%)       | tr   | 0.1790     | 17 | -    | -
                      | en   | 0.1789     | 15 | -    | -
                      | ks   | 0.1782     | 17 | 1.80 | -
                      | box  | 0.1777     | 19 | 2.00 | 1e-6
Jester1 (20 per line) | tr   | 0.1752     | 11 | -    | -
                      | en   | 0.1752     | 11 | -    | -
                      | ks   | 0.1739     | 11 | 6.38 | -
                      | box  | 0.1726     | 11 | 6.40 | 2e-5

MTL Experiment
Table: Multitask learning clustering on the Lenk dataset, with simple thresholding.

dataset           | norm  | test error    | k    | a
------------------|-------|---------------|------|-------
Lenk (8 per task) | fr    | 3.7869 (0.07) | -    | -
                  | tr    | 1.9058 (0.04) | -    | -
                  | en    | 1.8974 (0.04) | -    | -
                  | ks    | 1.8933 (0.04) | 1.02 | -
                  | box   | 1.8916 (0.04) | 1.01 | 5.5e-3
                  | c-fr  | 1.8667 (0.08) | -    | -
                  | c-tr  | 1.7904 (0.03) | -    | -
                  | c-en  | 1.7896 (0.03) | -    | -
                  | c-ks  | 1.7775 (0.03) | 1.89 | -
                  | c-box | 1.7754 (0.03) | 1.12 | 9.5e-3

Box Norm
Let $\Theta \subset \mathbb{R}^d_{++}$ be bounded and convex, and consider the norm
$\|w\|^2_\Theta = \inf_{\theta \in \Theta} \sum_{i=1}^d \frac{w_i^2}{\theta_i}$
Box norm: $\Theta = \big\{\theta : a < \theta_i \le b, \ \sum_{i=1}^d \theta_i \le c\big\}$
Dual norm: $\|u\|^2_{*,\Theta} = \sup_{\theta \in \Theta} \sum_{i=1}^d \theta_i u_i^2$
- Includes the k-support norm for $a = 0$, $b = 1$, $c = k$
- The unit ball is the convex hull of $\big\{ w \in \mathbb{R}^d : \sum_{i \in J} \frac{w_i^2}{b} + \sum_{i \notin J} \frac{w_i^2}{a} \le 1, \ J \subseteq \{1,\dots,d\}, \ |J| \le k \big\}$
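The variational definition can be probed numerically (a sketch using `scipy.optimize`, purely for illustration, not the method used in the talk). When $c \ge db$ the sum constraint is inactive, the objective is decreasing in each $\theta_i$, and the infimum is attained at $\theta_i = b$, so $\|w\|_\Theta^2 = \|w\|_2^2 / b$:

```python
import numpy as np
from scipy.optimize import minimize

w = np.array([3.0, 4.0])
a, b, c = 0.1, 1.0, 5.0  # c >= d*b, so the constraint sum(theta) <= c is inactive

obj = lambda theta: np.sum(w**2 / theta)
res = minimize(
    obj,
    x0=np.full(len(w), 0.5),
    method="SLSQP",
    bounds=[(a, b)] * len(w),
    constraints=[{"type": "ineq", "fun": lambda t: c - t.sum()}],
)

# the objective decreases in each theta_i, so the optimum sits at theta_i = b
assert np.isclose(res.fun, np.sum(w**2) / b, atol=1e-4)
```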

Unit Balls
Figure: Unit balls of the box norm in $\mathbb{R}^2$ for $k = 1$, $a \in \{0.01, 0.25, 0.50\}$.
Figure: Unit balls of the dual box norm in $\mathbb{R}^2$ for $k = 1$, $a \in \{0.01, 0.25, 0.50\}$.

Cluster Norm
The box norm is a symmetric gauge function, so it induces the OI-norm
$\|W\|^2_\Theta = \|\sigma(W)\|^2_\Theta = \inf \Big\{ \sum_{i=1}^d \frac{\sigma_i(W)^2}{\theta_i} : \theta \in (a, b]^d, \ \sum_{i=1}^d \theta_i \le c \Big\}$
The associated OI-norm has been used to favour task clustering [Jacob et al. 2008]. It can be written as
$\|W\|^2_\Theta = \inf \big\{ \mathrm{tr}(W \Sigma^{-1} W^\top) : aI \preceq \Sigma \preceq bI, \ \mathrm{tr}\,\Sigma \le c \big\}$
Includes the spectral k-support norm for $a = 0$, $b = 1$, $c = k$.

Interpretation of a
Proposition. If $c = da + k(b - a)$, the solution of the regularization problem is given by $\hat{W} = \hat{V} + \hat{Z}$, where
$(\hat{V}, \hat{Z}) = \arg\min_{V, Z} \sum_{i=1}^n (y_i - \langle V + Z, X_i \rangle)^2 + \lambda \Big( \frac{1}{a} \|V\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2 \Big)$
The parameter $a$ balances the relative importance of the two components.
The cluster norm is the Moreau envelope of the spectral k-support norm:
$\|W\|^2_\Theta = \min_{Z \in \mathbb{R}^{d \times m}} \Big\{ \frac{1}{a} \|W - Z\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2 \Big\}$
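For the corner case $k = d$ (where the spectral k-support norm reduces to the Frobenius norm and $c = da + d(b-a) = db$) the envelope can be evaluated in closed form, giving a quick consistency check against the box norm (a sketch under that assumption; all numbers are illustrative):

```python
import numpy as np

a, b = 0.25, 1.0
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 3))

# k = d: ||Z||_(k) = ||Z||_F, and the quadratic problem
#   min_Z (1/a)||W - Z||_F^2 + (1/(b-a))||Z||_F^2
# has the closed-form minimizer Z* = ((b - a)/b) W
Z = ((b - a) / b) * W
envelope = np.linalg.norm(W - Z)**2 / a + np.linalg.norm(Z)**2 / (b - a)

# with c = d*b the sum constraint is inactive at theta_i = b, so the box
# norm at k = d is ||W||_F^2 / b, matching the envelope
assert np.isclose(envelope, np.linalg.norm(W)**2 / b)
```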

Computation of the Θ Norm
Assume w.l.o.g. $w \ge 0$ with nonincreasing components. Then
$\|w\|^2_\Theta = \frac{1}{b} \|w_{[1:q]}\|_2^2 + \frac{1}{c - qb - \ell a} \|w_{[q+1:d-\ell]}\|_1^2 + \frac{1}{a} \|w_{[d-\ell+1:d]}\|_2^2$
where $q, \ell \in \{0, \dots, d\}$ are uniquely determined.
In particular, for the k-support norm:
$\|w\|_{(k)} = \Big( \|w_{[1:q]}\|_2^2 + \frac{1}{k - q} \|w_{[q+1:d]}\|_1^2 \Big)^{1/2}$
where $q \in \{0, \dots, k-1\}$ is determined by
$w_q > \frac{1}{k - q} \sum_{j=q+1}^d w_j \ge w_{q+1}$ (with the convention $w_0 := +\infty$)
- Computing the norm costs $O(d \log d)$
- For the k-support norm this improves on a previous $O(kd)$ method
- Efficient optimization using proximal-gradient methods
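The k-support special case can be implemented directly from the formula above (a sketch, not the authors' code; `ksupport_norm` is a name introduced here):

```python
import numpy as np

def ksupport_norm(w, k):
    """k-support norm via the O(d log d) formula: sort |w|, find the split q."""
    z = np.sort(np.abs(np.asarray(w, float)))[::-1]  # |w| in nonincreasing order
    for q in range(k):
        tail = z[q:].sum() / (k - q)
        upper = np.inf if q == 0 else z[q - 1]        # convention w_0 = +inf
        if upper > tail >= z[q]:                      # w_q > tail >= w_{q+1}
            return float(np.sqrt(np.sum(z[:q]**2) + (k - q) * tail**2))
    raise ValueError("no valid split found")

w = [3.0, 1.0, 1.0]
assert np.isclose(ksupport_norm(w, 1), 5.0)            # k = 1: l1 norm
assert np.isclose(ksupport_norm(w, 3), np.sqrt(11.0))  # k = d: l2 norm
assert np.isclose(ksupport_norm(w, 2), np.sqrt(13.0))  # intermediate case
```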

Extensions / Open Problems
- Other sets $\Theta$ also allow an exact prox computation, e.g. $\Theta = \{\theta : \theta_1 \ge \dots \ge \theta_d > 0\}$. Can we give a general characterization?
- Online learning / stochastic optimization
- Kernel extensions