Statistical Properties of Convex Clustering


1 Statistical Properties of Convex Clustering. Kean Ming Tan, University of Washington. August 2015.

3 Convex Clustering
[Figure: the data matrix X, with n observations as rows and p features as columns. A later build of the slide adds cluster labels and two panels titled "True Mean" and "Data".]

6 Convex Clustering Recent interest in formulating estimators as the solutions to convex optimization problems: efficient algorithms converge to the global optimum, and optimality conditions fully characterize the estimators. Clustering is a hard problem: it is non-convex, and greedy algorithms do not achieve the global optimum. How about a convex formulation for clustering?

7 Convex Clustering A convex optimization problem, for $q \ge 1$ and $\lambda \ge 0$:

$$\underset{U \in \mathbb{R}^{n \times p}}{\text{minimize}} \quad \frac{1}{2} \sum_{i=1}^{n} \|X_{i.} - U_{i.}\|_2^2 + \lambda \sum_{i<i'} \|U_{i.} - U_{i'.}\|_q$$

Regularization term: encourages rows of $\hat{U}$ to be identical. Definition: the $i$th and $i'$th observations are in the same cluster if and only if $\hat{U}_{i.} = \hat{U}_{i'.}$. Pelckmans et al. 2005, Hocking et al. 2011, Lindsten et al. 2011, Chi and Lange 2014
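To make the objective concrete, here is a minimal Python sketch that evaluates it for a candidate U and reads the clusters off identical rows. The function names, the tolerance `tol`, and the toy data are illustrative assumptions, not part of the slides; the 1/2 scaling on the fit term follows the usual statement of the problem.

```python
def row_norm(v, q):
    """l_q norm of a vector for q = 1, 2, or float('inf')."""
    if q == 1:
        return sum(abs(a) for a in v)
    if q == 2:
        return sum(a * a for a in v) ** 0.5
    if q == float("inf"):
        return max(abs(a) for a in v)
    raise ValueError("this sketch supports q = 1, 2, or inf only")


def objective(X, U, lam, q=2):
    """0.5 * sum_i ||X_i - U_i||_2^2 + lam * sum_{i<i'} ||U_i - U_i'||_q."""
    n = len(X)
    fit = 0.5 * sum((x - u) ** 2 for xr, ur in zip(X, U) for x, u in zip(xr, ur))
    penalty = sum(
        row_norm([a - b for a, b in zip(U[i], U[j])], q)
        for i in range(n)
        for j in range(i + 1, n)
    )
    return fit + lam * penalty


def clusters_from_rows(U, tol=1e-8):
    """Assign the same label to observations whose rows of U agree up to tol."""
    labels, representatives = [], []
    for row in U:
        for k, rep in enumerate(representatives):
            if all(abs(a - b) <= tol for a, b in zip(row, rep)):
                labels.append(k)
                break
        else:
            representatives.append(row)
            labels.append(len(representatives) - 1)
    return labels


# Toy data (hypothetical): two nearby observations and one far away.
X = [[0.0, 0.0], [0.2, 0.0], [3.0, 4.0]]
U_no_fusion = [row[:] for row in X]                          # zero fit error
U_all_fused = [[sum(c) / len(X) for c in zip(*X)]] * len(X)  # zero penalty
```

The two candidates mark the extremes: `U_no_fusion` is optimal at λ = 0, and `U_all_fused` pays no penalty at any λ, so it is what a sufficiently large λ selects.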

14 Role of Tuning Parameter λ
[Figure sequence: the observations plotted on two principal components, with the convex clustering solution shown as λ increases.]
λ = 0: each observation is its own cluster
λ = 0.3: 9 clusters
λ = 0.4: 7 clusters
λ = 0.5: 6 clusters
λ = 0.6: 5 clusters
λ = 0.65: 4 clusters
λ = 0.67: [cluster count missing]
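The merging behaviour can be checked by hand in the smallest case. For two observations in one dimension, the solution has a closed form: each estimate moves toward the other by λ, and the two meet at the mean once λ ≥ |x2 − x1| / 2. The sketch below encodes that formula; the function name and the data are illustrative, and the 1/2 scaling on the fit term is assumed.

```python
def two_point_solution(x1, x2, lam):
    """Minimiser of 0.5*((x1-u1)**2 + (x2-u2)**2) + lam*abs(u1-u2)."""
    gap = x2 - x1
    if 2.0 * lam >= abs(gap):
        # The penalty dominates: both estimates fuse at the mean.
        mean = 0.5 * (x1 + x2)
        return mean, mean
    # Otherwise each estimate is shifted toward the other by lam.
    step = lam if gap > 0 else -lam
    return x1 + step, x2 - step


for lam in (0.0, 0.5, 1.0):
    u1, u2 = two_point_solution(0.0, 2.0, lam)
    n_clusters = 1 if u1 == u2 else 2
    print(lam, (u1, u2), n_clusters)
```

With more observations the same mechanism applies pair by pair, which is why the number of clusters shrinks as λ grows.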

15 Algorithm Standard algorithms can be used to obtain the global optimum of the convex clustering problem, for instance the alternating direction method of multipliers (ADMM). Most of the existing literature on convex clustering has focused on algorithms, rather than statistical properties or empirical performance.
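As a rough illustration of the ADMM approach, the sketch below applies a generic scaled-form ADMM to the convex clustering objective with q = 1 or q = 2, penalising all pairs with equal weight. It is not taken from the slides: the splitting V = U_i − U_j, the variable names, the step size `rho`, and the fixed iteration count are all assumptions made for this example.

```python
def prox_norm(v, t, q):
    """Proximal operator of t * ||.||_q for q = 1 or q = 2."""
    if q == 1:
        # Elementwise soft-thresholding.
        return [max(abs(a) - t, 0.0) * (1.0 if a > 0 else -1.0) for a in v]
    if q == 2:
        # Shrink the whole vector toward zero.
        norm = sum(a * a for a in v) ** 0.5
        scale = max(1.0 - t / norm, 0.0) if norm > 0 else 0.0
        return [scale * a for a in v]
    raise ValueError("this sketch supports q = 1 or q = 2 only")


def convex_clustering_admm(X, lam, q=2, rho=1.0, n_iter=500):
    n, p = len(X), len(X[0])
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    xbar = [sum(col) / n for col in zip(*X)]
    U = [row[:] for row in X]
    V = {pair: [0.0] * p for pair in pairs}  # copies of U_i - U_j
    W = {pair: [0.0] * p for pair in pairs}  # scaled dual variables
    for _ in range(n_iter):
        # U-update: closed form, because every pair is penalised.
        B = [[0.0] * p for _ in range(n)]
        for (i, j) in pairs:
            for k in range(p):
                z = V[(i, j)][k] - W[(i, j)][k]
                B[i][k] += z
                B[j][k] -= z
        U = [
            [(X[i][k] + rho * n * xbar[k] + rho * B[i][k]) / (1.0 + rho * n)
             for k in range(p)]
            for i in range(n)
        ]
        # V-update (proximal step) and dual update, pair by pair.
        for (i, j) in pairs:
            d = [U[i][k] - U[j][k] for k in range(p)]
            shifted = [d[k] + W[(i, j)][k] for k in range(p)]
            V[(i, j)] = prox_norm(shifted, lam / rho, q)
            W[(i, j)] = [shifted[k] - V[(i, j)][k] for k in range(p)]
    return U
```

Rows of the returned U that belong to the same cluster agree only up to numerical tolerance, so cluster membership should be read off with a small threshold rather than exact equality.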

16 Degrees of Freedom For $y \sim N_n(\mu, \sigma^2 I)$, the degrees of freedom of $\hat{\mu}$ is defined as

$$\frac{1}{\sigma^2} \sum_{i=1}^{n} \mathrm{Cov}(\hat{\mu}_i, y_i).$$

Question: Can we derive an unbiased estimator for the degrees of freedom of convex clustering, for a given value of q and λ?
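The definition can be checked numerically in two limiting cases that are easy to reason about. With a single feature, λ = 0 returns the data unchanged (n degrees of freedom), while a very large λ replaces every observation by the overall mean (one degree of freedom). The Monte Carlo sketch below estimates the covariance sum by simulation; the function names, the simulation settings, and the seed are illustrative assumptions.

```python
import random


def monte_carlo_df(estimator, mu, sigma, reps=20000, seed=0):
    """Estimate sum_i Cov(mu_hat_i, y_i) / sigma^2 by simulation."""
    rng = random.Random(seed)
    n = len(mu)
    sum_y = [0.0] * n
    sum_fit = [0.0] * n
    sum_prod = [0.0] * n
    for _ in range(reps):
        y = [m + sigma * rng.gauss(0.0, 1.0) for m in mu]
        fit = estimator(y)
        for i in range(n):
            sum_y[i] += y[i]
            sum_fit[i] += fit[i]
            sum_prod[i] += fit[i] * y[i]
    # Empirical covariance of (fit_i, y_i), summed over coordinates.
    cov_sum = sum(
        sum_prod[i] / reps - (sum_fit[i] / reps) * (sum_y[i] / reps)
        for i in range(n)
    )
    return cov_sum / sigma ** 2


def identity_fit(y):
    """No fusion: what lambda = 0 returns."""
    return list(y)


def mean_fit(y):
    """Full fusion: every observation replaced by the overall mean."""
    m = sum(y) / len(y)
    return [m] * len(y)


mu = [0.0, 1.0, 2.0, 3.0, 4.0]
df_identity = monte_carlo_df(identity_fit, mu, sigma=2.0)  # close to n = 5
df_mean = monte_carlo_df(mean_fit, mu, sigma=2.0)          # close to 1
```

Intermediate values of λ fall between these extremes, which is what makes an unbiased estimator for a given q and λ useful.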

17 Unbiased Estimators for Degrees of Freedom Assume that each observation is independently distributed as $N_p(\mu_k, \sigma^2 I)$. Lemma: for q = 1, the number of unique elements in $\hat{U}$. Lemma: for q = 2, a complicated expression! Application: use BIC to select λ, i.e. to determine the number of clusters.
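A minimal sketch of how the q = 1 estimate could feed a BIC-style criterion follows. Two assumptions are made here and are not confirmed by the slides: unique elements are counted column by column (for q = 1 the problem separates across features), and the criterion uses one common Gaussian form, n·p·log(RSS / (n·p)) + log(n·p)·df. The function names and the tolerance are hypothetical.

```python
import math


def df_q1(U_hat, tol=1e-8):
    """Count distinct values in each column of U_hat and add them up."""
    total = 0
    for col in zip(*U_hat):
        vals = sorted(col)
        # One value to start, plus one for every jump larger than tol.
        total += 1 + sum(1 for a, b in zip(vals, vals[1:]) if b - a > tol)
    return total


def bic(X, U_hat, df):
    """Assumed Gaussian BIC form; requires a nonzero residual sum of squares."""
    n, p = len(X), len(X[0])
    rss = sum((x - u) ** 2 for xr, ur in zip(X, U_hat) for x, u in zip(xr, ur))
    return n * p * math.log(rss / (n * p)) + math.log(n * p) * df


# Hypothetical data and fitted values, for illustration only.
X = [[1.0, 2.0], [1.0, 4.0], [5.0, 3.0]]
U_hat = [[1.0, 2.0], [1.0, 3.0], [4.0, 3.0]]
df_hat = df_q1(U_hat)
score = bic(X, U_hat, df_hat)
```

In use, one would compute `score` over a grid of λ values and keep the λ with the smallest value; the number of clusters follows from the corresponding fit.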

18 Prediction Consistency Under certain assumptions, convex clustering's error in estimating the true cluster means decreases to zero as $n, p \to \infty$.

22 Connection to k-means Clustering
k-means clustering with two clusters:

$$\underset{\mu_1, \mu_2, C_1, C_2}{\text{minimize}} \quad \sum_{i \in C_1} \|X_{i.} - \mu_1\|_2^2 + \sum_{i \in C_2} \|X_{i.} - \mu_2\|_2^2$$

Convex clustering with $q = 0$:

$$\underset{U \in \mathbb{R}^{n \times p}}{\text{minimize}} \quad \frac{1}{2} \sum_{i=1}^{n} \|X_{i.} - U_{i.}\|_2^2 + \lambda \sum_{i<i'} \|U_{i.} - U_{i'.}\|_0$$

Written in terms of two clusters, where each pair of observations in different clusters contributes λ to the penalty:

$$\underset{\mu_1, \mu_2, C_1, C_2}{\text{minimize}} \quad \frac{1}{2} \sum_{i \in C_1} \|X_{i.} - \mu_1\|_2^2 + \frac{1}{2} \sum_{i \in C_2} \|X_{i.} - \mu_2\|_2^2 + \lambda |C_1| (n - |C_1|)$$

Regularization term: encourages the sizes of the clusters to be unbalanced.
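A quick check of the regularization term: with two clusters, the number of pairs of observations that fall in different clusters is |C1|(n − |C1|), which is largest for a balanced split and smallest when one cluster holds a single observation. The sketch below verifies this by brute force for n = 10; the function name and the choice of n are illustrative.

```python
def cross_cluster_pairs(labels):
    """Count pairs (i, i') with i < i' that carry different cluster labels."""
    n = len(labels)
    return sum(
        1 for i in range(n) for j in range(i + 1, n) if labels[i] != labels[j]
    )


n = 10
penalty_by_size = {}
for size in range(1, n):
    labels = [0] * size + [1] * (n - size)
    # Brute-force count matches the closed form |C1| * (n - |C1|).
    assert cross_cluster_pairs(labels) == size * (n - size)
    penalty_by_size[size] = size * (n - size)

most_penalised = max(penalty_by_size, key=penalty_by_size.get)   # balanced split
least_penalised = min(penalty_by_size, key=penalty_by_size.get)  # one singleton
```

Since the penalty is multiplied by λ and minimised, splits with one very small cluster are favoured, which is the imbalance the slide points out.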

25 Connection to Single Linkage Clustering Associated with every convex optimization problem is an equivalent dual problem (if certain conditions are satisfied, and they usually are). The dual problem for convex clustering is almost identical to the dual problem for single linkage clustering!

26 Simulation Studies: Mixture of Gaussians
[Figure: two panels, (a) and (b), each for a Gaussian mixture with its own K and σ (values missing). Axes: Rand Index and Number of Estimated Clusters.]
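For reference, the Rand index used in the simulations is the fraction of pairs of observations on which two clusterings agree, that is, pairs placed together in both or apart in both. A minimal sketch follows; the function name is illustrative.

```python
def rand_index(labels_a, labels_b):
    """Fraction of pairs on which two labelings agree (same/same or different/different)."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two labelings of the same length, at least 2 items")
    agree = 0
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            if same_a == same_b:
                agree += 1
            total += 1
    return agree / total
```

The index depends only on which observations are grouped together, so relabeling the clusters does not change it, and a value of 1 means the two clusterings coincide.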

30 Bottom Line: Why Convex Clustering?
I'm a big fan of convexity... when it's useful. Not clear that convex clustering is useful!
+ Can obtain global optimum.
+ Can estimate degrees of freedom.
+ Can establish prediction consistency.
- Essentially the same as single linkage clustering.
- Similar to k-means clustering.
- Underwhelming empirical performance.
Tan and Witten (2015): Statistical Properties of Convex Clustering.
