Statistical Properties of Convex Clustering


Kean Ming Tan, University of Washington, August 2015

Convex Clustering

[Figure: the data matrix X, with n observations as rows and p features as columns, shown alongside a plot of numbered data points and the true cluster means.]

Recent interest in formulating estimators as the solutions to convex optimization problems:
efficient algorithms give convergence to the global optimum;
optimality conditions fully characterize the estimators.

Clustering is a hard problem:
it is non-convex;
greedy algorithms do not achieve the global optimum.

How about a convex formulation for clustering?

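Convex clustering is a convex optimization problem. For $q \ge 1$ and $\lambda \ge 0$:

$$\underset{U \in \mathbb{R}^{n \times p}}{\text{minimize}} \;\; \frac{1}{2} \sum_{i=1}^{n} \|X_{i\cdot} - U_{i\cdot}\|_2^2 + \lambda \sum_{i < i'} \|U_{i\cdot} - U_{i'\cdot}\|_q$$

Regularization term: encourages rows of $\hat{U}$ to be identical.

Definition: the $i$th and $i'$th observations are in the same cluster if and only if $\hat{U}_{i\cdot} = \hat{U}_{i'\cdot}$.

Pelckmans et al. 2005, Hocking et al. 2011, Lindsten et al. 2011, Chi and Lange 2014.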
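To make the two terms of the objective concrete, here is a minimal sketch that evaluates it for a candidate U. It assumes the usual form of the problem, one half of the squared Euclidean fit plus λ times the sum of pairwise ℓ_q norms of row differences; the function names `row_norm` and `convex_clustering_objective` are illustrative and not taken from the talk.

```python
import math
from itertools import combinations


def row_norm(v, q):
    """Return the l_q norm of a vector for q >= 1 (q may be math.inf)."""
    if q == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** q for x in v) ** (1.0 / q)


def convex_clustering_objective(X, U, lam, q=2):
    """Evaluate 1/2 * sum_i ||X_i - U_i||_2^2 + lam * sum_{i<i'} ||U_i - U_i'||_q.

    X and U are lists of rows (one row per observation) of equal shape.
    """
    # Fit term: squared Euclidean distance between each observation and its centroid row.
    fit = 0.5 * sum(
        (x - u) ** 2 for xi, ui in zip(X, U) for x, u in zip(xi, ui)
    )
    # Fusion penalty: l_q norm of the difference for every pair of rows i < i'.
    penalty = sum(
        row_norm([a - b for a, b in zip(U[i], U[j])], q)
        for i, j in combinations(range(len(U)), 2)
    )
    return fit + lam * penalty
```

With X = [[0, 0], [3, 4]] and λ = 1, setting U = X costs nothing in fit but pays the full penalty (5 for q = 2), while fusing both rows to the mean pays nothing in penalty but 6.25 in fit. The tuning parameter decides which side of that trade-off wins.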

Role of Tuning Parameter λ

[Figure sequence, seven panels: the data plotted against two principal-component axes, with the number of clusters shrinking as λ grows. Panel labels as transcribed: λ = 0, 0 clusters; λ = 0.3, 9 clusters; λ = 0.4, 7 clusters; λ = 0.5, 6 clusters; λ = 0.6, 5 clusters; λ = 0.65, 4 clusters; λ = 0.67, clusters.]
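The effect of λ can be worked out by hand in the smallest possible case. The sketch below assumes n = 2 observations, q = 2, and the objective ½‖X₁ − U₁‖² + ½‖X₂ − U₂‖² + λ‖U₁ − U₂‖₂. Under those assumptions the mean of the two rows is preserved and their difference is shrunk by the factor max(0, 1 − 2λ/‖X₁ − X₂‖₂), so the two observations fuse into one cluster exactly when λ ≥ ‖X₁ − X₂‖₂ / 2. The function name `two_point_solution` is illustrative.

```python
import math


def two_point_solution(x1, x2, lam):
    """Closed-form convex clustering solution for n = 2 observations and q = 2."""
    diff = [a - b for a, b in zip(x1, x2)]
    dist = math.sqrt(sum(d * d for d in diff))
    # Shrink the difference toward zero; it reaches zero once lam >= dist / 2.
    shrink = max(0.0, 1.0 - 2.0 * lam / dist) if dist > 0 else 0.0
    # The penalty is shift-invariant, so the solution keeps the mean of the data.
    mean = [(a + b) / 2 for a, b in zip(x1, x2)]
    u1 = [m + shrink * d / 2 for m, d in zip(mean, diff)]
    u2 = [m - shrink * d / 2 for m, d in zip(mean, diff)]
    return u1, u2
```

For x1 = (0, 0) and x2 = (3, 4), the distance is 5: λ = 0 returns the data unchanged (two clusters), λ = 1.25 moves each point a quarter of the way toward the other, and any λ ≥ 2.5 returns the common mean (1.5, 2) for both rows (one cluster).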

Algorithm

Standard algorithms can be used to obtain the global optimum of the convex clustering problem, for instance the alternating direction method of multipliers (ADMM).

Most of the existing literature on convex clustering has focused on algorithms, rather than on statistical properties or empirical performance.

Degrees of Freedom

For $y \sim N_n(\mu, \sigma^2 I)$, the degrees of freedom of $\hat{\mu}$ is defined as

$$\frac{1}{\sigma^2} \sum_{i=1}^{n} \mathrm{Cov}(\hat{\mu}_i, y_i).$$

Question: can we derive an unbiased estimator for the degrees of freedom of convex clustering, for a given value of q and λ?
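Two extreme cases show what the covariance-based definition of degrees of freedom measures. The worked example below is an illustration rather than part of the talk: it applies the definition df = (1/σ²) Σᵢ Cov(μ̂ᵢ, yᵢ) to the estimator that leaves the data untouched (no fusion, as at λ = 0) and to the estimator that replaces every entry by the sample mean (complete fusion, as for very large λ).

```latex
% Case 1: no fusion, \hat{\mu} = y
\mathrm{df}
  = \frac{1}{\sigma^2} \sum_{i=1}^{n} \mathrm{Cov}(y_i, y_i)
  = \frac{1}{\sigma^2} \cdot n \sigma^2
  = n

% Case 2: complete fusion, \hat{\mu}_i = \bar{y} for every i
\mathrm{Cov}(\bar{y}, y_i)
  = \frac{1}{n} \sum_{j=1}^{n} \mathrm{Cov}(y_j, y_i)
  = \frac{\sigma^2}{n}
\quad\Longrightarrow\quad
\mathrm{df}
  = \frac{1}{\sigma^2} \cdot n \cdot \frac{\sigma^2}{n}
  = 1
```

Any estimator between these extremes should land between 1 and n, which is why a count of distinct fitted values is a natural candidate for an estimate.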

Unbiased Estimators for Degrees of Freedom

Assume that the observations are independent, each distributed as $N_p(\mu_k, \sigma^2 I)$.

Lemma: for q = 1, the number of unique elements in $\hat{U}$.

Lemma: for q = 2, a complicated expression!

Application: use BIC to select λ, i.e. to determine the number of clusters.

Prediction Consistency

Under certain assumptions, convex clustering's error in estimating the true cluster means decreases to zero as $n, p \to \infty$.

Connection to k-means Clustering

k-means clustering with two clusters:

$$\underset{\mu_1, \mu_2, C_1, C_2}{\text{minimize}} \;\; \sum_{i \in C_1} \|X_{i\cdot} - \mu_1\|_2^2 + \sum_{i \in C_2} \|X_{i\cdot} - \mu_2\|_2^2$$

Convex clustering with q = 0:

$$\underset{U \in \mathbb{R}^{n \times p}}{\text{minimize}} \;\; \frac{1}{2} \sum_{i=1}^{n} \|X_{i\cdot} - U_{i\cdot}\|_2^2 + \lambda \sum_{i < i'} \|U_{i\cdot} - U_{i'\cdot}\|_0$$

Restricted to two clusters, with each pair of unequal rows contributing one unit of penalty, this becomes:

$$\underset{\mu_1, \mu_2, C_1, C_2}{\text{minimize}} \;\; \frac{1}{2} \sum_{i \in C_1} \|X_{i\cdot} - \mu_1\|_2^2 + \frac{1}{2} \sum_{i \in C_2} \|X_{i\cdot} - \mu_2\|_2^2 + \lambda |C_1| (n - |C_1|)$$

Regularization term: encourages the sizes of the clusters to be unbalanced.
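The λ|C₁|(n − |C₁|) term can be checked by brute force: with two clusters, a pair of observations is penalized only when its members sit in different clusters, and there are |C₁| · |C₂| such pairs. The sketch below counts those pairs directly and tabulates the count for every split of n observations; the function names are illustrative.

```python
def fusion_penalty_count(labels):
    """Count pairs i < i' whose observations are assigned to different clusters."""
    n = len(labels)
    return sum(
        1 for i in range(n) for j in range(i + 1, n) if labels[i] != labels[j]
    )


def penalty_by_split(n):
    """Return the pair count for every two-cluster split with |C1| = 1, ..., n - 1."""
    counts = []
    for size in range(1, n):
        labels = [1] * size + [2] * (n - size)
        count = fusion_penalty_count(labels)
        # The brute-force count matches the closed form |C1| * (n - |C1|).
        assert count == size * (n - size)
        counts.append(count)
    return counts
```

For n = 10 the counts are 9, 16, 21, 24, 25, 24, 21, 16, 9: the penalty is smallest for a 1-versus-9 split and largest for a 5-versus-5 split, which is the sense in which the regularization term favors unbalanced clusters.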

Connection to Single Linkage Clustering

Associated with every convex optimization problem is an equivalent dual problem.*

The dual problem for convex clustering is almost identical to the dual problem for single linkage clustering!

* If certain conditions are satisfied... and they usually are.

Simulation Studies: Mixture of Gaussians

[Figure: two panels, (a) and (b), each for a Gaussian mixture with its own K and σ (values not recoverable from the transcript). Axis labels in each panel: Rand Index and Number of Estimated Clusters.]

Bottom Line: Why Convex Clustering?

I'm a big fan of convexity... when it's useful.
Not clear that convex clustering is useful!

+ Can obtain global optimum.
+ Can estimate degrees of freedom.
+ Can establish prediction consistency.
- Essentially the same as single linkage clustering.
- Similar to k-means clustering.
- Underwhelming empirical performance.

Tan and Witten (2015): Statistical Properties of Convex Clustering.


A Distributed Line Search for Network Optimization 01 American Control Conference Fairmont Queen Elizabeth, Montréal, Canada June 7-June 9, 01 A Distributed Line Search for Networ Optimization Michael Zargham, Alejandro Ribeiro, Ali Jadbabaie Abstract

More information

On the Interaction and Competition among Internet Service Providers

On the Interaction and Competition among Internet Service Providers On the Interaction and Competition among Internet Service Providers Sam C.M. Lee John C.S. Lui + Abstract The current Internet architecture comprises of different privately owned Internet service providers

More information

Reduced echelon form: Add the following conditions to conditions 1, 2, and 3 above:

Reduced echelon form: Add the following conditions to conditions 1, 2, and 3 above: Section 1.2: Row Reduction and Echelon Forms Echelon form (or row echelon form): 1. All nonzero rows are above any rows of all zeros. 2. Each leading entry (i.e. left most nonzero entry) of a row is in

More information

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix

7. LU factorization. factor-solve method. LU factorization. solving Ax = b with A nonsingular. the inverse of a nonsingular matrix 7. LU factorization EE103 (Fall 2011-12) factor-solve method LU factorization solving Ax = b with A nonsingular the inverse of a nonsingular matrix LU factorization algorithm effect of rounding error sparse

More information

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE/ACM TRANSACTIONS ON NETWORKING 1 A Greedy Link Scheduler for Wireless Networks With Gaussian Multiple-Access and Broadcast Channels Arun Sridharan, Student Member, IEEE, C Emre Koksal, Member, IEEE,

More information

10. Proximal point method

10. Proximal point method L. Vandenberghe EE236C Spring 2013-14) 10. Proximal point method proximal point method augmented Lagrangian method Moreau-Yosida smoothing 10-1 Proximal point method a conceptual algorithm for minimizing

More information

Is it statistically significant? The chi-square test

Is it statistically significant? The chi-square test UAS Conference Series 2013/14 Is it statistically significant? The chi-square test Dr Gosia Turner Student Data Management and Analysis 14 September 2010 Page 1 Why chi-square? Tests whether two categorical

More information

Duality in General Programs. Ryan Tibshirani Convex Optimization 10-725/36-725

Duality in General Programs. Ryan Tibshirani Convex Optimization 10-725/36-725 Duality in General Programs Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: duality in linear programs Given c R n, A R m n, b R m, G R r n, h R r : min x R n c T x max u R m, v R r b T

More information

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j

Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j Analysis of kiva.com Microlending Service! Hoda Eydgahi Julia Ma Andy Bardagjy December 9, 2010 MAS.622j What is Kiva? An organization that allows people to lend small amounts of money via the Internet

More information

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Data Mining Clustering (2) Toon Calders Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining Outline Partitional Clustering Distance-based K-means, K-medoids,

More information

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

More information

Factorial experimental designs and generalized linear models

Factorial experimental designs and generalized linear models Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya

More information

Probabilistic Latent Semantic Analysis (plsa)

Probabilistic Latent Semantic Analysis (plsa) Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References

More information

These axioms must hold for all vectors ū, v, and w in V and all scalars c and d.

These axioms must hold for all vectors ū, v, and w in V and all scalars c and d. DEFINITION: A vector space is a nonempty set V of objects, called vectors, on which are defined two operations, called addition and multiplication by scalars (real numbers), subject to the following axioms

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

1 Maximum likelihood estimation

1 Maximum likelihood estimation COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

More information

Linear Programming Notes VII Sensitivity Analysis

Linear Programming Notes VII Sensitivity Analysis Linear Programming Notes VII Sensitivity Analysis 1 Introduction When you use a mathematical model to describe reality you must make approximations. The world is more complicated than the kinds of optimization

More information

How To Solve The Cluster Algorithm

How To Solve The Cluster Algorithm Cluster Algorithms Adriano Cruz adriano@nce.ufrj.br 28 de outubro de 2013 Adriano Cruz adriano@nce.ufrj.br () Cluster Algorithms 28 de outubro de 2013 1 / 80 Summary 1 K-Means Adriano Cruz adriano@nce.ufrj.br

More information

A New Quantitative Behavioral Model for Financial Prediction

A New Quantitative Behavioral Model for Financial Prediction 2011 3rd International Conference on Information and Financial Engineering IPEDR vol.12 (2011) (2011) IACSIT Press, Singapore A New Quantitative Behavioral Model for Financial Prediction Thimmaraya Ramesh

More information

Cyber-Security Analysis of State Estimators in Power Systems

Cyber-Security Analysis of State Estimators in Power Systems Cyber-Security Analysis of State Estimators in Electric Power Systems André Teixeira 1, Saurabh Amin 2, Henrik Sandberg 1, Karl H. Johansson 1, and Shankar Sastry 2 ACCESS Linnaeus Centre, KTH-Royal Institute

More information

Question 2: How do you solve a matrix equation using the matrix inverse?

Question 2: How do you solve a matrix equation using the matrix inverse? Question : How do you solve a matrix equation using the matrix inverse? In the previous question, we wrote systems of equations as a matrix equation AX B. In this format, the matrix A contains the coefficients

More information

Cooperative Local Caching under Heterogeneous File Preferences

Cooperative Local Caching under Heterogeneous File Preferences Cooperative Local Caching under Heterogeneous File Preferences Yinghao Guo, Lingjie Duan, Member, IEEE and Rui Zhang, Senior arxiv:1510.04516v4 [cs.it] 12 May 2016 Member, IEEE Abstract Local caching is

More information

Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization

Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization Archis Ghate a and Robert L. Smith b a Industrial Engineering, University of Washington, Box 352650, Seattle, Washington,

More information

International Doctoral School Algorithmic Decision Theory: MCDA and MOO

International Doctoral School Algorithmic Decision Theory: MCDA and MOO International Doctoral School Algorithmic Decision Theory: MCDA and MOO Lecture 2: Multiobjective Linear Programming Department of Engineering Science, The University of Auckland, New Zealand Laboratoire

More information

On Quality of Monitoring for Multi-channel Wireless Infrastructure Networks

On Quality of Monitoring for Multi-channel Wireless Infrastructure Networks On Quality of Monitoring for Multi-channel Wireless Infrastructure Networks Arun Chhetri, Huy Nguyen, Gabriel Scalosub*, and Rong Zheng Department of Computer Science University of Houston, TX, USA *Department

More information