LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014"

Transcription

1 LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu

2 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph Kernels by Spectral Transforms Gaussian field and Harmonic Function Reference

3 Semi Supervised Learning Semi-supervised learning is a class of supervised learning tasks and techniques that also make use of unlabeled data for training - typically a small amount of labeled data with a large amount of unlabeled data. Why Labeled data is hard to get Expensive, human annotation, time consuming May require experts Unlabeled data is cheap

4 Why unlabeled data helps[1] assuming each class is a coherent group (e.g. Gaussian) with and without unlabeled data: decision boundary shift

5 Label Propagation[2] Assumption Closer data points tend to have similar class labels. General Idea A node s labels propagate to neighboring nodes according to their proximity Clamp the labels on the labeled data, so the labeled data could act like a sources that push out labels to unlabeled data.

6 Set up Input x, label y Labeled data (x 1,y 1 ),(x 2, y 2 ) (x l,y l ) Unlabeled data (x l+1,y l+1 ) (x l+u,y l+u ) l<<u weight

7 Probabilistic Transition Matrix Allow larger edge weight to propagate labels easier w ij T ij = P j i = l+u k=1 w kj T ij is the probability to jump from node j to I Normalized T For example: The probability node 2 jump to Node 3 is 0.7 T ij = T ij k T ik The probability node 3 jump to Node 1 is 0.5

8 Matrix Y Define (l+u) * C label matrix Y, whose ith row representing label probability distribution of node x i Y ij =1, if the class of x i is c j, else 0, for labeled data The initialization of row of Y corresponding to unlabeled data is not important Node 1 is labeled as label 2. The label distribution of node 3. For example, 0.7 is the probability that node 3 is label 1.

9 Algorithm 1 Propagate Y TY Labels spread information along local structure 2 Row normalize Y Keep proper distribution over classes 3 Clamp the labeled data, Repeat from step 1 until Y converges Keep originally labeled points

10 Convergence The first two steps Split T Yu General from T is row normalized, T uu is submatrix of T

11 Convergence(cont) Consider the row sum No need to iterate!

12 Parameter Setting How to choose parameter σ First, use a heuristic method. Finding a minimum spanning tree over all data points with Euclidean distances d ij with Kruskal s Algorithm(The famous greedy algorithm in data structure). Choose the first tree edge that connect two components with different labeled points. The length is d 0. Set σ=d 0 /3

13 The effect of σ

14 Optimizing σ Single parameter σ controls spread of labels For σ 0, classification of unlabeled points dominated by nearest labeled point For σ, class probabilities just become class frequencies (no information from label proximity) Can minimize entropy of class labels H=- ij Y ij logy ij Leads to confident classifications However, minimum entropy at σ=0

15 Optimizing σ(cont) Add uniform transition component (U ij =1/N) to T ~ T U (1 )T For small σ, uniform component dominates Minimum entropy no longer at σ=0 Use σ 1 σ N to scale each dimension independently Perform gradient descent with respect to σ s in order to minimize entropy H d H Y L U C ic il1 c1 Yic d

16 Rebalancing Class Proportions How should we assign classes to unlabeled points? Could choose most likely class ML method does not explicitly control class proportions Suppose we want labels to fit a known or estimated distribution over classes Normalize class mass scale columns of Y U to fit class distribution and then pick ML class Does not guarantee strict label proportions Perform label bidding each entry Y U (i,c) is a bid of sample i for class c Handle bids from largest to smallest Bid is taken if class c is not full, otherwise it is discarded

17 Experiment result 3 bands dataset and Spring dataset

18 Learning with local and global consistency[4] The key to semi- supervised learning Nearby points are likely to have the same label Points on the same structure (cluster or manifold) are likely to have the same label

19 A toy example

20 Objective Design a classifying function which is sufficiently smooth with respect to the intrinsic structure

21 Algorithm Receive information from its neighbour Retain Initial information

22 Convergence The sequence {F(t)} converges, suppose F(0)=Y F*=(1-α)(I-αS) -1 Y The proof is similar to Label Propagation

23 Regularization Framework(A different pespective) Q(F)= 1 2 n i,j=1 W ij F i D ii F j n μ F D i=1 i Y i jj Smoothness term, capture the local variations, a good function should not change too much between nearby points Fitting constraints, loss function, a good classifying function should not change too much from initial label assignment

24 Regularization Framework Property of Laplacian Matrix 1 2 Differentiating Q(F) with respect to F Q n i,j=1 W ij F i D ii F j D jj 2 = F SF + μ F Y F F=F = 0 F 1 1+μ SF μ 1+μ α = 1, and β = μ 1+μ 1+μ I αs F = βy I αs is invertible, F = β(i αs) 1 Y = f T D 1/2 L sym D 1/2 f L sym = D 1/2 LD 1/2 =I-D 1/2 WD 1/2 =I-S

25 Experiment

26 Graph Kernels by Spectral Transforms[5] Graph-based semi-supervised learning methods can be viewed as imposing smoothness conditions on the target function Eigenvectors with small eigenvalues are smooth, and ideally represent large cluster structures within the data.

27 Smoothness Consider the Laplacian L

28 Smoothness Semi-supervised learning creates a smooth function over unlabeled points f T Lf 1 2 N i, j1 W ij ( f ( i) f ( j)) 2 Generally, smooth if f(i) f(j) for pairs with large W ij The smoothness of an eigenvector is Eigenvectors with smaller eigenvalues are smoother

29 Smoothness of Eigenvectors The complete orthonormal set of eigenvectors ø 1, n ø 2.. ø n L = λi i=1 φ i φ i T

30 Kernels by Spectral Transform Different weightings (i.e. spectral transforms) of Laplacian eigenvalues leads to different smoothness measures We want a kernel K that respects smoothness Define using eigenvectors of Laplacian (φ) and eigenvalues of K (μ) K N i1 Can also define in terms of a spectral transform of Laplacian eigenvalues K N i1 i i i i T i r( ) T i

31 Types of Transforms r(λ i ) is a non-negative and decreasing transform Regularize d Laplacian Diffusion Kernel 1- step Random Walk p - step Random Walk Inverse Cosine Step Function 1 r( ) 2 r( ) exp 2 r( ) ( ), 2 r( ) r( ) r( ) p ( ), 2 cos( / 4) 1if Reverses order of eigenvalues, so smooth eigenvectors have larger eigenvalues in K Is there an optimal transform? cut We need to find a regularizer here

32 Kernel Alignment Assess fitness of a kernel to training labels Empirical kernel alignment compares kernel matrix K tr for training data to target matrix T for training data T ij =1 if y i =y j, otherwise T ij =-1 Aˆ ( K tr, T) K tr K tr, K tr, T F F T, T M, N Tr( MN) F Alignment measure computes cosine between K tr and T Find the optimal spectral transformation r(λ i ) using the kernel alignment notion F Frobenius Product

33 Convex Optimization Convex set Convex function Convex Optimization Linear Programming Minimize c T x+d Subject to Gx h Ax=b Quadratically Constrained Quadratic Programming Minimize 1/2x T Px+c T x+d Subject to 1/2x T Q i X+r it x+s i =0 Ax=b

34 QCQP Kernel alignment between K tr and T is a convex function of kernel eigenvalues μ i No assumption on parametric form of transform r(λ i ) Need K to be positive semi-definite Restrict eigenvalues of K to be 0 Leads to computationally efficient Quadratically Constrained Quadratic Program Minimize convex quadratic function over smaller feasible region Both objective function and constraints are quadratic Complexity comparable to linear programs

35 Impose Order Constraints We would like to keep decreasing order on spectral transformation Smooth functions are preferred bigger eigenvalues for smoother eigenvectors An order constrained semi-supervised kernel K is the solution to the following convex optimization problem.

36 Improved Order Constraints Constant eigenvectors act as a bias term in the graph kernel λ 1 =0, corresponding eigenvector φ i is constant Need not constrain bias terms Improved Order constrains Ignore the constant eigenvectors μ i μ i+1, i = 1.. n 1, and i not constant

37 Harmonic Functions (Zhu, 2003)[3] Now define class labeling f in terms of a Gaussian over continuous space, instead of random field over discrete label set Distribution on f is a Gaussian field p Z E( f ) e ( f ) Z exp( E( f )) df f L f l Useful for multi-label problems (NP-hard for discrete random fields) ML configuration is now unique, attainable by matrix methods, and characterized by harmonic functions

38 Harmonic Energy Energy of solution labeling f is defined as: 1 E( f ) wij ( f ( i) f ( j)) 2 i, j Nearby points should have similar labels Solution which minimizes E(f) is harmonic Δf=0 for unlabeled points, where Δ=D-W (combinatorial Laplacian) Δf=f l for labeled points Value of f at an unlabeled point is the average of f at neighboring points 1 f ( j) wij f ( i), for j L 1,, L U d j i~ j 1 f D Wf 2

39 Harmonic Solution As before, split problem into: f f f Solve using Δf=0, f L = f l : f u ( D uu W W uu ) 1 W Can be viewed as heat kernel classification, but independent of time parameter ul f l W l ll lu 1 u W W ul W uu ( I P uu ) 1 P ul P f l D W

40 Summary Label Propagation Propagate and clamp data Local and global consistency Allow f(x l ) to be different from Y l, but penalize it Introduce a balance between labeled data fit and graph energy Graph Kernels by Spectral Transforms Smoothness, using eigenvector of Laplacian to keep smooth Use kernel alignment Gaussian field and Harmonic Function The label is descrete (Gaussian)

41 Reference [1] Zhu, Semi supervised learning tutorial ( [2] Zhu, Ghahramani Learning from labeled and unlabeled data [3]Zhu, Ghahramani, Lafferty Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions [4]Zhou at al Learning with Local and Global Consistency [5]Zhu et al Semi-supervised learning [6]Matt Stokes, Semi-Supervised Learning

Learning with Local and Global Consistency

Learning with Local and Global Consistency Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany

More information

Learning with Local and Global Consistency

Learning with Local and Global Consistency Learning with Local and Global Consistency Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 7276 Tuebingen, Germany

More information

Tutorial on Semi-Supervised Learning

Tutorial on Semi-Supervised Learning Tutorial on Semi-Supervised Learning Xiaojin Zhu Department of Computer Sciences University of Wisconsin, Madison, USA Theory and Practice of Computational Learning Chicago, 2009 Xiaojin Zhu (Univ. Wisconsin,

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

Ranking on Data Manifolds

Ranking on Data Manifolds Ranking on Data Manifolds Dengyong Zhou, Jason Weston, Arthur Gretton, Olivier Bousquet, and Bernhard Schölkopf Max Planck Institute for Biological Cybernetics, 72076 Tuebingen, Germany {firstname.secondname

More information

Semi-Supervised Support Vector Machines and Application to Spam Filtering

Semi-Supervised Support Vector Machines and Application to Spam Filtering Semi-Supervised Support Vector Machines and Application to Spam Filtering Alexander Zien Empirical Inference Department, Bernhard Schölkopf Max Planck Institute for Biological Cybernetics ECML 2006 Discovery

More information

Online Semi-Supervised Learning

Online Semi-Supervised Learning Online Semi-Supervised Learning Andrew B. Goldberg, Ming Li, Xiaojin Zhu jerryzhu@cs.wisc.edu Computer Sciences University of Wisconsin Madison Xiaojin Zhu (Univ. Wisconsin-Madison) Online Semi-Supervised

More information

Tree based ensemble models regularization by convex optimization

Tree based ensemble models regularization by convex optimization Tree based ensemble models regularization by convex optimization Bertrand Cornélusse, Pierre Geurts and Louis Wehenkel Department of Electrical Engineering and Computer Science University of Liège B-4000

More information

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

More information

MapReduce Approach to Collective Classification for Networks

MapReduce Approach to Collective Classification for Networks MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty

More information

Statistical machine learning, high dimension and big data

Statistical machine learning, high dimension and big data Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,

More information

Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011

Introduction to Machine Learning. Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 Introduction to Machine Learning Speaker: Harry Chao Advisor: J.J. Ding Date: 1/27/2011 1 Outline 1. What is machine learning? 2. The basic of machine learning 3. Principles and effects of machine learning

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

Supervised Feature Selection & Unsupervised Dimensionality Reduction

Supervised Feature Selection & Unsupervised Dimensionality Reduction Supervised Feature Selection & Unsupervised Dimensionality Reduction Feature Subset Selection Supervised: class labels are given Select a subset of the problem features Why? Redundant features much or

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

Graph-based Semi-supervised Learning

Graph-based Semi-supervised Learning Graph-based Semi-supervised Learning Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ MLSS 2012 La Palma Motivation Large

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang, Qihang Lin, Rong Jin Tutorial@SIGKDD 2015 Sydney, Australia Department of Computer Science, The University of Iowa, IA, USA Department of

More information

CSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye

CSE 494 CSE/CBS 598 (Fall 2007): Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye CSE 494 CSE/CBS 598 Fall 2007: Numerical Linear Algebra for Data Exploration Clustering Instructor: Jieping Ye 1 Introduction One important method for data compression and classification is to organize

More information

Convex Optimization SVM s and Kernel Machines

Convex Optimization SVM s and Kernel Machines Convex Optimization SVM s and Kernel Machines S.V.N. Vishy Vishwanathan vishy@axiom.anu.edu.au National ICT of Australia and Australian National University Thanks to Alex Smola and Stéphane Canu S.V.N.

More information

A Tutorial on Spectral Clustering

A Tutorial on Spectral Clustering A Tutorial on Spectral Clustering Ulrike von Luxburg Max Planck Institute for Biological Cybernetics Spemannstr. 38, 7276 Tübingen, Germany ulrike.luxburg@tuebingen.mpg.de This article appears in Statistics

More information

Neural Networks Lesson 5 - Cluster Analysis

Neural Networks Lesson 5 - Cluster Analysis Neural Networks Lesson 5 - Cluster Analysis Prof. Michele Scarpiniti INFOCOM Dpt. - Sapienza University of Rome http://ispac.ing.uniroma1.it/scarpiniti/index.htm michele.scarpiniti@uniroma1.it Rome, 29

More information

MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

2.3 Convex Constrained Optimization Problems

2.3 Convex Constrained Optimization Problems 42 CHAPTER 2. FUNDAMENTAL CONCEPTS IN CONVEX OPTIMIZATION Theorem 15 Let f : R n R and h : R R. Consider g(x) = h(f(x)) for all x R n. The function g is convex if either of the following two conditions

More information

Notes on Symmetric Matrices

Notes on Symmetric Matrices CPSC 536N: Randomized Algorithms 2011-12 Term 2 Notes on Symmetric Matrices Prof. Nick Harvey University of British Columbia 1 Symmetric Matrices We review some basic results concerning symmetric matrices.

More information

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan

More information

Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

More information

Introduction to Segmentation

Introduction to Segmentation Lecture 2: Introduction to Segmentation Jonathan Krause 1 Goal Goal: Identify groups of pixels that go together image credit: Steve Seitz, Kristen Grauman 2 Types of Segmentation Semantic Segmentation:

More information

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report. Document Clustering. Meryem Uzun-Per Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

More information

Segmentation & Clustering

Segmentation & Clustering EECS 442 Computer vision Segmentation & Clustering Segmentation in human vision K-mean clustering Mean-shift Graph-cut Reading: Chapters 14 [FP] Some slides of this lectures are courtesy of prof F. Li,

More information

Factor analysis. Angela Montanari

Factor analysis. Angela Montanari Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number

More information

Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu

Machine Learning. CUNY Graduate Center, Spring 2013. Professor Liang Huang. huang@cs.qc.cuny.edu Machine Learning CUNY Graduate Center, Spring 2013 Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning Logistics Lectures M 9:30-11:30 am Room 4419 Personnel

More information

Lecture 3: Linear Programming Relaxations and Rounding

Lecture 3: Linear Programming Relaxations and Rounding Lecture 3: Linear Programming Relaxations and Rounding 1 Approximation Algorithms and Linear Relaxations For the time being, suppose we have a minimization problem. Many times, the problem at hand can

More information

Differential Privacy Preserving Spectral Graph Analysis

Differential Privacy Preserving Spectral Graph Analysis Differential Privacy Preserving Spectral Graph Analysis Yue Wang, Xintao Wu, and Leting Wu University of North Carolina at Charlotte, {ywang91, xwu, lwu8}@uncc.edu Abstract. In this paper, we focus on

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

Nonlinear Optimization: Algorithms 3: Interior-point methods

Nonlinear Optimization: Algorithms 3: Interior-point methods Nonlinear Optimization: Algorithms 3: Interior-point methods INSEAD, Spring 2006 Jean-Philippe Vert Ecole des Mines de Paris Jean-Philippe.Vert@mines.org Nonlinear optimization c 2006 Jean-Philippe Vert,

More information

Variational approach to restore point-like and curve-like singularities in imaging

Variational approach to restore point-like and curve-like singularities in imaging Variational approach to restore point-like and curve-like singularities in imaging Daniele Graziani joint work with Gilles Aubert and Laure Blanc-Féraud Roma 12/06/2012 Daniele Graziani (Roma) 12/06/2012

More information

Geometrical Segmentation of Point Cloud Data using Spectral Clustering

Geometrical Segmentation of Point Cloud Data using Spectral Clustering Geometrical Segmentation of Point Cloud Data using Spectral Clustering Sergey Alexandrov and Rainer Herpers University of Applied Sciences Bonn-Rhein-Sieg 15th July 2014 1/29 Addressed Problem Given: a

More information

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

More information

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with

More information

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n Principles of Data Mining Pham Tho Hoan hoanpt@hnue.edu.vn References [1] David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining, MIT press, 2002 [2] Jiawei Han and Micheline Kamber,

More information

Big Data - Lecture 1 Optimization reminders

Big Data - Lecture 1 Optimization reminders Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Big Data - Lecture 1 Optimization reminders S. Gadat Toulouse, Octobre 2014 Schedule Introduction Major issues Examples Mathematics

More information

An Introduction to Machine Learning

An Introduction to Machine Learning An Introduction to Machine Learning L5: Novelty Detection and Regression Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune,

More information

Predict Influencers in the Social Network

Predict Influencers in the Social Network Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

More information

3. Interpolation. Closing the Gaps of Discretization... Beyond Polynomials

3. Interpolation. Closing the Gaps of Discretization... Beyond Polynomials 3. Interpolation Closing the Gaps of Discretization... Beyond Polynomials Closing the Gaps of Discretization... Beyond Polynomials, December 19, 2012 1 3.3. Polynomial Splines Idea of Polynomial Splines

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Prof. Alexander Ihler Prof. Max Welling icamp Tutorial July 22 What is machine learning? The ability of a machine to improve its performance based on previous results:

More information

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics Part I: Factorizations and Statistical Modeling/Inference Amnon Shashua School of Computer Science & Eng. The Hebrew University

More information

Performance Metrics for Graph Mining Tasks

Performance Metrics for Graph Mining Tasks Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical

More information

Convolution. 1D Formula: 2D Formula: Example on the web: http://www.jhu.edu/~signals/convolve/

Convolution. 1D Formula: 2D Formula: Example on the web: http://www.jhu.edu/~signals/convolve/ Basic Filters (7) Convolution/correlation/Linear filtering Gaussian filters Smoothing and noise reduction First derivatives of Gaussian Second derivative of Gaussian: Laplacian Oriented Gaussian filters

More information

Predictive Modeling using Dimensionality Reduction and Dependency Structures

Predictive Modeling using Dimensionality Reduction and Dependency Structures Predictive Modeling using Dimensionality Reduction and Dependency Structures A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Amrudin Agovic IN PARTIAL FULFILLMENT

More information

Part 1: Link Analysis & Page Rank

Part 1: Link Analysis & Page Rank Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Exam on the 5th of February, 216, 14. to 16. If you wish to attend, please

More information

NETZCOPE - a tool to analyze and display complex R&D collaboration networks

NETZCOPE - a tool to analyze and display complex R&D collaboration networks The Task Concepts from Spectral Graph Theory EU R&D Network Analysis Netzcope Screenshots NETZCOPE - a tool to analyze and display complex R&D collaboration networks L. Streit & O. Strogan BiBoS, Univ.

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method

Lecture 3. Linear Programming. 3B1B Optimization Michaelmas 2015 A. Zisserman. Extreme solutions. Simplex method. Interior point method Lecture 3 3B1B Optimization Michaelmas 2015 A. Zisserman Linear Programming Extreme solutions Simplex method Interior point method Integer programming and relaxation The Optimization Tree Linear Programming

More information

Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 1 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 What is machine learning? Data description and interpretation

More information

A Learning Based Method for Super-Resolution of Low Resolution Images

A Learning Based Method for Super-Resolution of Low Resolution Images A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 emre.ugur@ceng.metu.edu.tr Abstract The main objective of this project is the study of a learning based method

More information

Manifold Learning Examples PCA, LLE and ISOMAP

Manifold Learning Examples PCA, LLE and ISOMAP Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition

More information

Network Intrusion Detection using Semi Supervised Support Vector Machine

Network Intrusion Detection using Semi Supervised Support Vector Machine Network Intrusion Detection using Semi Supervised Support Vector Machine Jyoti Haweliya Department of Computer Engineering Institute of Engineering & Technology, Devi Ahilya University Indore, India ABSTRACT

More information

Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

C19 Machine Learning

C19 Machine Learning C9 Machine Learning 8 Lectures Hilary Term 25 2 Tutorial Sheets A. Zisserman Overview: Supervised classification perceptron, support vector machine, loss functions, kernels, random forests, neural networks

More information

Quadratic Functions, Optimization, and Quadratic Forms

Quadratic Functions, Optimization, and Quadratic Forms Quadratic Functions, Optimization, and Quadratic Forms Robert M. Freund February, 2004 2004 Massachusetts Institute of echnology. 1 2 1 Quadratic Optimization A quadratic optimization problem is an optimization

More information

Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu

Learning Gaussian process models from big data. Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Learning Gaussian process models from big data Alan Qi Purdue University Joint work with Z. Xu, F. Yan, B. Dai, and Y. Zhu Machine learning seminar at University of Cambridge, July 4 2012 Data A lot of

More information

17.3.1 Follow the Perturbed Leader

17.3.1 Follow the Perturbed Leader CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.

More information

Solutions to Homework 6

Solutions to Homework 6 Solutions to Homework 6 Debasish Das EECS Department, Northwestern University ddas@northwestern.edu 1 Problem 5.24 We want to find light spanning trees with certain special properties. Given is one example

More information

Classification Problems

Classification Problems Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems

More information

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Large-Scale Similarity and Distance Metric Learning

Large-Scale Similarity and Distance Metric Learning Large-Scale Similarity and Distance Metric Learning Aurélien Bellet Télécom ParisTech Joint work with K. Liu, Y. Shi and F. Sha (USC), S. Clémençon and I. Colin (Télécom ParisTech) Séminaire Criteo March

More information

Local classification and local likelihoods

Local classification and local likelihoods Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

A Network Flow Approach in Cloud Computing

A Network Flow Approach in Cloud Computing 1 A Network Flow Approach in Cloud Computing Soheil Feizi, Amy Zhang, Muriel Médard RLE at MIT Abstract In this paper, by using network flow principles, we propose algorithms to address various challenges

More information

10. Proximal point method

10. Proximal point method L. Vandenberghe EE236C Spring 2013-14) 10. Proximal point method proximal point method augmented Lagrangian method Moreau-Yosida smoothing 10-1 Proximal point method a conceptual algorithm for minimizing

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

More information

Data Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank

Data Mining Practical Machine Learning Tools and Techniques. Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 7 of Data Mining by I. H. Witten and E. Frank Engineering the input and output Attribute selection Scheme independent, scheme

More information

Maximum Margin Clustering

Maximum Margin Clustering Maximum Margin Clustering Linli Xu James Neufeld Bryce Larson Dale Schuurmans University of Waterloo University of Alberta Abstract We propose a new method for clustering based on finding maximum margin

More information

Mathematics Course 111: Algebra I Part IV: Vector Spaces

Mathematics Course 111: Algebra I Part IV: Vector Spaces Mathematics Course 111: Algebra I Part IV: Vector Spaces D. R. Wilkins Academic Year 1996-7 9 Vector Spaces A vector space over some field K is an algebraic structure consisting of a set V on which are

More information

1 Maximum likelihood estimation

1 Maximum likelihood estimation COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

More information

Vector and Matrix Norms

Vector and Matrix Norms Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a non-empty

More information

2.3 Scheduling jobs on identical parallel machines

2.3 Scheduling jobs on identical parallel machines 2.3 Scheduling jobs on identical parallel machines There are jobs to be processed, and there are identical machines (running in parallel) to which each job may be assigned Each job = 1,,, must be processed

More information

Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations

Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Performance of Dynamic Load Balancing Algorithms for Unstructured Mesh Calculations Roy D. Williams, 1990 Presented by Chris Eldred Outline Summary Finite Element Solver Load Balancing Results Types Conclusions

More information

Seminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina stefano.martina@stud.unifi.it

Seminar. Path planning using Voronoi diagrams and B-Splines. Stefano Martina stefano.martina@stud.unifi.it Seminar Path planning using Voronoi diagrams and B-Splines Stefano Martina stefano.martina@stud.unifi.it 23 may 2016 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International

More information

Factorization Theorems

Factorization Theorems Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

More information

Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

More information

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics*

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics* IE 680 Special Topics in Production Systems: Networks, Routing and Logistics* Rakesh Nagi Department of Industrial Engineering University at Buffalo (SUNY) *Lecture notes from Network Flows by Ahuja, Magnanti

More information

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs

Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like

More information

Solutions to Linear Algebra Practice Problems

Solutions to Linear Algebra Practice Problems Solutions to Linear Algebra Practice Problems. Find all solutions to the following systems of linear equations. (a) x x + x 5 x x x + x + x 5 (b) x + x + x x + x + x x + x + 8x Answer: (a) We create the

More information

MATH 240 Fall, Chapter 1: Linear Equations and Matrices

MATH 240 Fall, Chapter 1: Linear Equations and Matrices MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS

More information

Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

More information

Lecture 7: Approximation via Randomized Rounding

Lecture 7: Approximation via Randomized Rounding Lecture 7: Approximation via Randomized Rounding Often LPs return a fractional solution where the solution x, which is supposed to be in {0, } n, is in [0, ] n instead. There is a generic way of obtaining

More information

CSE 4351/5351 Notes 7: Task Scheduling & Load Balancing

CSE 4351/5351 Notes 7: Task Scheduling & Load Balancing CSE / Notes : Task Scheduling & Load Balancing Task Scheduling A task is a (sequential) activity that uses a set of inputs to produce a set of outputs. A task (precedence) graph is an acyclic, directed

More information

Final Exam, Spring 2007

Final Exam, Spring 2007 10-701 Final Exam, Spring 2007 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 16 numbered pages in this exam (including this cover sheet). 3. You can use any material you brought:

More information

Manifold regularized kernel logistic regression for web image annotation

Manifold regularized kernel logistic regression for web image annotation Manifold regularized kernel logistic regression for web image annotation W. Liu 1, H. Liu 1, D.Tao 2*, Y. Wang 1, K. Lu 3 1 China University of Petroleum (East China) 2 *** 3 University of Chinese Academy

More information

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING) Gabriela Ochoa http://www.cs.stir.ac.uk/~goc/ OUTLINE Preliminaries Classification and Clustering Applications

More information

10-810 /02-710 Computational Genomics. Clustering expression data

10-810 /02-710 Computational Genomics. Clustering expression data 10-810 /02-710 Computational Genomics Clustering expression data What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally,

More information

Bayesian Active Distance Metric Learning

Bayesian Active Distance Metric Learning 44 YANG ET AL. Bayesian Active Distance Metric Learning Liu Yang and Rong Jin Dept. of Computer Science and Engineering Michigan State University East Lansing, MI 4884 Rahul Sukthankar Robotics Institute

More information