LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu
1 LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu
2 Agenda Semi-Supervised Learning; Topics in Semi-Supervised Learning; Label Propagation; Local and Global Consistency; Graph Kernels by Spectral Transforms; Gaussian Fields and Harmonic Functions; References
3 Semi-Supervised Learning Semi-supervised learning is a class of supervised learning tasks and techniques that also make use of unlabeled data for training, typically a small amount of labeled data with a large amount of unlabeled data. Why? Labeled data is hard to get: it is expensive and time-consuming, requires human annotation, and may require experts. Unlabeled data is cheap.
4 Why unlabeled data helps [1] Assuming each class is a coherent group (e.g. a Gaussian), comparing the classifier learned with and without unlabeled data shows a decision boundary shift.
5 Label Propagation [2] Assumption: closer data points tend to have similar class labels. General idea: a node's labels propagate to neighboring nodes according to their proximity. Clamp the labels on the labeled data, so the labeled points act like sources that push labels out to the unlabeled data.
6 Set up Input: points $x$ with labels $y$. Labeled data $(x_1,y_1),(x_2,y_2),\dots,(x_l,y_l)$; unlabeled data $(x_{l+1},y_{l+1}),\dots,(x_{l+u},y_{l+u})$, with $l \ll u$. Build a fully connected graph whose edge weights decrease with distance, e.g. $w_{ij} = \exp(-d_{ij}^2/\sigma^2)$ as in [2].
7 Probabilistic Transition Matrix Allow larger edge weights to propagate labels more easily: $T_{ij} = P(j \to i) = \frac{w_{ij}}{\sum_{k=1}^{l+u} w_{kj}}$, so $T_{ij}$ is the probability of jumping from node $j$ to node $i$. Row-normalize $T$: $\bar{T}_{ij} = T_{ij} / \sum_k T_{ik}$. For example, in the running figure, the probability that node 2 jumps to node 3 is 0.7, and the probability that node 3 jumps to node 1 is 0.5.
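A minimal numpy sketch of this construction (the Gaussian weighting follows [2]; the function name and shapes are illustrative):

    import numpy as np

    def transition_matrix(X, sigma):
        # w_ij = exp(-||x_i - x_j||^2 / sigma^2): larger weights for closer points
        d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        W = np.exp(-d2 / sigma ** 2)
        # column-normalize to get T_ij = w_ij / sum_k w_kj, then row-normalize
        T = W / W.sum(axis=0, keepdims=True)
        return T / T.sum(axis=1, keepdims=True)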
8 Matrix Y Define an $(l+u) \times C$ label matrix $Y$ whose $i$th row represents the label probability distribution of node $x_i$. For labeled data, set $Y_{ij} = 1$ if the class of $x_i$ is $c_j$, else 0. The initialization of the rows of $Y$ corresponding to unlabeled data is not important. In the running example, node 1 is labeled with class 2, and the row for node 3 gives its label distribution: for example, 0.7 is the probability that node 3 has label 1.
9 Algorithm 1. Propagate: $Y \leftarrow \bar{T} Y$, so labels spread along the local structure. 2. Row-normalize $Y$ to keep a proper distribution over classes. 3. Clamp the labeled data, to keep the originally labeled points. Repeat from step 1 until $Y$ converges.
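A minimal sketch of this iteration (function and parameter names are illustrative):

    import numpy as np

    def label_propagation(T_bar, Y0, labeled, max_iter=1000, tol=1e-6):
        # labeled: boolean mask of clamped rows; Y0: initial label matrix
        Y = Y0.copy()
        for _ in range(max_iter):
            Y_new = T_bar @ Y                              # 1. propagate
            Y_new /= Y_new.sum(axis=1, keepdims=True)      # 2. row-normalize
            Y_new[labeled] = Y0[labeled]                   # 3. clamp labeled rows
            if np.abs(Y_new - Y).max() < tol:
                return Y_new
            Y = Y_new
        return Y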
10 Convergence The first two steps combine into $Y \leftarrow \bar{T} Y$. Split $\bar{T}$ into labeled and unlabeled blocks, $\bar{T} = \begin{pmatrix} \bar{T}_{ll} & \bar{T}_{lu} \\ \bar{T}_{ul} & \bar{T}_{uu} \end{pmatrix}$, where $\bar{T}$ is row-normalized and $\bar{T}_{uu}$ is the submatrix of $\bar{T}$ on the unlabeled points. Since the labeled rows are clamped, only the update $Y_u \leftarrow \bar{T}_{uu} Y_u + \bar{T}_{ul} Y_l$ matters.
11 Convergence (cont.) Consider the row sums of $\bar{T}_{uu}$: each is strictly less than 1, because some transition mass leaks to the labeled points. Hence $\bar{T}_{uu}^n \to 0$, the influence of the initial $Y_u$ vanishes, and the iteration converges to the fixed point $Y_u = (I - \bar{T}_{uu})^{-1} \bar{T}_{ul} Y_l$. No need to iterate!
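A sketch of the closed-form solution (index convention assumed here: the first $l$ rows of $\bar{T}$ are the labeled points):

    import numpy as np

    def closed_form_labels(T_bar, Y_l, l):
        # Y_u = (I - T_uu)^{-1} T_ul Y_l, avoiding the explicit iteration
        T_uu = T_bar[l:, l:]
        T_ul = T_bar[l:, :l]
        n_u = T_uu.shape[0]
        return np.linalg.solve(np.eye(n_u) - T_uu, T_ul @ Y_l)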
12 Parameter Setting How do we choose the parameter $\sigma$? First, a heuristic method: build a minimum spanning tree over all data points under the Euclidean distances $d_{ij}$, using Kruskal's algorithm (the classic greedy algorithm). Take the first tree edge that connects two components containing differently labeled points; call its length $d_0$. Set $\sigma = d_0/3$.
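A sketch of this heuristic with an explicit union-find Kruskal (all names are illustrative):

    import numpy as np
    from itertools import combinations

    def sigma_from_mst(X, labeled_idx, labels):
        # sigma = d0/3, d0 = first MST edge joining differently labeled components
        n = len(X)
        comp_label = {i: None for i in range(n)}   # class of component, if any
        for i, c in zip(labeled_idx, labels):
            comp_label[i] = c
        parent = list(range(n))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        # all edges sorted by Euclidean length, as in Kruskal's algorithm
        edges = sorted((np.linalg.norm(X[i] - X[j]), i, j)
                       for i, j in combinations(range(n), 2))
        for d, i, j in edges:
            ri, rj = find(i), find(j)
            if ri == rj:
                continue                       # would create a cycle
            li, lj = comp_label[ri], comp_label[rj]
            if li is not None and lj is not None and li != lj:
                return d / 3.0                 # first edge bridging two classes
            parent[ri] = rj
            comp_label[rj] = lj if lj is not None else li
        return None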
13 The effect of σ
14 Optimizing σ A single parameter $\sigma$ controls the spread of labels. As $\sigma \to 0$, the classification of unlabeled points is dominated by the nearest labeled point. As $\sigma \to \infty$, class probabilities just become class frequencies (no information from label proximity). We can minimize the entropy of the class labels, $H = -\sum_{ij} Y_{ij} \log Y_{ij}$, which leads to confident classifications. However, the minimum entropy sits at $\sigma = 0$.
15 Optimizing σ (cont.) Add a uniform transition component ($U_{ij} = 1/N$) to $T$: $\tilde{T} = \epsilon U + (1-\epsilon)\bar{T}$. For small $\sigma$ the uniform component dominates, so the minimum entropy is no longer at $\sigma = 0$. Use $\sigma_1, \dots, \sigma_d$ to scale each dimension independently, and perform gradient descent with respect to the $\sigma$'s to minimize the entropy: $\frac{\partial H}{\partial \sigma_d} = -\sum_{i=l+1}^{l+u} \sum_{c=1}^{C} (1 + \log Y_{ic}) \frac{\partial Y_{ic}}{\partial \sigma_d}$.
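A small sketch of the smoothed transition matrix and the entropy criterion (the defaults for eps and the tiny constant are illustrative choices):

    import numpy as np

    def smoothed_transition(T_bar, eps=0.01):
        # T~ = eps * U + (1 - eps) * T_bar, with U_ij = 1/N the uniform matrix
        N = T_bar.shape[0]
        return eps / N + (1.0 - eps) * T_bar

    def label_entropy(Y, tiny=1e-12):
        # H = -sum_ij Y_ij log Y_ij; lower H means more confident labels
        return -np.sum(Y * np.log(Y + tiny))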
16 Rebalancing Class Proportions How should we assign classes to unlabeled points? We could choose the most likely class, but that does not explicitly control class proportions. Suppose we want the labels to fit a known or estimated distribution over classes. Class mass normalization: scale the columns of $Y_U$ to fit the class distribution, then pick the most likely class; this does not guarantee strict label proportions. Label bidding: each entry $Y_U(i,c)$ is a bid of sample $i$ for class $c$; handle bids from largest to smallest; a bid is taken if class $c$ is not yet full, otherwise it is discarded.
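A sketch of class mass normalization (priors is the known or estimated class distribution; names are illustrative):

    import numpy as np

    def class_mass_normalize(Y_u, priors):
        # Scale each column of Y_u so its total mass matches the class prior,
        # then pick the most likely class for each unlabeled point.
        scaled = Y_u * (priors / Y_u.sum(axis=0))
        return scaled.argmax(axis=1)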
17 Experiment result 3 bands dataset and Spring dataset
18 Learning with local and global consistency [4] The keys to semi-supervised learning: nearby points are likely to have the same label, and points on the same structure (cluster or manifold) are likely to have the same label.
19 A toy example
20 Objective Design a classifying function which is sufficiently smooth with respect to the intrinsic structure
21 Algorithm Construct the affinity matrix $W$ and $S = D^{-1/2} W D^{-1/2}$, then iterate $F(t+1) = \alpha S F(t) + (1-\alpha) Y$: the first term lets each point receive information from its neighbours, while the second term retains the initial information.
22 Convergence The sequence $\{F(t)\}$ converges: with $F(0) = Y$, $F^* = (1-\alpha)(I - \alpha S)^{-1} Y$. The proof is similar to that for Label Propagation.
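A sketch of the closed form (alpha here is an illustrative default; the function name is ours):

    import numpy as np

    def local_global_consistency(W, Y, alpha=0.99):
        # S = D^{-1/2} W D^{-1/2}; closed form F* = (1 - alpha)(I - alpha S)^{-1} Y
        d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
        S = W * np.outer(d_inv_sqrt, d_inv_sqrt)
        n = W.shape[0]
        return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)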
23 Regularization Framework (a different perspective) $Q(F) = \frac{1}{2} \sum_{i,j=1}^{n} W_{ij} \left\| \frac{F_i}{\sqrt{D_{ii}}} - \frac{F_j}{\sqrt{D_{jj}}} \right\|^2 + \mu \sum_{i=1}^{n} \| F_i - Y_i \|^2$. The first part is the smoothness term: it captures local variation, and a good function should not change too much between nearby points. The second part is the fitting constraint (loss function): a good classifying function should not change too much from the initial label assignment.
24 Regularization Framework Property of the Laplacian matrix: $\frac{1}{2} \sum_{i,j=1}^{n} W_{ij} \left\| \frac{F_i}{\sqrt{D_{ii}}} - \frac{F_j}{\sqrt{D_{jj}}} \right\|^2 = \operatorname{tr}(F^T L_{sym} F)$, where $L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2} = I - S$. Differentiating $Q(F)$ with respect to $F$: $\left.\frac{\partial Q}{\partial F}\right|_{F=F^*} = F^* - S F^* + \mu(F^* - Y) = 0$, i.e. $F^* - \frac{1}{1+\mu} S F^* - \frac{\mu}{1+\mu} Y = 0$. With $\alpha = \frac{1}{1+\mu}$ and $\beta = \frac{\mu}{1+\mu}$, this reads $(I - \alpha S) F^* = \beta Y$; since $I - \alpha S$ is invertible, $F^* = \beta (I - \alpha S)^{-1} Y$.
25 Experiment
26 Graph Kernels by Spectral Transforms[5] Graph-based semi-supervised learning methods can be viewed as imposing smoothness conditions on the target function Eigenvectors with small eigenvalues are smooth, and ideally represent large cluster structures within the data.
27 Smoothness Consider the graph Laplacian $L = D - W$, where $D$ is the diagonal degree matrix with $D_{ii} = \sum_j W_{ij}$.
28 Smoothness Semi-supervised learning seeks a function that is smooth over the unlabeled points: $f^T L f = \frac{1}{2} \sum_{i,j=1}^{N} W_{ij} (f(i) - f(j))^2$. Generally, $f$ is smooth if $f(i) \approx f(j)$ for pairs with large $W_{ij}$. The smoothness of an eigenvector $\phi_i$ is $\phi_i^T L \phi_i = \lambda_i$, so eigenvectors with smaller eigenvalues are smoother.
29 Smoothness of Eigenvectors The Laplacian has a complete orthonormal set of eigenvectors $\phi_1, \phi_2, \dots, \phi_n$: $L = \sum_{i=1}^{n} \lambda_i \phi_i \phi_i^T$.
30 Kernels by Spectral Transform Different weightings (i.e. spectral transforms) of the Laplacian eigenvalues lead to different smoothness measures. We want a kernel $K$ that respects smoothness. Define it using the eigenvectors $\phi_i$ of the Laplacian and the eigenvalues $\mu_i$ of $K$: $K = \sum_{i=1}^{N} \mu_i \phi_i \phi_i^T$. Equivalently, define it in terms of a spectral transform $r$ of the Laplacian eigenvalues: $K = \sum_{i=1}^{N} r(\lambda_i) \phi_i \phi_i^T$.
31 Types of Transforms $r(\lambda)$ is a non-negative and decreasing transform. Standard choices:
- Regularized Laplacian: $r(\lambda) = \frac{1}{\lambda + \epsilon}$
- Diffusion kernel: $r(\lambda) = \exp(-\sigma^2 \lambda / 2)$
- 1-step random walk: $r(\lambda) = \alpha - \lambda$, with $\alpha \ge 2$
- $p$-step random walk: $r(\lambda) = (\alpha - \lambda)^p$, with $\alpha \ge 2$
- Inverse cosine: $r(\lambda) = \cos(\lambda \pi / 4)$
- Step function: $r(\lambda) = 1$ if $\lambda \le \lambda_{cut}$, else 0
The transform reverses the order of the eigenvalues, so smooth eigenvectors have larger eigenvalues in $K$. Is there an optimal transform? We still need to choose the regularizer $r$.
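A sketch of building $K$ from a transform $r$ (the diffusion example and its bandwidth are illustrative):

    import numpy as np

    def spectral_transform_kernel(W, r):
        # K = sum_i r(lambda_i) phi_i phi_i^T, from the Laplacian L = D - W
        L = np.diag(W.sum(axis=1)) - W
        lam, phi = np.linalg.eigh(L)        # ascending eigenvalues
        return (phi * r(lam)) @ phi.T

    # e.g. a diffusion-kernel transform with bandwidth sigma^2 = 1.0
    diffusion = lambda lam: np.exp(-1.0 * lam / 2.0)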
32 Kernel Alignment Assess the fitness of a kernel to the training labels. Empirical kernel alignment compares the kernel matrix $K_{tr}$ for the training data to a target matrix $T$ for the training data, where $T_{ij} = 1$ if $y_i = y_j$, otherwise $T_{ij} = -1$: $\hat{A}(K_{tr}, T) = \frac{\langle K_{tr}, T \rangle_F}{\sqrt{\langle K_{tr}, K_{tr} \rangle_F \, \langle T, T \rangle_F}}$, with the Frobenius product $\langle M, N \rangle_F = \operatorname{Tr}(MN)$. The alignment measure computes the cosine between $K_{tr}$ and $T$. We find the optimal spectral transformation $r(\lambda_i)$ using this kernel alignment notion.
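A sketch of the alignment computation (function and variable names are illustrative):

    import numpy as np

    def kernel_alignment(K_tr, y):
        # T_ij = +1 if y_i == y_j else -1; alignment = cosine of K_tr and T
        T = np.where(y[:, None] == y[None, :], 1.0, -1.0)
        frob = lambda A, B: np.sum(A * B)   # <M, N>_F = Tr(M N) for symmetric M, N
        return frob(K_tr, T) / np.sqrt(frob(K_tr, K_tr) * frob(T, T))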
33 Convex Optimization Convex sets, convex functions, convex optimization. Linear programming: minimize $c^T x + d$ subject to $Gx \le h$, $Ax = b$. Quadratically constrained quadratic programming (QCQP): minimize $\frac{1}{2} x^T P x + c^T x + d$ subject to $\frac{1}{2} x^T Q_i x + r_i^T x + s_i \le 0$, $Ax = b$.
34 QCQP The kernel alignment between $K_{tr}$ and $T$ is a convex function of the kernel eigenvalues $\mu_i$, with no assumption on the parametric form of the transform $r(\lambda_i)$. We need $K$ to be positive semi-definite, so we restrict the eigenvalues of $K$ to be $\ge 0$. This leads to a computationally efficient quadratically constrained quadratic program: minimize a convex quadratic function over a smaller feasible region, with both the objective function and the constraints quadratic. The complexity is comparable to linear programs.
35 Impose Order Constraints We would like the spectral transformation to be decreasing: smooth functions are preferred, so smoother eigenvectors should receive larger eigenvalues, $\mu_i \ge \mu_{i+1}$. An order-constrained semi-supervised kernel $K$ is the solution to the resulting convex optimization problem.
36 Improved Order Constraints Constant eigenvectors act as a bias term in the graph kernel: $\lambda_1 = 0$, and the corresponding eigenvector $\phi_1$ is constant. We need not constrain bias terms, so the improved order constraints ignore the constant eigenvectors: $\mu_i \ge \mu_{i+1}$ for $i = 1, \dots, n-1$ with $\phi_i$ not constant.
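A cvxpy sketch of one plausible formulation of the order-constrained problem, maximizing $\langle K_{tr}, T \rangle_F$ under a Frobenius-norm bound; the exact normalization used in [5] may differ, and all names are illustrative:

    import numpy as np
    import cvxpy as cp

    def order_constrained_kernel_eigs(Phi_tr, T, const_mask):
        # Phi_tr: Laplacian eigenvectors restricted to training rows (n_tr x n)
        # T: target matrix on training data; const_mask[i]: phi_i is constant
        n = Phi_tr.shape[1]
        mu = cp.Variable(n)
        K_tr = sum(mu[i] * np.outer(Phi_tr[:, i], Phi_tr[:, i]) for i in range(n))
        constraints = [mu >= 0, cp.norm(K_tr, 'fro') <= 1]
        # order constraints, skipping constant (bias) eigenvectors
        constraints += [mu[i] >= mu[i + 1]
                        for i in range(n - 1) if not const_mask[i]]
        prob = cp.Problem(cp.Maximize(cp.sum(cp.multiply(K_tr, T))), constraints)
        prob.solve()
        return mu.value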
37 Harmonic Functions (Zhu, 2003) [3] Now define the class labeling $f$ in terms of a Gaussian over continuous space, instead of a random field over a discrete label set. The distribution on $f$ is a Gaussian field: $p(f) = \frac{e^{-E(f)}}{Z}$, with $Z = \int_{f|_L = f_l} \exp(-E(f)) \, df$. This is useful for multi-label problems (NP-hard for discrete random fields). The most likely configuration is now unique, attainable by matrix methods, and characterized by harmonic functions.
38 Harmonic Energy The energy of a solution labeling $f$ is defined as $E(f) = \frac{1}{2} \sum_{i,j} w_{ij} (f(i) - f(j))^2$, so nearby points should have similar labels. The solution which minimizes $E(f)$ is harmonic: $\Delta f = 0$ on unlabeled points, where $\Delta = D - W$ (combinatorial Laplacian), and $f = f_l$ on labeled points. The value of $f$ at an unlabeled point is the average of $f$ at neighboring points: $f(j) = \frac{1}{d_j} \sum_{i \sim j} w_{ij} f(i)$ for $j = l+1, \dots, l+u$, i.e. $f = D^{-1} W f$ on the unlabeled points.
39 Harmonic Solution As before, split the problem into labeled and unlabeled blocks: $f = \begin{pmatrix} f_l \\ f_u \end{pmatrix}$, $W = \begin{pmatrix} W_{ll} & W_{lu} \\ W_{ul} & W_{uu} \end{pmatrix}$. Solving $\Delta f = 0$ with $f_L = f_l$ gives $f_u = (D_{uu} - W_{uu})^{-1} W_{ul} f_l = (I - P_{uu})^{-1} P_{ul} f_l$, where $P = D^{-1} W$. This can be viewed as heat-kernel classification, but independent of the time parameter.
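A sketch of the harmonic solution, assuming the convention that the labeled points come first (names are illustrative):

    import numpy as np

    def harmonic_solution(W, f_l):
        # First l rows/cols of W are labeled; solve Delta f = 0 on the rest:
        # f_u = (D_uu - W_uu)^{-1} W_ul f_l
        l = f_l.shape[0]
        Delta = np.diag(W.sum(axis=1)) - W      # combinatorial Laplacian D - W
        return np.linalg.solve(Delta[l:, l:], W[l:, :l] @ f_l)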
40 Summary Label Propagation: propagate labels and clamp the labeled data. Local and global consistency: allow $f(x_l)$ to differ from $Y_l$ but penalize it, introducing a balance between the labeled-data fit and the graph energy. Graph kernels by spectral transforms: measure smoothness through the eigenvectors of the Laplacian, and pick the transform by kernel alignment. Gaussian fields and harmonic functions: relax the discrete labels to a continuous Gaussian field.
41 References [1] Zhu, Semi-Supervised Learning Tutorial. [2] Zhu, Ghahramani, Learning from Labeled and Unlabeled Data with Label Propagation. [3] Zhu, Ghahramani, Lafferty, Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. [4] Zhou et al., Learning with Local and Global Consistency. [5] Zhu et al., Graph Kernels by Spectral Transforms, in Semi-Supervised Learning. [6] Matt Stokes, Semi-Supervised Learning.