ARTIFICIAL INTELLIGENCE (CSCU9YE)
LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)
Gabriela Ochoa
http://www.cs.stir.ac.uk/~goc/

OUTLINE
Preliminaries
  Classification and clustering
  Applications of clustering
Review of relevant concepts
  Euclidean distance
  Voronoi diagrams
K-Means algorithm
  Description and demo
  K-Means as an optimisation problem
  A real-world example
Summary & what's next?
PRELIMINARIES
Machine learning helps us navigate and process large volumes of data.
Examples of questions about our data:
  What is this data point most similar to?
  Does the data come in patterns?
  Can we predict what will happen in the future, given past trends?

MACHINE LEARNING TASKS: CLASSIFICATION AND CLUSTERING
Classification (supervised learning)
  Make a prediction given evidence.
  We studied decision trees, but there are many other methods.
  Useful when you have labelled data.
Clustering (unsupervised learning)
  Detect patterns in unlabelled data. Examples:
    group emails or search results
    find categories of customers
    detect anomalous program executions
  Useful when you don't know what you're looking for.
  Requires data, but no labels.
APPLICATIONS OF CLUSTERING
Economy (market research): discover distinct groups in a customer base and use this knowledge to develop targeted marketing programmes.
Internet and WWW: document classification; cluster weblog data to discover groups of similar access patterns.
Pattern recognition: blind signal separation, e.g. imagine a recording of two voices made with one microphone; the task is to separate the two voices into separate signals.
Image processing.
Astronomy: aggregation of stars, galaxies, or supergalaxies.
Medicine: separating healthy from diseased tissue.

CLUSTERING
Basic idea: group together similar instances.
Example: 2D point patterns.
What could "similar" mean? One option: small (squared) Euclidean distance.
DISTANCE BETWEEN POINTS
The distance between two points is the length of the path connecting them.
In the plane, the distance between points (x1, y1) and (x2, y2) is given by Pythagoras' theorem:
  d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
In Euclidean three-space, the distance between (x1, y1, z1) and (x2, y2, z2) is:
  d = sqrt((x2 - x1)^2 + (y2 - y1)^2 + (z2 - z1)^2)
In general, for n dimensions, the distance between x and y is:
  d(x, y) = sqrt(sum_{i=1..n} (x_i - y_i)^2)
(A short Python sketch of this follows the Voronoi diagram below.)

VORONOI DIAGRAM
The partitioning of a plane with n points into convex polygons such that:
  Each polygon contains exactly one generating point.
  Every point in a given polygon is closer to its generating point than to any other.
Image from Wikipedia; Weisstein, Eric W. "Voronoi Diagram." From MathWorld, http://mathworld.wolfram.com/voronoidiagram.html
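The n-dimensional distance formula above, and the Voronoi idea of assigning each location to its nearest generating point, can be sketched in a few lines of Python. This is a minimal illustration using NumPy; the function names and point values are my own, not from the lecture.

```python
import numpy as np

def euclidean_distance(x, y):
    """n-dimensional Euclidean distance: sqrt(sum_i (x_i - y_i)^2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

def nearest_generating_point(p, generators):
    """Index of the generating point closest to p,
    i.e. which Voronoi cell p falls into."""
    distances = [euclidean_distance(p, g) for g in generators]
    return int(np.argmin(distances))

# Example (made-up points): distance in the plane, then a Voronoi assignment.
print(euclidean_distance((0, 0), (3, 4)))            # 5.0, by Pythagoras
generators = [(0, 0), (10, 0), (5, 8)]
print(nearest_generating_point((6, 1), generators))  # 1: closest to (10, 0)
```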
K-MEANS
Javascript K-Means Demo
An iterative clustering algorithm:
1. Plot the data points.
2. Create K additional points, placing them randomly. These points are the cluster centroids.
3. Repeat:
   Assign each data point to the cluster centroid closest to it.
   Move each centroid to the average position of all the data points that belong to it.
   If any of the centroids moved, repeat; else exit.
(A minimal code sketch of this loop is given after the example.)

K-MEANS EXAMPLE
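Alongside the visual example, here is a minimal sketch of the loop above in Python/NumPy. It is not the demo's code; the function name, the toy data and the choice of K are placeholders for illustration.

```python
import numpy as np

def k_means(points, k, rng=np.random.default_rng(0), max_iters=100):
    """Minimal K-means: random initial centroids, then alternate
    assignment and centroid-update steps until the centroids stop moving."""
    points = np.asarray(points, dtype=float)
    # Step 2: create K centroids, placed randomly within the data's bounding box.
    lo, hi = points.min(axis=0), points.max(axis=0)
    centroids = rng.uniform(lo, hi, size=(k, points.shape[1]))
    for _ in range(max_iters):
        # Assign each data point to its closest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        assignments = dists.argmin(axis=1)
        # Move each centroid to the average position of its assigned points
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            points[assignments == j].mean(axis=0) if np.any(assignments == j)
            else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):   # no centroid moved: exit
            break
        centroids = new_centroids
    return centroids, assignments

# Toy example with two obvious groups of 2D points.
data = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8), (8, 9)]
centroids, labels = k_means(data, k=2)
print(centroids, labels)
```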
K-MEANS AS OPTIMISATION
Consider the total (squared) distance of the points to their assigned means:
  phi(x, c, a) = sum_i || x_i - c_{a_i} ||^2
where the x_i are the points, the c_k are the means, and the a_i are the assignments.
Each iteration reduces phi.
Two stages in each iteration:
  Update assignments: fix the means c, change the assignments a.
  Update means: fix the assignments a, change the means c.

PHASE I: UPDATE ASSIGNMENTS
For each point, reassign it to the closest mean:
  a_i = argmin_k || x_i - c_k ||^2
This can only decrease the total distance phi!
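To make the claim concrete, here is a small sketch of phi and the Phase I step in Python/NumPy. The helper names and toy data are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def phi(points, means, assignments):
    """Total squared distance of each point to its assigned mean."""
    return sum(np.sum((p - means[a]) ** 2) for p, a in zip(points, assignments))

def update_assignments(points, means):
    """Phase I: reassign each point to its closest mean (means stay fixed)."""
    dists = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Toy data: phi can only go down (or stay the same) after Phase I.
points = np.array([(0.0, 0.0), (1.0, 0.0), (9.0, 9.0)])
means = np.array([(0.0, 1.0), (8.0, 8.0)])
old_assignments = np.array([1, 1, 0])          # a deliberately bad assignment
new_assignments = update_assignments(points, means)
print(phi(points, means, old_assignments))     # large
print(phi(points, means, new_assignments))     # smaller
```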
PHASE II: UPDATE MEANS
Move each mean to the average of its assigned points:
  c_k = (1 / |{i : a_i = k}|) * sum_{i : a_i = k} x_i
This also can only decrease the total distance phi.
Fun fact: the point y with minimum total squared Euclidean distance to a set of points {x} is their mean.

INITIALISATION
K-means is nondeterministic: it requires initial means, and it does matter which you pick!
What can go wrong? Poor initial means can lead to poor clusterings.
Various schemes exist for preventing this kind of thing:
  Multiple restarts (see the sketch below)
  Variance-based split / merge
  Initialisation heuristics
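The first of these schemes, multiple restarts, is easy to try in practice: scikit-learn's KMeans runs the algorithm n_init times from different random initial means and keeps the run with the lowest total squared distance (its inertia_, the phi above). A minimal sketch, assuming scikit-learn is available; the blob data is made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Made-up data: three blobs of 2D points.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2))
               for c in [(0, 0), (5, 5), (0, 5)]])

# n_init=10 restarts from different random initial means; the best run is kept.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # the three means found
print(km.inertia_)           # phi for the best of the 10 restarts
```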
K-MEANS GETTING STUCK
A local optimum: why doesn't this work out like the earlier example, with the purple taking over half the blue?
K-Means has some drawbacks, and several methods have been proposed to overcome them, but it is still very much used in practice.
Main limitation: the number of clusters K must be chosen in advance.

EXAMPLE: GOOGLE NEWS
Top-level categories: supervised classification.
Story groupings: unsupervised clustering.
SUMMARY: MAIN MACHINE LEARNING TASKS
Supervised learning
  Inferring a function from labelled training data.
  The training data consist of a set of training examples; each example is a pair of an input vector and a desired output value.
  Algorithms: neural networks, Bayesian methods, kernel estimators, nearest neighbour, etc.
Unsupervised learning
  Trying to find hidden structure in unlabelled data.
  Requires data, but no labels: examples are unlabelled, so there is no error or reward signal to evaluate a potential solution.
  Algorithms: clustering (k-means, mixture models, hierarchical clustering), the expectation-maximisation algorithm, PCA, self-organising maps, etc.