Minimum Spanning Trees



Similar documents
Social Media Mining. Graph Essentials

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

Union-Find Algorithms. network connectivity quick find quick union improvements applications

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.

CIS 700: algorithms for Big Data

Chapter Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling

Applied Algorithm Design Lecture 5

Approximation Algorithms

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics*

Distributed Computing over Communication Networks: Maximal Independent Set

Binary Heap Algorithms

Analysis of Algorithms, I

Data Structures and Algorithms Written Examination

Directed Graphs. digraph search transitive closure topological sort strong components. References: Algorithms in Java, Chapter 19

Algorithm Design and Analysis

Scheduling Shop Scheduling. Tim Nieberg

Euclidean Minimum Spanning Trees Based on Well Separated Pair Decompositions Chaojun Li. Advised by: Dave Mount. May 22, 2014

dictionary find definition word definition book index find relevant pages term list of page numbers

Protein Protein Interaction Networks

Data Structures Fibonacci Heaps, Amortized Analysis

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

CS711008Z Algorithm Design and Analysis

Dynamic programming. Doctoral course Optimization on graphs - Lecture 4.1. Giovanni Righini. January 17 th, 2013

SIMS 255 Foundations of Software Design. Complexity and NP-completeness

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

Solutions to Homework 6

Binary Heaps * * * * * * * / / \ / \ / \ / \ / \ * * * * * * * * * * * / / \ / \ / / \ / \ * * * * * * * * * *

GRAPH THEORY LECTURE 4: TREES

cs2010: algorithms and data structures

UPPER BOUNDS ON THE L(2, 1)-LABELING NUMBER OF GRAPHS WITH MAXIMUM DEGREE

Exam study sheet for CS2711. List of topics

Network (Tree) Topology Inference Based on Prüfer Sequence

SCAN: A Structural Clustering Algorithm for Networks

Finding and counting given length cycles

Network Algorithms for Homeland Security

Exponential time algorithms for graph coloring

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015

B AB 5 C AC 3 D ABGED 9 E ABGE 7 F ABGEF 8 G ABG 6 A BEDA 3 C BC 1 D BCD 2 E BE 1 F BEF 2 G BG 1

Graphical degree sequences and realizations

CMPSCI611: Approximating MAX-CUT Lecture 20

Well-Separated Pair Decomposition for the Unit-disk Graph Metric and its Applications

arxiv: v1 [cs.dc] 7 Dec 2014

Part 2: Community Detection

MATHEMATICS Unit Decision 1

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits

Computational Geometry. Lecture 1: Introduction and Convex Hulls

Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit

THE PROBLEM WORMS (1) WORMS (2) THE PROBLEM OF WORM PROPAGATION/PREVENTION THE MINIMUM VERTEX COVER PROBLEM

11. APPROXIMATION ALGORITHMS

Chapter 6: Graph Theory

5.4 Closest Pair of Points

1 Definitions. Supplementary Material for: Digraphs. Concept graphs

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Page 1. CSCE 310J Data Structures & Algorithms. CSCE 310J Data Structures & Algorithms. P, NP, and NP-Complete. Polynomial-Time Algorithms

Social Media Mining. Data Mining Essentials

Simplified External memory Algorithms for Planar DAGs. July 2004

API for java.util.iterator. ! hasnext() Are there more items in the list? ! next() Return the next item in the list.

CompSci-61B, Data Structures Final Exam

International Journal of Software and Web Sciences (IJSWS)

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis

Max Flow, Min Cut, and Matchings (Solution)

Tables so far. set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n)

All trees contain a large induced subgraph having all degrees 1 (mod k)

Persistent Data Structures and Planar Point Location

Cpt S 223. School of EECS, WSU

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

Algorithms. Algorithms GEOMETRIC APPLICATIONS OF BSTS. 1d range search line segment intersection kd trees interval search trees rectangle intersection

Bicolored Shortest Paths in Graphs with Applications to Network Overlay Design

Small Maximal Independent Sets and Faster Exact Graph Coloring

OPTIMAL DESIGN OF DISTRIBUTED SENSOR NETWORKS FOR FIELD RECONSTRUCTION

On the k-path cover problem for cacti

Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:

Outline 2.1 Graph Isomorphism 2.2 Automorphisms and Symmetry 2.3 Subgraphs, part 1

Discrete Mathematics & Mathematical Reasoning Chapter 10: Graphs

On the independence number of graphs with maximum degree 3

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

8.1 Min Degree Spanning Tree

136 CHAPTER 4. INDUCTION, GRAPHS AND TREES

Guessing Game: NP-Complete?

Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

5.1 Bipartite Matching

5. A full binary tree with n leaves contains [A] n nodes. [B] log n 2 nodes. [C] 2n 1 nodes. [D] n 2 nodes.

The Goldberg Rao Algorithm for the Maximum Flow Problem

A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called heap-order property

Questions 1 through 25 are worth 2 points each. Choose one best answer for each.

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

On Integer Additive Set-Indexers of Graphs

CSC 373: Algorithm Design and Analysis Lecture 16

Load Balancing. Load Balancing 1 / 24

AI: A Modern Approach, Chpts. 3-4 Russell and Norvig

A Fast Algorithm For Finding Hamilton Cycles

COMBINATORIAL PROPERTIES OF THE HIGMAN-SIMS GRAPH. 1. Introduction

V. Adamchik 1. Graph Theory. Victor Adamchik. Fall of 2005

Link Prediction in Social Networks

Chapter 20: Data Analysis

Transcription:

Minimum Spanning Trees weighted graph API cycles and cuts Kruskal s algorithm Prim s algorithm advanced topics References: Algorithms in Java, Chapter 20 http://www.cs.princeton.edu/introalgsds/54mst 1

Minimum Spanning Tree Given. Undirected graph G with positive edge weights (connected). Goal. Find a min weight set of edges that connects all of the vertices. 4 24 6 23 18 9 16 8 5 10 11 14 7 21 G 2

Minimum Spanning Tree Given. Undirected graph G with positive edge weights (connected). Goal. Find a min weight set of edges that connects all of the vertices. 4 24 6 23 18 9 16 8 5 10 11 14 7 21 weight(t) = 50 = 4 + 6 + 8 + 5 + 11 + 9 + 7 Brute force: Try all possible spanning trees problem 1: not so easy to implement problem 2: far too many of them Ex: [Cayley, 1889]: V V-2 spanning trees on the complete graph on V vertices. 3

MST Origin Otakar Boruvka (1926). Electrical Power Company of Western Moravia in Brno. Most economical construction of electrical power network. Concrete engineering problem is now a cornerstone problem-solving model in combinatorial optimization. Otakar Boruvka 4

Applications MST is fundamental problem with diverse applications. Network design. telephone, electrical, hydraulic, TV cable, computer, road Approximation algorithms for NP-hard problems. traveling salesperson problem, Steiner tree Indirect applications. max bottleneck paths LDPC codes for error correction image registration with Renyi entropy learning salient features for real-time face verification reducing data storage in sequencing amino acids in a protein model locality of particle interactions in turbulent fluid flows autoconfig protocol for Ethernet bridging to avoid cycles in a network Cluster analysis. 5

Medical Image Processing MST describes arrangement of nuclei in the epithelium for cancer research http://www.bccrc.ca/ci/ta01_archlevel.html 6

http://ginger.indstate.edu/ge/gfx 7

Two Greedy Algorithms Kruskal's algorithm. Consider edges in ascending order of cost. Add the next edge to T unless doing so would create a cycle. Prim's algorithm. Start with any vertex s and greedily grow a tree T from s. At each step, add the cheapest edge to T that has exactly one endpoint in T. Proposition. Both greedy algorithms compute an MST. Greed is good. Greed is right. Greed works. Greed clarifies, cuts through, and captures the essence of the evolutionary spirit." - Gordon Gecko 8

weighted graph API cycles and cuts Kruskal s algorithm Prim s algorithm advanced topics 9

Weighted Graph API public class WeightedGraph void Iterable<Edge> int String WeightedGraph(int V) insert(edge e) adj(int v) V() tostring() create an empty graph with V vertices insert edge e return an iterator over edges incident to v return the number of vertices return a string representation iterate through all edges (once in each direction) 10

Weighted graph data type Identical to Graph.java but use Edge adjacency sets instead of int. public class WeightedGraph { private int V; private SET<Edge>[] adj; } public Graph(int V) { this.v = V; adj = (SET<Edge>[]) new SET[V]; for (int v = 0; v < V; v++) adj[v] = new SET<Edge>(); } public void addedge(edge e) { int v = e.v, w = e.w; adj[v].add(e); adj[w].add(e); } public Iterable<Edge> adj(int v) { return adj[v]; } 11

Weighted edge data type public class Edge implements Comparable<Edge> { private final int v, int w; private final double weight; Edge abstraction needed for weights public Edge(int v, int w, double weight) { this.v = v; this.w = w; this.weight = weight; } public int either() { return v; } public int other(int vertex) { if (vertex == v) return w; else return v; } public int weight() { return weight; } // See next slide for edge compare methods. slightly tricky accessor methods (enables client code like this) for (int v = 0; v < G.V(); v++) { for (Edge e : G.adj(v)) { int w = e.other(v); } } // edge v-w } 12

Weighted edge data type: compare methods Two different compare methods for edges compareto() so that edges are Comparable (for use in SET) compare() so that clients can compare edges by weight. public final static Comparator<Edge> BY_WEIGHT = new ByWeightComparator(); private static class ByWeightComparator implements Comparator<Edge> { public int compare(edge e, Edge f) { if (e.weight < f.weight) return -1; if (e.weight > f.weight) return +1; return 0; } } } public int compareto(edge that) { if (this.weight < that.weight) return -1; else if (this.weight > that.weight) return +1; else if (this.weight > that.weight) return 0; } 13

weighted graph API cycles and cuts Kruskal s algorithm Prim s algorithm advanced topics 14

Spanning Tree MST. Given connected graph G with positive edge weights, find a min weight set of edges that connects all of the vertices. Def. A spanning tree of a graph G is a subgraph T that is connected and acyclic. Property. MST of G is always a spanning tree. 15

Greedy Algorithms Simplifying assumption. All edge weights w e are distinct. Cycle property. Let C be any cycle, and let f be the max cost edge belonging to C. Then the MST does not contain f. Cut property. Let S be any subset of vertices, and let e be the min cost edge with exactly one endpoint in S. Then the MST contains e. f C S e f is not in the MST e is in the MST 16

Cycle Property Simplifying assumption. All edge weights w e are distinct. Cycle property. Let C be any cycle, and let f be the max cost edge belonging to C. Then the MST T* does not contain f. Pf. [by contradiction] Suppose f belongs to T*. Let's see what happens. Deleting f from T* disconnects T*. Let S be one side of the cut. Some other edge in C, say e, has exactly one endpoint in S. T = T* { e } { f } is also a spanning tree. Since c e < c f, cost(t) < cost(t*). Contradicts minimality of T*. f C S e T* 17

Cut Property Simplifying assumption. All edge costs c e are distinct. Cut property. Let S be any subset of vertices, and let e be the min cost edge with exactly one endpoint in S. Then the MST T* contains e. Pf. [by contradiction] Suppose e does not belong to T*. Let's see what happens. Adding e to T* creates a (unique) cycle C in T*. T = T* { e } { f } is also a spanning tree. Some other edge in C, say f, has exactly one endpoint in S. Since c e < c f, cost(t) < cost(t*). Contradicts minimality of T*. f cycle C S e MST T* 18

weighted graph API cycles and cuts Kruskal s algorithm Prim s algorithm advanced algorithms clustering 19

Kruskal's Algorithm: Example Kruskal's algorithm. [Kruskal, 1956] Consider edges in ascending order of cost. Add the next edge to T unless doing so would create a cycle. 3-5 1-7 6-7 3-5 0.18 1-7 0.21 6-7 0.25 0-2 0.29 0-7 0.31 0-1 0.32 3-4 0.34 4-5 0.40 4-7 0.46 0-6 0.51 4-6 0.51 0-5 0.60 0-2 0-7 0-1 3-4 4-5 4-7 20

Kruskal's algorithm example 25% 75% 50% 100% 21

Kruskal's algorithm correctness proof Proposition. Kruskal's algorithm computes the MST. Pf. [case 1] Suppose that adding e to T creates a cycle C e is not in the MST (cycle property) e is the max weight edge in C (weights come in increasing order) v e C w 22

Kruskal's algorithm correctness proof Proposition. Kruskal's algorithm computes the MST. Pf. [case 2] Suppose that adding e = (v, w) to T does not create a cycle let S be the vertices in v s connected component w is not in S e is the min weight edge with exactly one endpoint in S e is in the MST (cut property) w e S v 23

Kruskal's algorithm implementation Q. How to check if adding an edge to T would create a cycle? A1. Naïve solution: use DFS. O(V) time per cycle check. O(E V) time overall. 24

Kruskal's algorithm implementation Q. How to check if adding an edge to T would create a cycle? A2. Use the union-find data structure from lecture 1 (!). Maintain a set for each connected component. To add v-w to T, merge sets containing v and w. If v and w are in same component, then adding v-w creates a cycle. w v w v Case 1: adding v-w creates a cycle Case 2: add v-w to T and merge sets 25

Kruskal's algorithm: Java implementation public class Kruskal { private SET<Edge> mst = new SET<Edge>(); public Kruskal(WeightedGraph G) { Edge[] edges = G.edges(); Arrays.sort(edges, Edge.BY_WEIGHT); UnionFind uf = new UnionFind(G.V()); for (Edge e: edges) if (!uf.find(e.either(), e.other())) { uf.unite(e.either(), e.other()); mst.add(edge); } sort edges by weight greedily add edges to MST } } public Iterable<Edge> mst() { return mst; } return to client iterable sequence of edges Easy speedup: Stop as soon as there are V-1 edges in MST. 26

Kruskal's algorithm running time Kruskal running time: Dominated by the cost of the sort. Operation sort union find Frequency 1 V E Time per op E log E log* V log* V amortized bound using weighted quick union with path compression recall: log* V 5 in this universe Remark 1. If edges are already sorted, time is proportional to E log* V Remark 2. Linear in practice with PQ or quicksort partitioning (see book: don t need full sort) 27

weighted graph API cycles and cuts Kruskal s algorithm Prim s algorithm advanced topics 28

Prim's algorithm example Prim's algorithm. [Jarník 1930, Dijkstra 1957, Prim 1959] Start with vertex 0 and greedily grow tree T. At each step, add cheapest edge that has exactly one endpoint in T. 0-1 0.32 0-2 0.29 0-5 0.60 0-6 0.51 0-7 0.31 1-7 0.21 3-4 0.34 3-5 0.18 4-5 0.40 4-6 0.51 4-7 0.46 6-7 0.25 29

Prim's Algorithm example 25% 75% 50% 100% 30

Prim's algorithm correctness proof Proposition. Prim's algorithm computes the MST. Pf. Let S be the subset of vertices in current tree T. Prim adds the cheapest edge e with exactly one endpoint in S. e is in the MST (cut property) S e 31

Prim's algorithm implementation Q. How to find cheapest edge with exactly one endpoint in S? A1. Brute force: try all edges. O(E V) time overall. O(E) time per spanning tree edge. 32

Prim's algorithm implementation Q. How to find cheapest edge with exactly one endpoint in S? A2. Maintain a priority queue of vertices connected by an edge to S Delete min to determine next vertex v to add to S. Disregard v if already in S. Add to PQ any vertex brought closer to S by v. Running time. log V steps per edge (using a binary heap). E log V steps overall. Note: This is a lazy version of implementation in Algs in Java lazy: put all adjacent vertices (that are not already in MST) on PQ eager: first check whether vertex is already on PQ and decrease its key 33

Key-value priority queue Associate a value with each key in a priority queue. API: public class MinPQplus<Key extends Comparable<Key>, Value> MinPQplus() create a key-value priority queue void put(key key, Value val) put key-value pair into the priority queue Value delmin() return value paired with minimal key Key min() return minimal key Implementation: start with same code as standard heap-based priority queue use a parallel array vals[] (value associated with keys[i] is vals[i]) modify exch() to maintain parallel arrays (do exch in vals[]) modify delmin() to return Value add min() (just returns keys[1]) 34

Lazy implementation of Prim's algorithm public class LazyPrim { Edge[] pred = new Edge[G.V()]; public LazyPrim(WeightedGraph G) { boolean[] marked = new boolean[g.v()]; double[] dist = new double[g.v()]; MinPQplus<Double, Integer> pq; pq = new MinPQplus<Double, Integer>(); dist[s] = 0.0; marked[s] = true; pq.put(dist[s], s); while (!pq.isempty()) { int v = pq.delmin(); if (marked[v]) continue; marked(v) = true; for (Edge e : G.adj(v)) { int w = e.other(v); if (!done[w] && (dist[w] > e.weight())) { dist[w] = e.weight(); pred[w] = e; pq.insert(dist[w], w); } } } } } pred[v] is edge attaching v to MST marks vertices in MST distance to MST key-value PQ get next vertex ignore if already in MST add to PQ any vertices brought closer to S by v 35

Prim's algorithm (lazy) example Priority queue key is distance (edge weight); value is vertex Lazy version leaves obsolete entries in the PQ therefore may have multiple entries with same value 0-2 0-7 0-1 0-6 0-5 0-7 0-1 0-6 0-5 7-1 7-6 0-1 7-4 0-6 0-5 7-6 0-1 7-4 0-6 0-5 0-1 0.32 0-2 0.29 0-5 0.60 0-6 0.51 0-7 0.31 1-7 0.21 3-4 0.34 3-5 0.18 4-5 0.40 4-6 0.51 4-7 0.46 6-7 0.25 0-1 7-4 0-6 0-5 4-3 4-5 0-6 0-5 3-5 4-5 0-6 0-5 red: pq value (vertex) blue: obsolete value 36

Eager implementation of Prim s algorithm Use indexed priority queue that supports contains: is there a key associated with value v in the priority queue? decrease key: decrease the key associated with value v [more complicated data structure, see text] Putative benefit : reduces PQ size guarantee from E to V PQ size is far smaller in practice not important for the huge sparse graphs found in practice widely used, but practical utility is debatable 37

Removing the distinct edge costs assumption Simplifying assumption. All edge weights w e are distinct. Fact. Prim and Kruskal don't actually rely on the assumption (our proof of correctness does) Suffices to introduce tie-breaking rule for compare(). Approach 1: public int compare(edge e, Edge f) { if (e.weight < f.weight) return -1; if (e.weight > f.weight) return +1; if (e.v < f.v) return -1; if (e.v > f.v) return +1; if (e.w < f.w) return -1; if (e.w > f.w) return +1; return 0; } Approach 2: add tiny random perturbation. 38

weighted graph API cycles and cuts Kruskal s algorithm Prim s algorithm advanced topics 39

Advanced MST theorems: does an algorithm with a linear-time guarantee exist? Year 1975 1976 1984 1986 1997 2000 Worst Case E log log V E log log V E log* V, E + V log V E log (log* V) E (V) log (V) E (V) Discovered By Yao Cheriton-Tarjan Fredman-Tarjan Gabow-Galil-Spencer-Tarjan Chazelle Chazelle 2002 20xx optimal Pettie-Ramachandran E??? deterministic comparison based MST algorithms Year Problem Time Discovered By 1976 Planar MST E Cheriton-Tarjan 1992 MST Verification E 1995 Randomized MST E Dixon-Rauch-Tarjan Karger-Klein-Tarjan related problems 40

Euclidean MST Euclidean MST. Given N points in the plane, find MST connecting them. Distances between point pairs are Euclidean distances. Brute force. Compute N 2 / 2 distances and run Prim's algorithm. Ingenuity. Exploit geometry and do it in O(N log N) [stay tuned for geometric algorithms] 41

Scientific application: clustering k-clustering. Divide a set of objects classify into k coherent groups. distance function. numeric value specifying "closeness" of two objects. Fundamental problem. Divide into clusters so that points in different clusters are far apart. Applications. Routing in mobile ad hoc networks. Identify patterns in gene expression. Document categorization for web search. Outbreak of cholera deaths in London in 1850s. Reference: Nina Mishra, HP Labs Similarity searching in medical image databases Skycat: cluster 10 9 sky objects into stars, quasars, galaxies. 42

k-clustering of maximum spacing k-clustering. Divide a set of objects classify into k coherent groups. distance function. Numeric value specifying "closeness" of two objects. Spacing. Min distance between any pair of points in different clusters. k-clustering of maximum spacing. Given an integer k, find a k-clustering such that spacing is maximized. spacing k = 4 43

Single-link clustering algorithm Well-known algorithm for single-link clustering: Form V clusters of one object each. Find the closest pair of objects such that each object is in a different cluster, and add an edge between them. Repeat until there are exactly k clusters. Observation. This procedure is precisely Kruskal's algorithm (stop when there are k connected components). Property. Kruskal s algorithm finds a k-clustering of maximum spacing. 44

Clustering application: dendrograms Dendrogram. Scientific visualization of hypothetical sequence of evolutionary events. Leaves = genes. Internal nodes = hypothetical ancestors. Reference: http://www.biostat.wisc.edu/bmi576/fall-2003/lecture13.pdf 45

Dendrogram of cancers in human Tumors in similar tissues cluster together. Gene 1 Gene n Reference: Botstein & Brown group gene expressed gene not expressed 46