Lecture 2: Karger's Min Cut Algorithm




Princeton Univ. F'13 COS 521: Advanced Algorithm Design
Lecture 2: Karger's Min Cut Algorithm
Lecturer: Sanjeev Arora. Scribe: Sanjeev

Today's topic is simple but gorgeous: Karger's min cut algorithm and its extension. It is a simple randomized algorithm for finding the minimum cut in a graph: a subset of vertices $S$ for which the set of edges leaving $S$, denoted $E(S, \bar{S})$, has minimum size among all subsets. You may have seen an algorithm for this problem in your undergrad class that uses maximum flow. Karger's algorithm is elementary and a great introduction to randomized algorithms.

The algorithm is this: pick a random edge, and merge its endpoints into a single supernode. Repeat until the graph has only two supernodes, which is output as our guess for the min cut. (As you continue, the supernodes may develop parallel edges; these are allowed. Self-loops are ignored.)

Note that if you pick a random edge, it is more likely to come from parts of the graph that contain more edges in the first place. Thus this algorithm looks like a great heuristic to try on all kinds of real-life graphs, where one wants to cluster the nodes into tightly knit portions. For example, social networks may cluster into communities; graphs capturing similarity of pixels may cluster to give different portions of the image (sky, grass, road, etc.). Thus instead of continuing Karger's algorithm until you have two supernodes left, you could stop it when there are $k$ supernodes and try to understand whether these correspond to a reasonable clustering.

Today we will first see that the above version of the algorithm yields the optimum min cut with probability at least $2/n^2$. Thus we can repeat it, say, $20n^2$ times, and output the smallest cut seen in any iteration. The probability that the optimum cut is not seen in any repetition is at most $(1 - 2/n^2)^{20n^2} < 0.01$. Unfortunately, this simple version has running time about $n^4$, which is not great. So then we see a better version with a simple tweak that brings the running time down to closer to $n^2$. The idea is roughly that repetition ensures fault tolerance.
The real-life advice of making two backups of your hard drive is related to this: the probability that both fail is much smaller than the probability that one does. In the case of Karger's algorithm, the overall probability of success is too low. But if you run it only part of the way, until the graph has $n/\sqrt{2}$ supernodes, the chance that the min cut hasn't changed is at least $1/2$. So you make two independent runs that go down to $n/\sqrt{2}$ supernodes, and recursively solve both of these. Thus the expected number of instances that will yield the correct min cut is $2 \times \frac{1}{2} = 1$. (Unwrapping the recursion, you see that each instance of size $n/\sqrt{2}$ will generate two instances of size $n/2$, and so on.) Simple induction shows that this 2-wise repetition is enough to bring the probability of success above $1/\log n$.

As you might suspect, this is not the end of the story, but improvements beyond this get more hairy. If anybody is interested I can give more pointers. Also, this algorithm forms the basis of other algorithms for other tasks. Again, talk to me for pointers.
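The basic contraction algorithm and its repetition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecturer's implementation: the function names `karger_once` and `karger_min_cut`, the union-find representation of supernodes, and the example graph (two 4-cliques joined by two edges) are all choices made here, and the input graph is assumed to be connected.

```python
import random

def karger_once(n, edges, rng):
    """One run of the contraction algorithm on a connected multigraph.

    n: number of nodes, labelled 0..n-1; edges: list of (u, v) pairs.
    Returns (cut_size, side), where side is the set of nodes merged with node 0.
    """
    parent = list(range(n))              # union-find: one tree per supernode

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    live = list(edges)
    supernodes = n
    while supernodes > 2:
        # Ignore self-loops; parallel edges stay, so dense parts are picked more often.
        live = [(u, v) for u, v in live if find(u) != find(v)]
        u, v = rng.choice(live)
        parent[find(u)] = find(v)        # merge the two endpoints
        supernodes -= 1
    cut = [(u, v) for u, v in live if find(u) != find(v)]
    side = {x for x in range(n) if find(x) == find(0)}
    return len(cut), side

def karger_min_cut(n, edges, runs, seed=0):
    """Repeat the basic algorithm and keep the smallest cut seen."""
    rng = random.Random(seed)
    return min((karger_once(n, edges, rng) for _ in range(runs)),
               key=lambda result: result[0])

# Hypothetical example: cliques {0, 1, 4, 5} and {2, 3, 6, 7} joined by two edges.
LEFT, RIGHT = [0, 1, 4, 5], [2, 3, 6, 7]
EDGES = [(a, b) for group in (LEFT, RIGHT)
         for i, a in enumerate(group) for b in group[i + 1:]]
EDGES += [(1, 2), (5, 6)]

size, side = karger_min_cut(8, EDGES, runs=500)
print(size, sorted(side))
```

Rebuilding the edge list at every step keeps the sketch short but makes one run slower than necessary; it is meant to show the logic, not to match the running times quoted in the text.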

CSE 103: Probability and statistics, Winter 2010
Topic 4: Randomized algorithms, II

4.1 Karger's minimum cut algorithm

4.1.1 Clustering via graph cuts

Suppose a mail order company has the resources to prepare two different versions of its catalog, and it wishes to target each version towards a particular sector of its customer base. The data it has is a list of its regular customers, along with their purchase histories. How should this set of customers be partitioned into two coherent groups?

One way to do this is to create a graph with a node for each of the regular customers, and an edge between any two customers whose purchase patterns are similar. The goal is then to divide the nodes into two pieces which have very few edges between them.

More formally, the minimum cut of an undirected graph $G = (V, E)$ is a partition of the nodes into two groups $V_1$ and $V_2$ (that is, $V = V_1 \cup V_2$ and $V_1 \cap V_2 = \emptyset$), so that the number of edges between $V_1$ and $V_2$ is minimized. In the graph below, for instance, the minimum cut has size two and partitions the nodes into $V_1 = \{a, b, e, f\}$ and $V_2 = \{c, d, g, h\}$.

[Figure: an undirected graph on the nodes a, b, c, d, e, f, g, h.]

4.1.2 Karger's algorithm

Here's a randomized algorithm for finding the minimum cut:

• Repeat until just two nodes remain: pick an edge of $G$ at random and collapse its two endpoints into a single node.
• For the two remaining nodes $u_1$ and $u_2$, set $V_1 = \{\text{nodes that went into } u_1\}$ and $V_2 = \{\text{nodes in } u_2\}$.

An example is shown in Figure 4.1. Notice how some nodes end up having multiple edges between them.

4.1.3 Analysis

Karger's algorithm returns the minimum cut with a certain probability. To analyze it, let's go through a succession of key facts.

Fact 1. If degree(u) denotes the number of edges touching node $u$, then

$$\sum_{u \in V} \text{degree}(u) = 2|E|.$$

[Figure 4.1. Karger's algorithm at work on the example graph. The successive contractions shown are:
  14 edges to choose from: pick b-f (probability 1/14)
  13 edges to choose from: pick g-h (probability 1/13)
  12 edges to choose from: pick d-gh (probability 1/6)
  10 edges to choose from: pick a-e (probability 1/10)
  9 edges to choose from: pick ae-bf (probability 4/9)
  5 edges to choose from: pick c-dgh (probability 3/5)
  Done: just two nodes remain, abef and cdgh.]

To see this, imagine the following experiment: for each node, list all the edges touching it. The number of edges in this list is exactly the left-hand sum. But each edge appears exactly twice in it, once for each endpoint.

Fact 2. If there are $n$ nodes, then the average degree of a node is $2|E|/n$.

This is a straightforward calculation: when you pick a node $X$ at random,

$$E[\text{degree}(X)] = \sum_{u \in V} \Pr(X = u)\,\text{degree}(u) = \frac{1}{n} \sum_{u} \text{degree}(u) = \frac{2|E|}{n},$$

where the last step uses the first Fact.

Fact 3. The size of the minimum cut is at most $2|E|/n$.

Consider the partition of $V$ into two pieces, one containing a single node $u$, and the other containing the remaining $n - 1$ nodes. The size of this cut is degree(u). Since this is a valid cut, the minimum cut cannot be bigger than this. In other words, for all nodes $u$,

(size of minimum cut) $\le$ degree(u).

This means that the size of the minimum cut is also at most the average degree, which we've seen is $2|E|/n$.

Fact 4. If an edge is picked at random, the probability that it lies across the minimum cut is at most $2/n$.

This is because there are $|E|$ edges to choose from, and at most $2|E|/n$ of them are in the minimum cut.

Now we have all the information we need to analyze Karger's algorithm. It returns the right answer as long as it never picks an edge across the minimum cut. If it always picks a non-cut edge, then this edge will connect two nodes on the same side of the cut, and so it is okay to collapse them together.

Each time an edge is collapsed, the number of nodes decreases by 1. When $k$ nodes remain, Fact 4 applied to the current graph says that the next edge avoids the minimum cut with probability at least $1 - 2/k$. Therefore,

$$\Pr(\text{final cut is the minimum cut}) = \Pr(\text{first selected edge is not in mincut}) \times \Pr(\text{second selected edge is not in mincut}) \times \cdots$$
$$\ge \left(1 - \frac{2}{n}\right)\left(1 - \frac{2}{n-1}\right)\left(1 - \frac{2}{n-2}\right) \cdots \left(1 - \frac{2}{4}\right)\left(1 - \frac{2}{3}\right)$$
$$= \frac{n-2}{n} \cdot \frac{n-3}{n-1} \cdot \frac{n-4}{n-2} \cdots \frac{2}{4} \cdot \frac{1}{3} = \frac{2}{n(n-1)}.$$

The last equation comes from noticing that almost every numerator cancels with the denominator two fractions down the line.

Karger's algorithm succeeds with probability $p \ge 2/n^2$. Therefore, it should be run $\Omega(n^2)$ times, after which the smallest cut found should be chosen.
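The telescoping product is easy to check with exact arithmetic. The short sketch below does so and also computes how many independent runs are needed to push the failure probability below a chosen threshold. The helper names `survival_probability` and `runs_needed` and the threshold `delta` are introduced here for illustration only.

```python
import math
from fractions import Fraction

def survival_probability(n):
    """Exact value of the product (1 - 2/n)(1 - 2/(n-1)) ... (1 - 2/3)."""
    p = Fraction(1)
    for k in range(n, 2, -1):        # k = n, n-1, ..., 3
        p *= 1 - Fraction(2, k)
    return p

def runs_needed(n, delta):
    """Smallest r with (1 - p)^r <= delta, where p = 2/(n(n-1)); needs n >= 3."""
    p = 2 / (n * (n - 1))
    return math.ceil(math.log(delta) / math.log(1 - p))

# The product telescopes to 2/(n(n-1)) for every n tried here.
for n in (3, 8, 50):
    assert survival_probability(n) == Fraction(2, n * (n - 1))

print(runs_needed(8, 0.01))
```

The number of runs grows like $n^2 \ln(1/\delta)$, which matches the $\Omega(n^2)$ repetition count stated above.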
Those who are familiar with minimum spanning tree algorithms might be curious to hear that another way to implement Karger's algorithm is the following:

• Assign each edge a random weight.
• Run Kruskal's algorithm to get the minimum spanning tree.
• Break the largest edge in the tree to get the two clusters.

(Do you see why?) Over the decades, the running time of Kruskal's algorithm has been thoroughly optimized via special data structures. Now this same technology can be put to work for cuts!
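One way to see the connection is to write the Kruskal variant out. The sketch below is an illustration under assumptions made here: the name `kruskal_cut`, the union-find details, and the example graph (two 4-cliques joined by two edges) are not from the notes, and the graph is assumed to be connected. Kruskal adds tree edges in increasing weight order, so the last edge it adds is the heaviest tree edge; stopping after $n - 2$ merges is the same as building the whole tree and then breaking that edge.

```python
import random

def kruskal_cut(n, edges, rng):
    """Karger's algorithm via Kruskal: random weights, stop at two components."""
    weights = [rng.random() for _ in edges]          # a random weight per edge
    order = sorted(range(len(edges)), key=weights.__getitem__)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]            # path halving
            x = parent[x]
        return x

    components = n
    for i in order:                                  # lightest edge first
        if components == 2:
            break                                    # skip the heaviest tree edge
        ru, rv = find(edges[i][0]), find(edges[i][1])
        if ru != rv:                                 # edge joins two components
            parent[ru] = rv
            components -= 1
    # Edges whose endpoints ended up in different clusters form the cut.
    return sum(1 for u, v in edges if find(u) != find(v))

# Hypothetical example: cliques {0, 1, 4, 5} and {2, 3, 6, 7} joined by two edges.
LEFT, RIGHT = [0, 1, 4, 5], [2, 3, 6, 7]
EDGES = [(a, b) for group in (LEFT, RIGHT)
         for i, a in enumerate(group) for b in group[i + 1:]]
EDGES += [(1, 2), (5, 6)]

rng = random.Random(0)
print(min(kruskal_cut(8, EDGES, rng) for _ in range(500)))
```

Sorting by random weight visits the edges in a uniformly random order, and skipping edges inside a component plays the role of ignoring self-loops, so each merge is a uniformly random choice among the current non-loop edges.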

$$\cdots = \frac{n-2}{n} \cdot \frac{n-3}{n-1} \cdots \frac{2}{4} \cdot \frac{1}{3} = \frac{2}{n(n-1)} = \frac{1}{\binom{n}{2}}.$$

In order to boost the probability of success, we simply run the algorithm $\ell \binom{n}{2}$ times. The probability that at least one run succeeds is at least

$$1 - \left(1 - \frac{1}{\binom{n}{2}}\right)^{\ell \binom{n}{2}} \ge 1 - e^{-\ell}.$$

Setting $\ell = c \ln n$ we have error probability $\le 1/n^c$.

It's easy to implement Karger's algorithm so that one run takes $O(n^2)$ time. Therefore, we have an $O(n^4 \log n)$ time randomized algorithm with error probability $1/\mathrm{poly}(n)$.

A faster version of this algorithm was devised by Karger and Stein [4]. The key idea comes from looking at the telescoping product. In the initial contractions it's very unlikely that we contract an edge in the minimum cut. Towards the end of the algorithm, our probability of contracting an edge in the minimum cut grows. From the earlier analysis we have the following. For a fixed minimum cut $(S, \bar{S})$, the probability that this cut survives down to $\ell$ vertices is at least $\binom{\ell}{2} / \binom{n}{2}$. Thus, for $\ell \approx n/\sqrt{2}$ we have probability about $1/2$ of succeeding. Hence, in expectation two trials should suffice.

Improved algorithm: From a multigraph $G$, if $G$ has at least 6 vertices, repeat twice:

1. run the original algorithm down to $n/\sqrt{2} + 1$ vertices.
2. recurse on the resulting graph.

Return the minimum of the cuts found in the two recursive calls.

The choice of 6 as opposed to some other constant will only affect the running time by a constant factor.

We can easily compute the running time via the following recurrence (which is straightforward to solve, e.g., the standard Master theorem applies):

$$T(n) = n^2 + 2T(n/\sqrt{2}) = O(n^2 \log n).$$

Since the cut survives down to $n/\sqrt{2} + 1$ vertices with probability at least $1/2$, we have the following recurrence for the probability of success, denoted by $P(n)$:

$$P(n) \ge 1 - \left(1 - \frac{1}{2} P\!\left(\frac{n}{\sqrt{2}} + 1\right)\right)^2.$$

This solves to $P(n) = \Omega(1/\log n)$. Hence, similar to the earlier argument for the original algorithm, with $O(\log^2 n)$ runs of the algorithm, the probability of success is $1 - 1/\mathrm{poly}(n)$.

Therefore, in $O(n^2 \log^3 n)$ total time, we can find the minimum cut with probability $1 - 1/\mathrm{poly}(n)$.

Before finishing, we observe an interesting corollary of Karger's original algorithm which we will use in the next lecture to estimate the (un)reliability of a network.
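The recursive scheme can be sketched as follows. This is an illustration, not the authors' implementation, and several details are assumptions made here: the names `contract`, `brute_force_cut` and `karger_stein`; rounding the target up to $\lceil n/\sqrt{2} + 1 \rceil$; recursing only when there are more than 6 vertices, so that the target is always smaller than $n$; and solving the base case by brute force over all partitions. The graph is assumed to be connected, and only the size of the cut is returned.

```python
import math
import random

def contract(n, edges, target, rng):
    """Contract random edges until `target` supernodes remain.

    Returns (count, edges) with supernodes relabelled 0..count-1, self-loops dropped.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    live, count = list(edges), n
    while count > target:
        live = [(u, v) for u, v in live if find(u) != find(v)]
        u, v = rng.choice(live)
        parent[find(u)] = find(v)
        count -= 1
    labels, relabelled = {}, []
    for u, v in live:
        ru, rv = find(u), find(v)
        if ru != rv:
            relabelled.append((labels.setdefault(ru, len(labels)),
                               labels.setdefault(rv, len(labels))))
    return count, relabelled

def brute_force_cut(n, edges):
    """Exact minimum cut by trying every partition (only for tiny n)."""
    best = len(edges)
    for mask in range(1, 1 << (n - 1)):      # node n-1 always stays on side 0
        size = sum(1 for u, v in edges if ((mask >> u) & 1) != ((mask >> v) & 1))
        best = min(best, size)
    return best

def karger_stein(n, edges, rng):
    """Size of the smallest cut found by one run of the recursive algorithm."""
    if n <= 6:                               # base-case threshold assumed here
        return brute_force_cut(n, edges)
    target = math.ceil(n / math.sqrt(2) + 1)
    results = []
    for _ in range(2):                       # two independent partial runs
        m, sub = contract(n, edges, target, rng)
        results.append(karger_stein(m, sub, rng))
    return min(results)

# Hypothetical example: a cycle on 12 nodes, whose minimum cut has size 2.
CYCLE = [(i, (i + 1) % 12) for i in range(12)]
print(karger_stein(12, CYCLE, random.Random(1)))
```

A single call succeeds only with probability $\Omega(1/\log n)$ in general, so in practice it would be wrapped in the $O(\log^2 n)$ repetitions described above.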

Corollary 5. Any graph has at most $O(n^2)$ minimum cuts.

This follows from Lemma 3 since that holds for any specified minimum cut: each minimum cut is output with probability at least $1/\binom{n}{2}$, these events are disjoint, and so there can be at most $\binom{n}{2}$ of them. Note, we can also enumerate all of these cuts by the above algorithm.

References

[1] A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. J. Assoc. Comput. Mach., 35(4):921-940, 1988.
[2] J. Hao and J. B. Orlin. A faster algorithm for finding the minimum cut in a graph. In Proceedings of the Third Annual ACM-SIAM Symposium on Discrete Algorithms (Orlando, FL, 1992), pages 165-174, New York, 1992. ACM.
[3] D. R. Karger. Global min-cuts in RNC, and other ramifications of a simple min-cut algorithm. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (Austin, TX, 1993), pages 21-30, New York, 1993. ACM.
[4] D. R. Karger and C. Stein. A new approach to the minimum cut problem. J. ACM, 43(4):601-640, 1996.