Lecture 2: Analysis of Algorithms (CS583-002)
Amarda Shehu
September 03 & 10, 2014

1 Outline of Today's Class
2 Big-Oh, Big-Omega, Theta
   Techniques for Finding Asymptotic Relationships
   Some more Fun with Math
3 Recurrences
4 Average-case Analysis
   Average-case Analysis of Insertion Sort

Big-Oh: An Asymptotic Upper Bound
Definition: A function $g(n) \in O(f(n))$ if there exist constants $c > 0$ and $n_0$ such that $g(n) \le c\,f(n)$ for all $n \ge n_0$. Note: $O(f(n))$ denotes a set.
[Figure: graphical illustration of the definition]
little-oh: an upper bound that is not asymptotically tight. $g(n) \in o(f(n))$ when the upper bound holds for all constants $c > 0$, i.e., for every $c > 0$ there is an $n_0$ (which may depend on $c$) such that $g(n) < c\,f(n)$ for all $n \ge n_0$. Alternative definition: $\lim_{n\to\infty} g(n)/f(n) = 0$.
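A finite spot-check can refute a proposed witness $(c, n_0)$ but can never prove an asymptotic bound; the proof is the algebra. As an illustration (our choice of example, not from the slides), take $g(n) = 3n^2 + 10n$ and $f(n) = n^2$: $3n^2 + 10n \le 4n^2$ exactly when $n \ge 10$, so $c = 4$, $n_0 = 10$ witnesses $g(n) \in O(n^2)$. The helper `holds_on_range` is a hypothetical name for a sketch of such a check.

```python
def holds_on_range(g, f, c, n0, n_max):
    """Spot-check g(n) <= c * f(n) for every integer n0 <= n <= n_max.

    A finite check can only refute a witness (c, n0); it cannot prove g in O(f).
    """
    return all(g(n) <= c * f(n) for n in range(n0, n_max + 1))


def g(n):
    return 3 * n**2 + 10 * n  # illustrative function


def f(n):
    return n**2


print(holds_on_range(g, f, c=4, n0=10, n_max=10_000))  # True
print(holds_on_range(g, f, c=4, n0=1, n_max=10_000))   # False: the bound fails for n < 10
```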

Big-Omega: An Asymptotic Lower Bound
Definition: A function $g(n) \in \Omega(f(n))$ if there exist constants $c > 0$ and $n_0$ such that $g(n) \ge c\,f(n)$ for all $n \ge n_0$. Note: $\Omega(f(n))$ denotes a set.
[Figure: graphical illustration of the definition]
little-omega: a lower bound that is not asymptotically tight. $g(n) \in \omega(f(n))$ when the lower bound holds for all constants $c > 0$, i.e., for every $c > 0$ there is an $n_0$ (which may depend on $c$) such that $g(n) > c\,f(n)$ for all $n \ge n_0$. Alternative definition: $\lim_{n\to\infty} g(n)/f(n) = \infty$.

Theta: Asymptotic Upper and Lower Bounds
Definition: A function $g(n) \in \Theta(f(n))$ if $g(n) \in O(f(n))$ and $g(n) \in \Omega(f(n))$. Equivalently, $g(n) \in \Theta(f(n))$ if there exist positive constants $c_1$, $c_2$ and $n_0$ such that $c_1 f(n) \le g(n) \le c_2 f(n)$ for all $n \ge n_0$.
[Figure: graphical illustration of the definition]
Alternative definition (limit test): if $\lim_{n\to\infty} g(n)/f(n) = L$ for some constant $0 < L < \infty$, then $g(n) \in \Theta(f(n))$. The test is sufficient but not necessary: $g(n) \in \Theta(f(n))$ can hold even when the limit does not exist.
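As an illustration (our choice of example), take $g(n) = 3n^2 + 10n$ and $f(n) = n^2$. The lower bound $3n^2 \le g(n)$ holds for all $n \ge 1$, and $g(n) \le 4n^2$ holds for $n \ge 10$, so $c_1 = 3$, $c_2 = 4$, $n_0 = 10$ witness $g(n) \in \Theta(n^2)$; the ratio $g(n)/n^2$ tends to $L = 3$, so the limit test agrees. `sandwiched_on_range` is a hypothetical helper and only a finite spot-check, not a proof.

```python
def sandwiched_on_range(g, f, c1, c2, n0, n_max):
    """Spot-check c1*f(n) <= g(n) <= c2*f(n) for n0 <= n <= n_max (not a proof)."""
    return all(c1 * f(n) <= g(n) <= c2 * f(n) for n in range(n0, n_max + 1))


def g(n):
    return 3 * n**2 + 10 * n  # illustrative function


def f(n):
    return n**2


print(sandwiched_on_range(g, f, c1=3, c2=4, n0=10, n_max=10_000))  # True
print(g(10**6) / f(10**6))  # close to the limit L = 3
```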

Techniques for Bounding Functions
When bounding functions:
1 Go back to the definitions of $O$, $\Omega$, $\Theta$.
2 Know when the notations do not apply: e.g., in cases of periodic functions like $\sin(n)$.
3 Find limits as $n \to \infty$:
   1 Simple transformations: e.g., $\lim_{n\to\infty} \sqrt{n}/n^2$
   2 L'Hôpital's Rule: e.g., $\lg(n) \in O(\sqrt{n})$
   3 Combination of both: e.g., $n! \in \omega(2^n)$
4 Recall techniques to evaluate derivatives: e.g., $\frac{d}{dx}(x^x) = ?$
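Evaluating the ratio $g(n)/f(n)$ at rapidly growing $n$ is a quick way to form a guess before reaching for L'Hôpital's rule; it is a numerical hint, not a proof. The sketch below (the helper name `ratios` is ours) tabulates $\lg n/\sqrt{n}$ and $2^n/n!$, both of which should shrink toward 0, consistent with $\lg(n) \in O(\sqrt{n})$ and $n! \in \omega(2^n)$.

```python
import math


def ratios(g, f, ns):
    """Return [g(n) / f(n) for n in ns]: a numerical hint about lim g/f, not a proof."""
    return [g(n) / f(n) for n in ns]


# lg(n) / sqrt(n), sampled at n = 2**4, 2**16, 2**28, 2**40
lg_vs_sqrt = ratios(math.log2, math.sqrt, [2**k for k in (4, 16, 28, 40)])

# 2**n / n!: Python integers are exact, so only the final division rounds
pow2_vs_fact = ratios(lambda n: 2**n, math.factorial, [5, 10, 20, 40])

print(lg_vs_sqrt)
print(pow2_vs_fact)
```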

Techniques for Bounding Summations
When bounding summations:
1 Obtain the answer for the series (A.1) or derive it:
   1 $\sum_{i=0}^{n} (a_0 + i d) = \frac{(n+1)(a_0 + a_n)}{2}$, where $a_n = a_0 + n d$
   2 $\sum_{i=0}^{n} a^i = \frac{a^{n+1} - 1}{a - 1}$ for $a > 1$. If $a_{i+1}/a_i \le r$ for all $i \ge 0$ and some constant $0 < r < 1$, bound the sum by a geometric series: $\sum_{i=0}^{n} a_i \le \sum_{i=0}^{n} a_0 r^i \le \frac{a_0}{1 - r}$
   3 $\sum_{i=0}^{n} i\,2^i = (n-1)\,2^{n+1} + 2$
2 Guess the answer and prove it by induction.
3 Bound each of the terms: e.g., $\sum_{i=1}^{n} \frac{1}{i} \in O(\ln n)$
4 Split the summation: e.g., $\sum_{k=0}^{\infty} \frac{k^2}{2^k} \in O(1)$ (A.2)
5 Bound by an integral (recall techniques to evaluate integrals):
   1 $\sum_{i=1}^{n} \frac{1}{i} \le 1 + \int_{1}^{n} \frac{dx}{x} = 1 + \ln n$
   2 $\sum_{i=1}^{n} \lg(i) \in \Theta(n \ln n)$
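The closed forms and bounds above are easy to spot-check with exact integer and rational arithmetic before relying on them in a proof. The sketch below (function name and sample parameters $a_0 = 3$, $d = 5$, $a = 2$ are arbitrary choices) checks the arithmetic and geometric sums, $\sum i\,2^i$, the integral bound on the harmonic sum, the sandwich $\frac{n}{2}\lg\frac{n}{2} \le \sum_{i=1}^{n} \lg i \le n \lg n$, and that the partial sums of $\sum k^2/2^k$ stay below their limit 6.

```python
import math
from fractions import Fraction


def check_summations(n, a0=3, d=5, a=2):
    """Spot-check the summation identities and bounds for one n >= 1 (exact where possible)."""
    an = a0 + n * d
    return {
        "arithmetic": sum(a0 + i * d for i in range(n + 1)) == Fraction((n + 1) * (a0 + an), 2),
        "geometric": sum(a**i for i in range(n + 1)) == Fraction(a ** (n + 1) - 1, a - 1),
        "i*2^i": sum(i * 2**i for i in range(n + 1)) == (n - 1) * 2 ** (n + 1) + 2,
        # Integral bound: H_n <= 1 + ln n
        "harmonic": sum(Fraction(1, i) for i in range(1, n + 1)) <= 1 + math.log(n),
        # sum of lg i lies between (n/2) lg(n/2) and n lg n
        "lg sum": (n / 2) * math.log2(n / 2)
        <= sum(math.log2(i) for i in range(1, n + 1))
        <= n * math.log2(n),
        # partial sums of k^2 / 2^k increase toward 6, so they stay below it
        "k^2/2^k": sum(Fraction(k * k, 2**k) for k in range(n + 1)) < 6,
    }


print(all(all(check_summations(n).values()) for n in range(1, 40)))  # True
```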

Flex your Muscles: Moderate to Difficult Exercises
1 $\sum_{i=0}^{n} a^i \in O((n + 1)\,a^n)$ (for $a \ge 1$)
2 $\lg^k n \in o(n^{\epsilon})$ for all $\epsilon > 0$, $k \ge 1$
3 $n^k \in o(c^n)$ for $c > 1$, $k \ge 1$
4 Is $\lceil \lg n \rceil!$ polynomially bounded?
5 Is $\lceil \lg \lg n \rceil!$ polynomially bounded?
[Figure: The Koch snowflake illustrates the geometric series $a r^i$ with $a = 1/3$ and $r = 4/9$. Summation gives the area of this snowflake as 8/5 of the blue triangle. © Wikipedia.]
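The Koch snowflake caption can be checked with exact fractions. A minimal sketch, assuming the blue triangle has unit area and that the series $\sum_{i \ge 0} a r^i$ with $a = 1/3$, $r = 4/9$ counts the area added around it (our reading of the caption): the series sums to $a/(1 - r) = 3/5$, for a total of $1 + 3/5 = 8/5$.

```python
from fractions import Fraction

a, r = Fraction(1, 3), Fraction(4, 9)

# Closed form of the infinite geometric series a + a*r + a*r**2 + ... (valid since |r| < 1)
added_area = a / (1 - r)
total_area = 1 + added_area  # blue triangle assumed to have unit area

# The first 50 terms already agree with the closed form to within 1e-15
partial = 1 + sum(a * r**i for i in range(50))

print(added_area, total_area)               # 3/5 8/5
print(float(total_area - partial) < 1e-15)  # True
```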

What is a Recurrence?
A recurrence is an equation or inequality that describes a function in terms of its value on smaller inputs.
Example: $T(n)$ of Mergesort is described in terms of $T(n/2)$.
Recurrences have boundary conditions (they bottom out). Example: $T(n) = c$ when $n = 1$.
Methods for solving recurrences:
1 Iteration or expansion method
2 Recursion-tree method
3 Substitution method
4 Master method
5 Generating functions
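To see where the Mergesort recurrence comes from, the sketch below counts the element copies made by the merge steps, as a stand-in for the $cn$ term with $c = 1$; it also charges 0 rather than $c$ for the base case (both are simplifying assumptions). Each call does the work of its two half-size calls plus $n$ copies, i.e. $T(n) = 2T(n/2) + n$ with $T(1) = 0$, so for $n$ a power of 2 the count is $n \lg n$.

```python
def merge_sort(xs):
    """Return (sorted copy of xs, number of element copies made by all merge steps)."""
    if len(xs) <= 1:
        return list(xs), 0  # boundary condition: no merge work for n <= 1
    mid = len(xs) // 2
    left, left_copies = merge_sort(xs[:mid])    # T(n/2)
    right, right_copies = merge_sort(xs[mid:])  # T(n/2)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    # Each of the n elements is copied once into `merged`: the "+ cn" term with c = 1
    return merged, left_copies + right_copies + len(merged)


print(merge_sort([5, 2, 7, 1, 8, 3, 6, 4]))  # ([1, 2, 3, 4, 5, 6, 7, 8], 24)
```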

Iteration (Expansion) Method
Expand $T(n) = 2T(n/2) + cn$ and iterate down to the boundary condition:
$T(n) = 2T(n/2) + cn$
$\quad = 2[2T(n/4) + c\,n/2] + cn = 4T(n/4) + 2cn$
$\quad = 4[2T(n/8) + c\,n/4] + 2cn = 8T(n/8) + 3cn$
$\quad = 2^3\,T(n/2^3) + 3cn$
$\quad = \dots$ (do you see the pattern?)
$\quad = 2^k\,T(n/2^k) + k\,cn$
Since the recursion bottoms out at $n/2^k = 1$, i.e., $k = \lg n$, and $T(1) = c$:
$T(n) = n\,T(1) + cn \lg n = cn + cn \lg n \in \Theta(n \lg n)$
Try to solve $T(n) = T(n-1) + n$, where $T(1) = 1$.
Try to solve $T(n) = 2T(n/2) + n$, where $T(1) = 1$.

Build the recursion tree for $T(n) = 2T(n/2) + cn$:
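The recursion tree for $T(n) = 2T(n/2) + cn$ has $2^i$ nodes of size $n/2^i$ at depth $i$, so every level costs $cn$, and there are $\lg n + 1$ levels counting the leaves (each leaf costs $T(1) = c$). A short sketch, assuming $n$ is a power of 2 (`level_costs` is a hypothetical helper), tabulates the level costs; their total matches the $cn + cn\lg n$ obtained by expansion.

```python
def level_costs(n, c=1):
    """Per-level costs of the recursion tree for T(n) = 2T(n/2) + c*n with T(1) = c.

    Assumes n is a power of 2.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    costs, nodes, size = [], 1, n
    while True:
        costs.append(nodes * c * size)  # `nodes` subproblems, each costing c * size
        if size == 1:                   # leaves: n nodes, each costing T(1) = c
            return costs
        nodes, size = 2 * nodes, size // 2


print(level_costs(8))  # [8, 8, 8, 8]: lg 8 + 1 = 4 levels, each costing c*n
```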

Example: solve $T(n) = T(n/4) + T(n/2) + n^2$ using a recursion tree:
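Summing this recursion tree level by level shows why the answer is $\Theta(n^2)$: a node of cost $m^2$ has children costing $(m/4)^2 + (m/2)^2 = \frac{5}{16} m^2$, so level $i$ costs $(5/16)^i n^2$ and the whole tree costs at most $n^2 \sum_{i \ge 0} (5/16)^i = \frac{16}{11} n^2$. Since the root alone costs $n^2$, $T(n) \in \Theta(n^2)$. The sketch below checks the level sums with exact fractions; it idealizes the tree by not cutting it off at subproblems of size 1 (an assumption).

```python
from fractions import Fraction


def level_sums(n, depth):
    """Cost of each level 0..depth in the recursion tree of T(n) = T(n/4) + T(n/2) + n^2.

    Subproblem sizes are exact fractions and the tree is not cut off at size 1,
    an idealization that keeps every level complete.
    """
    level = [Fraction(n)]
    sums = []
    for _ in range(depth + 1):
        sums.append(sum(size * size for size in level))  # each node costs size^2
        level = [child for size in level for child in (size / 4, size / 2)]
    return sums


sums = level_sums(16, 3)
print(sums)
print([b / a for a, b in zip(sums, sums[1:])])  # each ratio is 5/16
```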

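Substitution (Induction) Method
Guess that $T(n) = 2T(n/2) + n \in O(n + n \lg n)$, where $T(1) = 1$. Then use induction to prove that the guess is correct, i.e., that $T(n) \le c\,(n + n \lg n)$ for some constant $c > 0$.
1 Base Case: The boundary condition states that $T(1) = 1$. The guess requires $T(1) \le c\,(1 + 1 \cdot \lg 1) = c$, which holds for any $c \ge 1$.
2 Inductive Step: Assuming that $T(n/2) \le c\,(n/2 + (n/2)\lg(n/2))$, we have to show that the guess holds for $T(n)$:
$T(n) = 2T(n/2) + n \le 2\left[c\left(\frac{n}{2} + \frac{n}{2}\lg\frac{n}{2}\right)\right] + n = cn + cn \lg n - cn + n = cn \lg n + n$
Finally, $cn \lg n + n \le c\,(n + n \lg n)$ exactly when $n \le cn$, i.e., for any $c \ge 1$, so the guess holds with the same constant $c$ as in the base case. (Observing only that $cn \lg n + n \in O(n + n \lg n)$ is not enough: the induction must reproduce the guessed bound with the same constant $c$.)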
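Induction proves the bound for all $n$; a quick numerical run can still catch a wrong guess early. The sketch below assumes $n/2$ means $\lfloor n/2 \rfloor$ for odd $n$ (the inductive step goes through unchanged with the floor, since $2\lfloor n/2 \rfloor \le n$) and checks $T(n) \le n + n \lg n$, the guess with $c = 1$, for $n$ up to 5000. For powers of 2 the bound is met with equality.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """T(n) = 2T(floor(n/2)) + n with T(1) = 1; the floor is an assumption for odd n."""
    return 1 if n == 1 else 2 * T(n // 2) + n


def guess(n, c=1):
    """The guessed bound c * (n + n lg n)."""
    return c * (n + n * math.log2(n))


# Small tolerance guards against floating-point rounding in log2
print(all(T(n) <= guess(n) + 1e-9 for n in range(1, 5001)))  # True
print(T(1024), guess(1024))  # 11264 11264.0: equality at powers of 2
```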

Master Method
Theorem: Let $a \ge 1$ and $b > 1$ be constants, let $f(n)$ be a function, and let $T(n)$ be defined on the nonnegative integers by the recurrence $T(n) = a\,T(n/b) + f(n)$, where $n/b$ can mean $\lfloor n/b \rfloor$ or $\lceil n/b \rceil$.
1 If $f(n) \in O(n^{\log_b a - \epsilon})$ for some constant $\epsilon > 0$, then $T(n) \in \Theta(n^{\log_b a})$.
2 If $f(n) \in \Theta(n^{\log_b a})$, then $T(n) \in \Theta(n^{\log_b a} \lg n)$.
3 If $f(n) \in \Omega(n^{\log_b a + \epsilon})$ for some constant $\epsilon > 0$, and if $a\,f(n/b) \le c\,f(n)$ for some constant $c < 1$ and all sufficiently large $n$, then $T(n) \in \Theta(f(n))$.
Examples: $T(n) = 9T(n/3) + n$, $T(n) = T(2n/3) + 1$, $T(n) = 3T(n/4) + n \lg n$, $T(n) = 2T(n/2) + n \lg n$, $T(n) = n\,T^2(n/2)$.
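For the common special case $f(n) = \Theta(n^k \lg^p n)$ with $k \ge 0$, $p \ge 0$, deciding which case applies reduces to comparing $k$ with the critical exponent $\log_b a$. The sketch below (`master_case` is a hypothetical helper) does exactly that. A polynomially larger $f$ of this form automatically satisfies the regularity condition, with $c = a/b^k < 1$, and the situation $k = \log_b a$, $p > 0$ falls in the gap between cases 2 and 3, where the basic theorem says nothing. Exponents are compared with a small tolerance because $\log_b a$ is computed in floating point.

```python
import math


def master_case(a, b, k, p=0):
    """Classify T(n) = a*T(n/b) + f(n) with f(n) = Theta(n**k * (lg n)**p).

    Assumes a >= 1, b > 1, k >= 0, p >= 0. Returns (case, e) where e = log_b(a)
    and case is 1, 2, or 3, or None when the basic master theorem does not apply.
    """
    if a < 1 or b <= 1 or k < 0 or p < 0:
        raise ValueError("need a >= 1, b > 1, k >= 0, p >= 0")
    e = math.log(a) / math.log(b)  # critical exponent log_b(a)
    if math.isclose(k, e, abs_tol=1e-9):
        # Case 2 only when f is exactly Theta(n^e); extra log factors fall in the gap
        return (2 if p == 0 else None), e
    if k < e:
        return 1, e  # f is polynomially smaller: T(n) = Theta(n^e)
    return 3, e      # f is polynomially larger and regular: T(n) = Theta(f(n))


examples = {
    "9T(n/3) + n": master_case(9, 3, 1),          # case 1: Theta(n^2)
    "T(2n/3) + 1": master_case(1, 1.5, 0),        # case 2: Theta(lg n)
    "3T(n/4) + n lg n": master_case(3, 4, 1, 1),  # case 3: Theta(n lg n)
    "2T(n/2) + n lg n": master_case(2, 2, 1, 1),  # gap: theorem does not apply
}
for name, (case, e) in examples.items():
    print(f"{name}: case={case}, log_b a={e:.3f}")
```

The last example on the slide, $T(n) = n\,T^2(n/2)$, is not of the form $a\,T(n/b) + f(n)$ at all, so it is outside the scope of this sketch.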

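Idea Behind the Master Method: Case 1.
[Figure: The weight increases geometrically from the root to the leaves. The leaves hold a constant fraction of the total weight. $T(n) \in \Theta(n^{\log_b a})$.]
The tree has $a^{\log_b n} = n^{\log_b a}$ leaves, so when they hold a constant fraction of the total weight, the total is $\Theta(n^{\log_b a})$.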
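For a concrete case-1 recurrence, $T(n) = 9T(n/3) + n$ (the first example on the master-method slide), level $i$ of the tree has $9^i$ nodes of cost $n/3^i$, so the level weights $3^i n$ triple at each step and the $n^{\log_3 9} = n^2$ leaves dominate. A sketch assuming $n = 3^L$ and $T(1) = 1$ computes the exact share of the total weight held by the leaves; it stays above $2/3$, a constant fraction.

```python
from fractions import Fraction


def leaf_fraction(L):
    """Leaf share of total weight for T(n) = 9T(n/3) + n, with n = 3**L and T(1) = 1."""
    n = 3**L
    internal = sum(9**i * (n // 3**i) for i in range(L))  # level i: 9^i nodes, each costing n/3^i
    leaves = 9**L                                         # 9^L = n^2 leaves, each costing T(1) = 1
    return Fraction(leaves, internal + leaves)


for L in (1, 2, 5, 20):
    print(L, leaf_fraction(L), float(leaf_fraction(L)))
```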

One can use generating functions and characteristic equations to solve recurrences. This topic is beyond the scope of this class. However, a simple example on the Fibonacci recurrence showcases the power of this technique for solving recurrences. Let us associate with the sequence $\{a_n\}$ the generating function $a(x) = \sum_{n=0}^{\infty} a_n x^n$. In this way, the recurrence relation for $\{a_n\}$ can be interpreted as an equation for $a(x)$.
Example: Find the generating function for the Fibonacci sequence and derive a closed-form expression for the $n$th Fibonacci number.

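Generating Function for the Fibonacci Sequence
Solution: Let $F(x) = \sum_{n=0}^{\infty} f_n x^n$ be the generating function for the Fibonacci sequence, with $f_0 = 0$ and $f_1 = 1$. Since the sequence satisfies the recurrence $f_n = f_{n-1} + f_{n-2}$ for $n \ge 2$, we obtain an explicit form for $F(x)$ by multiplying the recurrence by $x^n$ and summing over $n \ge 2$:
$f_n = f_{n-1} + f_{n-2}$
$f_n x^n = f_{n-1} x^n + f_{n-2} x^n$
$\sum_{n=2}^{\infty} f_n x^n = \sum_{n=2}^{\infty} f_{n-1} x^n + \sum_{n=2}^{\infty} f_{n-2} x^n$
$\sum_{n=2}^{\infty} f_n x^n = x \sum_{n=1}^{\infty} f_n x^n + x^2 \sum_{n=0}^{\infty} f_n x^n$
So: $F(x) - f_0 - x f_1 = x\,(F(x) - f_0) + x^2 F(x)$
$F(x)\,(1 - x - x^2) = f_0 + x\,(f_1 - f_0) = x$
$F(x) = \frac{x}{1 - x - x^2}$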
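One way to sanity-check $F(x) = x/(1 - x - x^2)$ is to expand it as a power series. If $D(x)F(x) = N(x)$ with $D(0) = 1$, the coefficients follow from $F_n = N_n - \sum_{k \ge 1} D_k F_{n-k}$; for $D(x) = 1 - x - x^2$ this is the Fibonacci recurrence plus the contribution of the numerator. `series_div` is a hypothetical name for this standard power-series division.

```python
def series_div(num, den, terms):
    """First `terms` coefficients of num(x) / den(x) as a power series, assuming den[0] == 1.

    num and den are coefficient lists, lowest degree first.
    """
    if den[0] != 1:
        raise ValueError("den[0] must be 1")
    out = []
    for n in range(terms):
        c = num[n] if n < len(num) else 0
        # Subtract den[k] * out[n - k] for the known lower-order coefficients
        c -= sum(den[k] * out[n - k] for k in range(1, min(n, len(den) - 1) + 1))
        out.append(c)
    return out


print(series_div([0, 1], [1, -1, -1], 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```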

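Closed Form Expression for the Fibonacci Sequence
To get a closed-form expression for $f_n$, get a closed-form expression for the coefficient of $x^n$ in the expansion of the generating function. This requires decomposing it into partial fractions. Completing the square factors the denominator:
$1 - x - x^2 = \left(1 - \frac{x}{2}\right)^2 - \left(\frac{\sqrt{5}\,x}{2}\right)^2 = \left(1 - \frac{x}{2} - \frac{\sqrt{5}\,x}{2}\right)\left(1 - \frac{x}{2} + \frac{\sqrt{5}\,x}{2}\right)$
Let's write this more compactly as $1 - x - x^2 = (1 - bx)(1 - cx)$, where $b = (1 + \sqrt{5})/2$ and $c = (1 - \sqrt{5})/2$, so that $b - c = \sqrt{5}$. Then
$F(x) = \frac{x}{(1 - bx)(1 - cx)} = \frac{1}{\sqrt{5}}\left(\frac{1}{1 - bx} - \frac{1}{1 - cx}\right) = \frac{1}{\sqrt{5}} \sum_{n \ge 0} (b^n - c^n)\,x^n$
So, for all $n \ge 0$:
$f_n = \frac{1}{\sqrt{5}}\left[\left(\frac{1 + \sqrt{5}}{2}\right)^{n} - \left(\frac{1 - \sqrt{5}}{2}\right)^{n}\right]$
(Expanding $1/[(1 - bx)(1 - cx)]$ instead, without the factor $x$ in the numerator, gives the coefficients $(b^{n+1} - c^{n+1})/\sqrt{5} = f_{n+1}$.)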
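The closed form can be checked against the recurrence. Since $|c| < 1$, the term $c^n/\sqrt{5}$ has magnitude below $1/2$, so rounding the floating-point value of the formula recovers the exact integer for moderate $n$; double precision eventually loses accuracy for large $n$, so the sketch below only compares $0 \le n \le 40$. The function names are our own.

```python
import math


def fib_iter(n):
    """f_n from the recurrence f_n = f_{n-1} + f_{n-2}, with f_0 = 0 and f_1 = 1."""
    prev, cur = 0, 1
    for _ in range(n):
        prev, cur = cur, prev + cur
    return prev


def fib_closed_form(n):
    """f_n = (b**n - c**n) / sqrt(5), evaluated in floating point and rounded."""
    sqrt5 = math.sqrt(5)
    b, c = (1 + sqrt5) / 2, (1 - sqrt5) / 2
    return round((b**n - c**n) / sqrt5)


print([fib_closed_form(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(all(fib_closed_form(n) == fib_iter(n) for n in range(41)))  # True
```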

What does Probabilistic Analysis Mean?
Probabilistic analysis refers to the use of probability theory in the analysis of algorithms. It is often used to analyze the running time of an algorithm.
To perform a probabilistic analysis, we have to make assumptions about the distribution of inputs.
Under such an assumption, we compute an expected running time, where the expectation is taken over the distribution of all possible inputs.
In problems where it is not possible to describe an input distribution, probabilistic analysis is not applicable.

Some Basic Probability
Given a sample space $S$ and an event $A \subseteq S$, the indicator random variable $I(A)$ associated with the event $A$ is defined as:
$I(A) = \begin{cases} 1 & \text{if } A \text{ occurs} \\ 0 & \text{if } A \text{ does not occur} \end{cases}$
Question: Can you determine the expected number of heads obtained when flipping a fair coin?
Sample space $S = \{H, T\}$, where $H$ and $T$ refer to Head and Tail. The event is $A = H$. The indicator random variable is:
$X_H = I(H) = \begin{cases} 1 & \text{if } H \text{ occurs} \\ 0 & \text{if } T \text{ occurs} \end{cases}$
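For a finite sample space with equally likely outcomes (an assumption that fits the fair coin), $E[I(A)] = \Pr(A)$. The sketch below computes this with exact fractions; `indicator` and `expectation` are illustrative names, not a library API.

```python
from fractions import Fraction


def indicator(event):
    """Return I(A) as a function: 1 if the outcome lies in event A (a subset of S), else 0."""
    return lambda outcome: 1 if outcome in event else 0


def expectation(rv, sample_space):
    """E[X] over a finite sample space whose outcomes are assumed equally likely."""
    return sum(Fraction(rv(s)) for s in sample_space) / len(sample_space)


S = ["H", "T"]
X_H = indicator({"H"})
print(expectation(X_H, S))  # 1/2
```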

Simple Example
The expected number of $H$s from one flip of the coin is the expected value of the indicator random variable $X_H$:
$E[X_H] = E[I(H)] = 1 \cdot P(H) + 0 \cdot P(T) = 1 \cdot (1/2) + 0 \cdot (1/2) = 1/2$
So the expected number of $H$s from one flip of a fair coin is 1/2.
Indicator random variables are useful for analyzing situations in which one performs repeated random trials (Bernoulli trials).
Q: What is the expected number of $H$s in $n$ flips of a fair coin?
Let $X_i = I\{\text{the } i\text{th flip results in } H\}$ and let $X = \sum_{i=1}^{n} X_i$ be the total number of $H$s in $n$ flips. Then, by linearity of expectation:
$E[X] = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} 1/2 = n/2$
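Linearity of expectation can be confirmed by brute force for small $n$: enumerate all $2^n$ equally likely flip sequences and average the number of heads. The sketch below (our own helper, exponential in $n$, so only for small $n$) returns exactly $n/2$.

```python
from fractions import Fraction
from itertools import product


def expected_heads(n):
    """Average number of H over all 2**n equally likely sequences of n fair-coin flips."""
    outcomes = list(product("HT", repeat=n))
    total_heads = sum(seq.count("H") for seq in outcomes)  # X = sum of the indicators X_i
    return Fraction(total_heads, len(outcomes))


print([expected_heads(n) for n in range(1, 6)])  # equals n/2 for each n
```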

Probabilistic Analysis for Average-Case Performance
Recall that the worst-case running time $T(n)$ of an algorithm is the maximum time of the algorithm on any input of size $n$. This is also known as the usual case.
The best-case $T(n)$ is when a slow algorithm cheats by working fast on some particular input. This is also known as the bogus case.
The average-case $T(n)$ is the expected time of the algorithm over all inputs of size $n$. This is known as the sometimes case.
Average-case analysis requires an assumption about the statistical distribution of inputs.
We employ probabilistic analysis to show that the average-case $T(n)$ of insertion sort is $\Theta(n^2)$.

Average-case Analysis of Insertion Sort
Recall that $T(n)$ of insertion sort is $\sum_{j=2}^{n} f(j)$, where $f(j)$ sums the execution time of the inner while loop when inserting $A[j]$. Ignoring the machine-dependent constants, $f(j)$ measures the number of times the elements in $A[1..j-1]$ have to be moved right to open up the correct position in which to insert the value $A[j]$. For the average case, $f(j)$ is the expected number of moves needed so that $A[j]$ ends up in its correct position.
Question: What is $f(j)$?
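The quantity $f(j)$ can be measured directly by instrumenting insertion sort. The sketch below is a straightforward Python translation (0-based internally, reported with the slides' 1-based $j$) that records how many elements of $A[1..j-1]$ are shifted right while inserting $A[j]$; `insertion_sort_moves` is our own name.

```python
def insertion_sort_moves(seq):
    """Insertion-sort a copy of seq; return (sorted list, {j: right moves to insert A[j]}).

    j is reported 1-based, as in the slides (j = 2..n).
    """
    a = list(seq)
    moves = {}
    for j in range(1, len(a)):  # 0-based j = 1..n-1 is the slides' j = 2..n
        key = a[j]
        i = j - 1
        shifts = 0
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]  # move a larger element one slot right
            i -= 1
            shifts += 1
        a[i + 1] = key
        moves[j + 1] = shifts
    return a, moves


print(insertion_sort_moves([3, 2, 1]))  # ([1, 2, 3], {2: 1, 3: 2})
```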

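Average-case Analysis of Insertion Sort
Let $k$ denote the number of moves to the right needed to insert $A[j]$. Assuming all orderings of $A[1..j]$ are equally likely, $A[j]$ belongs in each position $i = 1, \dots, j$ with probability $1/j$, and ending up in position $i$ takes $k_i = j - i$ moves (position $i = j$ needs none). Then:
$E[k] = \sum_{i=1}^{j-1} \Pr(A[j] \text{ belongs in position } i) \cdot k_i = \sum_{i=1}^{j-1} \frac{1}{j}(j - i) = \frac{1}{j}\left[\sum_{i=1}^{j-1} j - \sum_{i=1}^{j-1} i\right] = \frac{1}{j}\left[j(j-1) - \frac{j(j-1)}{2}\right] = \frac{j-1}{2}$
So the expected total number of moves is $T(n) = \sum_{j=2}^{n} \frac{j-1}{2} = \frac{n(n-1)}{4} \in \Theta(n^2)$.
Counting the final, failing test of the while loop as well gives $\frac{j-1}{2} + 1 = \frac{j+1}{2}$ tests per insertion and $\sum_{j=2}^{n} \frac{j+1}{2} = \frac{(n+4)(n-1)}{4}$; either way, $T(n) \in \Theta(n^2)$.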
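Under the assumption that all $n!$ orderings of distinct keys are equally likely, the average can be computed exactly by enumerating permutations for small $n$. The sketch below reports the best, average, and worst total number of right moves; the average should equal $n(n-1)/4$, and the extremes $0$ and $n(n-1)/2$ correspond to the best and worst cases described earlier.

```python
from fractions import Fraction
from itertools import permutations


def shifts(seq):
    """Total number of right moves insertion sort performs on seq."""
    a, total = list(seq), 0
    for j in range(1, len(a)):
        key, i = a[j], j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
            total += 1
        a[i + 1] = key
    return total


def shift_stats(n):
    """(best, average, worst) move counts over all n! equally likely orderings of n distinct keys."""
    counts = [shifts(p) for p in permutations(range(n))]
    return min(counts), Fraction(sum(counts), len(counts)), max(counts)


print(shift_stats(5))  # (0, Fraction(5, 1), 10): average n(n-1)/4, worst n(n-1)/2
```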