Analysis of Algorithms: A Brief Introduction

Why? A central concept in Computer Science. Algorithms are ubiquitous: using the Internet (sending email, transferring files, using search engines, online shopping), document preparation/processing, scheduling flights, manufacturing, medicine. 1
Algorithm: A well-defined computational procedure to solve a problem. Focus: Combinatorial problems (i.e., problems whose solution space is discrete). Problem Specification: Input to the problem. Output to be produced. Problem 1: Finding the maximum value. Input: An array A[1.. n] of n integers. Output: The maximum value in the input. 2
Problem 2: Sorting into non-decreasing order. Input: An array A[1.. n] of n integers. Output: A permutation of the array elements so that A[1] ≤ A[2] ≤ ... ≤ A[n]. Problem 3: Zero Sum. Input: An array A[1.. n] of n integers. (The integer values may be positive, negative or zero.) Output: True if the array has two elements A[i] and A[j] (i ≠ j) such that the sum A[i] + A[j] is zero; False otherwise. Note: Zero Sum is a decision problem. 3
Definition: Given an array A[1.. n], a block is a subarray A[i.. j], where 1 ≤ i ≤ j ≤ n. Note: Each element A[i] is a block by itself. Problem 4: Maximum Block Sum. Input: An array A[1.. n] of n integers. (The integer values may be positive, negative or zero.) Output: The maximum value among the sums of all the blocks of A. Note: Maximum Block Sum is an optimization problem. 4
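As a small illustration: if A = [2, -5, 3, 4], the block A[3.. 4] has sum 3 + 4 = 7, and no other block of A has a larger sum, so the maximum block sum is 7.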
Problem 5: Shortest Path. Input: A network G consisting of nodes and edges, with each edge having a length (non-negative number); two nodes u and v. Output: A shortest path between u and v in G. Example: [figure: a network with nodes a, b, c, d, e, f, u, v and edge lengths; diagram omitted]. Note: Shortest Path is also an optimization problem. 5
Boolean Satisfiability Problem: Terminology and Notation: Boolean variable x: Takes on a value from {True, False}. Complement of x is denoted by x̄. (The symbols x and x̄ are called literals.) Boolean operators (connectives): e.g. And (denoted by ∧), Or (denoted by ∨). A Boolean formula is constructed using literals, connectives and parentheses. 6
Example: The following formula F uses three Boolean variables x1, x2 and x3. F = (x1 ∨ x2 ∨ x3) ∧ (x̄2 ∨ x̄3) ∧ (x1 ∨ x3) (a) Let x1 = True, x2 = True and x3 = True. For this assignment, F evaluates to False. (b) Let x1 = True, x2 = True and x3 = False. For this assignment, F evaluates to True. Definition: A Boolean formula F is satisfiable if there is at least one assignment of values to the variables in F for which F evaluates to True. 7
Examples: Formula F = (x1 ∨ x2 ∨ x3) ∧ (x̄2 ∨ x̄3) ∧ (x1 ∨ x3) is satisfiable. (One satisfying assignment for F is x1 = True, x2 = True and x3 = False.) Formula F1 defined by F1 = (x1) ∧ (x2) ∧ (x̄1 ∨ x̄2) is not satisfiable. Problem 6: Boolean Satisfiability (SAT) Input: A formula F constructed using Boolean variables x1, x2,..., xn, their complements and Boolean operators. Output: True if F is satisfiable and False otherwise. Note: SAT is also a decision problem. 8
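A brute-force Python sketch of checking satisfiability by trying all 2^n assignments (the clause-list encoding and the function name is_satisfiable are choices made here for illustration; they are not part of the notes):

from itertools import product

def is_satisfiable(num_vars, clauses):
    # clauses: list of clauses; each clause is a list of nonzero integers,
    # where k stands for the variable x_k and -k for its complement.
    # Try every assignment of True/False to x_1, ..., x_n.
    for assignment in product([True, False], repeat=num_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True    # found a satisfying assignment
    return False

# F1 above, encoded as clauses: (x1) AND (x2) AND (complement(x1) OR complement(x2)).
print(is_satisfiable(2, [[1], [2], [-1, -2]]))   # prints False

Note that this sketch examines exponentially many assignments; as discussed later in these notes, no polynomial-time method for SAT is known.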
Exercises: 1. Find the number of satisfying assignments for the formula F given on page 7. 2. Construct a formula F2 using three variables x1, x2 and x3 such that F2 has exactly one satisfying assignment. Indicate the satisfying assignment. 3. Construct a Boolean formula F3 using two variables x1 and x2 such that F3 evaluates to True for every assignment. 9
An Algorithm for Finding the Maximum: Input: Array A[1.. n] of integers.
Pseudocode:
1. Max = A[1]
2. for i = 2 to n do
       if (A[i] > Max) then Max = A[i]
3. Print Max
Correctness of an Algorithm: Algorithm must halt for every input instance. Must produce the correct output for every input instance. 10
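For concreteness, a minimal executable Python version of this procedure (using 0-based indexing; the name find_max is just a label chosen here):

def find_max(A):
    # Mirrors Steps 1-3 of the pseudocode above.
    max_val = A[0]                  # Step 1: Max = A[1]
    for i in range(1, len(A)):      # Step 2: for i = 2 to n
        if A[i] > max_val:
            max_val = A[i]
    return max_val                  # Step 3 prints Max; here it is returned

# Example: find_max([3, -1, 7, 2]) returns 7.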
Analyzing an Algorithm: Estimating the resources (e.g. running time, memory) needed in a machine-independent manner. Useful in comparing candidate algorithms for a problem. Computational Model: Each primitive operation (e.g. arithmetic operation, comparison, assignment) takes one unit of time. Running time for a specific input: Number of primitive operations executed by the algorithm for that input (expressed as a function of the input size). 11
Running Time of an Algorithm: The longest running time for any input of a certain size. Also called worst-case time (or time complexity). Input Size: Depends on the problem. Examples: (a) Sorting: Number (n) of input values. (b) Graph problems: Number of nodes + Number of edges. (c) SAT problem: Number of literals + Number of operators + Number of parentheses. 12
Running Time Analysis Example 1:
Pseudocode:
1. Max = A[1]
2. for i = 2 to n do
       if (A[i] > Max) then Max = A[i]
3. Print Max
Analysis: Input size = n. Step 1: No. of operations = 1. Step 2: The for loop executes n - 1 times. Each time through the loop, we have: 13
(a) Two operations (comparison and increment) on i. (b) At most two operations (a comparison and an assignment) for the if statement. So, the total number of operations over all the iterations of the loop is at most 4(n - 1). Step 3: No. of operations = 1. So, the total number of operations carried out by the algorithm is at most 1 + 4(n - 1) + 1 = 4n - 2. Running Time Analysis Example 2: Zero Sum: Input: Array A[1.. n] containing n integers. 14
Pseudocode:
1. for i = 1 to n - 1 do
   1.1 for j = i + 1 to n do
           if (A[i] + A[j] = 0) then Print True and stop.
2. Print False.
Analysis: Step 1: The for loop runs n - 1 times. For each iteration of this loop: (a) The for loop in Step 1.1 runs n - i times. (b) During each iteration of this inner for loop, at most 6 operations are carried out: 15
Comparison and increment for j, the sum A[i] + A[j] and its comparison with 0, and the print and stop operations. (c) So, the total number of operations carried out during all iterations of the inner for loop is at most 6(n - i). So, the total number of operations carried out during all the iterations of the outer for loop is at most the sum of 6(n - i) over i = 1 to n - 1, which equals 3n(n - 1). Step 2: No. of operations = 1. Conclusion: The total number of operations carried out by the algorithm is at most 3n(n - 1) + 1. 16
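A Python sketch of the same quadratic-time procedure (0-based indexing; the name has_zero_sum_pair is a label chosen here):

def has_zero_sum_pair(A):
    n = len(A)
    for i in range(n - 1):              # Step 1: i = 1 to n - 1
        for j in range(i + 1, n):       # Step 1.1: j = i + 1 to n
            if A[i] + A[j] == 0:
                return True             # Step 1.1 prints True and stops
    return False                        # Step 2

# Example: has_zero_sum_pair([4, -7, 2, 7]) returns True, since -7 + 7 = 0.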
Disadvantages: This type of detailed analysis is too tedious. The exact number of operations is not too insightful. Simplified Representation: Order or Big-O Notation. Running time as the input size becomes large (also called asymptotic running time). Facilitates the comparison of algorithms for a problem. Basic Ideas: Use only the most dominant term in the expression for the number of operations. Suppress additive and multiplicative constants. 17
Examples: The number of steps used by the algorithm for finding the maximum is at most 4n - 2. This is expressed as O(n). So, the running time of the maximum finding algorithm is O(n). The number of steps used by the algorithm for the Zero Sum problem is at most 3n(n - 1) + 1. So, the running time of the algorithm for the Zero Sum problem is O(n^2). Additional Examples: (a) Let f(n) = n^3 + 24n^2 - 17. Then, f(n) = O(n^3). (b) Let g(n) = 2n log_2 n + 31n + 97. Then, g(n) = O(n log_2 n). Note: O(1) denotes a constant. 18
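As a small worked illustration of these ideas: the operation count 3n(n - 1) + 1 for the Zero Sum algorithm satisfies 3n(n - 1) + 1 = 3n^2 - 3n + 1 ≤ 3n^2 for every n ≥ 1; suppressing the multiplicative constant 3 leaves the dominant term n^2, which is why the count is written as O(n^2).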
Exercises: 1. Find the big-O representation for the sum of 2i^2 over i = 1 to n. 2. Suppose f(n) = 8n^3 + 2n^2 - 17 and g(n) = n^5 + n^2 + 19. Find the big-O representations for f(n) + g(n) and f(n) · g(n). Algorithms for the Maximum Block Sum Problem: Input: An array A[1.. n] of integers. Idea behind Algorithm I: Consider each block of A and compute its sum. Output the largest sum found. 19
Pseudocode for Algorithm I:
1. MaxSum = A[1]
2. for i = 1 to n do
   2.1 for j = i to n do
           temp = FindSum(A, i, j)
           if (temp > MaxSum) then MaxSum = temp
3. Print MaxSum

function FindSum(A, i, j)
1. sum = 0
2. for k = i to j do
       sum = sum + A[k]
3. return sum
20
Analysis: Each call to FindSum takes O(n) time. Each iteration of the loop in Step 2.1 runs in O(n) time (since the dominant time is due to the call to FindSum). Since the loop itself runs at most n times, the running time of the loop in Step 2.1 is O(n^2). The loop in Step 2 runs n times. Each iteration of this loop executes the loop in Step 2.1. Since the latter takes O(n^2) time, the time to complete Step 2 is O(n · n^2) = O(n^3). Steps 1 and 3 take O(1) time. So, the overall running time is O(n^3). 21
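A direct Python sketch of Algorithm I (0-based indexing; the function names are labels chosen here):

def find_sum(A, i, j):
    # Sum of the block A[i..j] (inclusive), as in the FindSum function.
    total = 0
    for k in range(i, j + 1):
        total += A[k]
    return total

def max_block_sum_v1(A):
    # Algorithm I: examine every block A[i..j] and keep the largest sum.
    max_sum = A[0]
    for i in range(len(A)):
        for j in range(i, len(A)):
            temp = find_sum(A, i, j)
            if temp > max_sum:
                max_sum = temp
    return max_sum        # O(n^3) time overall

# Example: max_block_sum_v1([2, -5, 3, 4]) returns 7.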
Algorithm II for Maximum Block Sum. Idea: For each i (1 ≤ i ≤ n), there are n - i + 1 blocks whose starting point is A[i]. For each such block, Algorithm I computes the sum in O(n) time (using the FindSum function). So, the time used to compute the sums for all the blocks with starting point A[i] is O(n^2). It is possible to compute the sums of all the blocks that start at A[i] in O(n) time as follows. Let S(i, j) denote the sum of the block A[i.. j]. Then
S(i, i) = A[i]
S(i, i+1) = S(i, i) + A[i+1]
S(i, i+2) = S(i, i+1) + A[i+2]
...
22
S(i, n) = S(i, n-1) + A[n]
The resulting algorithm runs in time O(n^2). Exercise: Write pseudocode for Algorithm II using the above idea. Verify that the running time of the algorithm is O(n^2). Algorithm III for Maximum Block Sum. Idea: (a) Observation: A block with the maximum sum ends at A[i] for some i, 1 ≤ i ≤ n. (b) For each i, compute and store the maximum sum among all blocks that end at A[i]. (c) The largest sum found in (b) is the solution. 23
How to carry out Step (b): Use an auxiliary array B[1.. n]; for each i, let B[i] store the maximum sum among the blocks that end at A[i]. Note that B[1] = A[1]. For any i ≥ 2, B[i] = B[i-1] + A[i] if B[i-1] > 0, and B[i] = A[i] otherwise. Example: To be presented in class. 24
Pseudocode for Algorithm III:
1. B[1] = A[1]
2. for i = 2 to n do
       if (B[i-1] > 0) then B[i] = B[i-1] + A[i]
       else B[i] = A[i]
3. Find and print the maximum value in B[1.. n]
Running time of Algorithm III: Step 1: O(1) time. Step 2: The loop executes at most n times and each iteration of the loop uses O(1) time. So, the total time for the loop is O(n). 25
Step 3: O(n) time. So, the overall running time is O(n). Conclusion: Algorithm III has the best running time for the Maximum Block Sum problem. Definition: An algorithm is efficient if its running time is a polynomial function of the input size. Note: Polynomial means that the running time can be expressed as O(n^k), where n is the problem size and k is a constant independent of n. 26
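Before moving on, a Python sketch of Algorithm III, which meets this definition of efficiency with an O(n) running time (0-based indexing; the name max_block_sum_v3 is a label chosen here):

def max_block_sum_v3(A):
    # B[i] = maximum sum among all blocks that end at A[i].
    n = len(A)
    B = [0] * n
    B[0] = A[0]                      # Step 1
    for i in range(1, n):            # Step 2
        if B[i - 1] > 0:
            B[i] = B[i - 1] + A[i]
        else:
            B[i] = A[i]
    return max(B)                    # Step 3: overall maximum block sum

# Example: max_block_sum_v3([2, -5, 3, 4]) returns 7.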
Examples: The algorithm for finding the maximum runs in O(n) time. The algorithm for the Zero Sum problem runs in O(n^2) time. All three algorithms for the Maximum Block Sum problem are efficient algorithms. (Their respective running times are O(n^3), O(n^2) and O(n).) Problems such as Sorting and Shortest Path can also be solved efficiently. Exercise: Several algorithms with running times of O(n log_2 n) are known for the sorting problem. Also, given a sorted array A[1.. n] and a value q, binary search can be used to determine whether or not A contains the value q in O(log_2 n) time. Use these facts to devise an O(n log_2 n) algorithm for the Zero Sum problem. 27
NP-Complete Problems: Terminology: The class P contains all the problems for which a solution can be found in polynomial time. The class NP contains all the problems for which a given solution can be verified in polynomial time. Notes: 1. NP denotes Nondeterministic Polynomial. 2. For mathematical convenience, the class NP is restricted to decision problems. 28
Example: SAT is in NP. Given an assignment of values to the Boolean variables of a formula F, we can evaluate F and thus determine whether the given assignment satisfies the formula in polynomial time. However, we don't know how to find a satisfying assignment in polynomial time. Note: Every problem in P is also in NP. (Thus, P ⊆ NP.) Other Problems in NP: Problem 7: Clique. Input: A set S of n people, a set P of pairs of people from S who know each other and an integer K ≤ n. Question: Is there a subset S′ of S containing at least K people such that each person in S′ knows every other person in S′? 29
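Returning to the SAT example above, a Python sketch of such a polynomial-time verifier (it reuses the clause-list encoding from the earlier brute-force sketch; the name verify_assignment is a label chosen here):

def verify_assignment(clauses, assignment):
    # assignment: list of booleans, where assignment[k-1] is the value of x_k.
    # A clause is satisfied if at least one of its literals is True;
    # the whole formula is satisfied if every clause is.
    # This check takes time linear in the size of the formula.
    return all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses)

# The satisfying assignment from the SAT example (x1 = x2 = True, x3 = False):
print(verify_assignment([[1, 2, 3], [-2, -3], [1, 3]], [True, True, False]))  # True

Checking a proposed assignment is easy; finding one is the hard part.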
Problem 8: Subset Sum. Input: A set S of n integers and another integer Q. Question: Is there a subset S′ of S such that the sum of the integers in S′ is equal to Q? Problem 9: Longest Path. Input: A graph G with vertex set V, edge set E, two vertices u and v and an integer K ≤ |V|. Question: Is there a path of length at least K between u and v? NP-Complete Problems: The hardest problems in NP. 30
These problems are equivalent in the following sense: (a) If any one of the NP-complete problems can be solved in polynomial time, then all of them can be solved in polynomial time. (If this happens, then P = NP.) (b) If we can show that there is no polynomial algorithm for any one of the NP-complete problems, then none of them can be solved in polynomial time. (If this happens, then P ≠ NP.) The problems SAT, Clique, Subset Sum and Longest Path defined above are known to be NP-complete. Thousands of problems that arise in practical applications are known to be NP-complete. Whether P = NP is a major open question. 31