Data Structures. Algorithm Performance and Big O Analysis


What's an Algorithm?
A clearly specified set of instructions to be followed to solve a problem.
In essence: a computer program.
In detail: defined mathematically in a course on the Theory of Computation (Turing).

What Does Big O Do?
It measures the growth rate of an algorithm's run time as the size of its input grows.
Huh? O is a math function that helps estimate how much longer it takes to run n inputs versus n+1 inputs (or n+2, 2n, 3n, ...).
It doesn't care what language you use! It only cares about the underlying algorithm.

What Doesn't Big O Do?
It doesn't tell you that algorithm A is faster than algorithm B for a particular input.
Why not? It only tells you whether one grows faster than the other in a general sense, over all inputs.
We are usually concerned with very large data inputs. This is called asymptotic algorithm analysis.

Example: Doesn't Care About Particular Input
    public void algorithm1(List<?> xinput) {
        // 16 million lines of code
    }
    public void algorithm2(List<?> xinput) {   // any input type with a size() method would do
        if (xinput.size() == 2074) {
            return;
        } else {
            // 16 million different lines
        }
    }
Which one is faster? Always?
We only care about the average behavior of the 16 million lines.

How Do We Calculate Run Time?
1. Each basic operation in the code counts for 1 time unit. A basic operation executes in the same time no matter what values it is supplied.
   Examples:
   Adding two integers is a basic operation.
   Reading a[1] is a basic operation (independent of array size).
   Summing the values in an array is NOT a basic operation (why?).
2. Ignore actual time units (seconds, days, etc.). A unit could be 1 ns on a fast computer or 1 day on a really slow computer, but for large inputs it won't matter.
3. Ignore time for method calls, returns, and declarations. It doesn't matter in the long run.

Run Time Example 1
Calculating $\sum_{i=1}^{N} i^3$. How long does it take to run this code?
    public int sum(int num) {
        int partialsum = 0;
        for (int i = 1; i <= num; i++) {
            partialsum += i * i * i;
        }
        return partialsum;
    }

Run Time Example 1 (cont.)
Calculating $\sum_{i=1}^{N} i^3$. How long does it take to run this code?
    public int sum(int num) {            // no cost
        int partialsum = 0;              // costs 1 (to init/store in memory)
        for (int i = 1;                  // costs 1 (to init/store in memory)
             i <= num;                   // costs N+1 (once for each test of <=, and +1 for the last time through, when it fails)
             i++) {                      // costs 2N (once for each + and =; recall i++ is just i = i + 1)
            partialsum += i * i * i;     // total cost of 4N (4 per execution: 1 addition, 2 multiplications, 1 assignment)
        }
        return partialsum;               // no cost
    }
Final tally: 1 + 1 + (N+1) + 2N + 4N = 7N + 3
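One way to sanity-check the tally is to instrument the loop and count operations with the same cost model. This is only a sketch: the class name SumCubesCost and the helper countOps are made up for illustration, and the costs charged are the ones assumed on the slide (1 per init, 1 per test, 2 per i++, 4 per body execution).

```java
// Hypothetical helper (not from the slides): counts basic operations for
// sum(num) using the slide's cost model, so the result can be compared
// with the formula 7N + 3.
class SumCubesCost {
    static long countOps(int num) {
        long ops = 0;
        int partialsum = 0; ops += 1;            // init partialsum: costs 1
        int i = 1;          ops += 1;            // init i: costs 1
        while (true) {
            ops += 1;                            // one test of i <= num
            if (!(i <= num)) break;              // the failing test is still counted
            partialsum += i * i * i; ops += 4;   // 2 multiplications, 1 addition, 1 assignment
            i++;                     ops += 2;   // 1 addition, 1 assignment
        }
        return ops;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 10, 100}) {
            System.out.println("N=" + n + " counted=" + countOps(n)
                    + " formula=" + (7L * n + 3));
        }
    }
}
```

The counted value and the formula should agree for every N tried, which is the whole point of the tally.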

Run Time Example 2
    public int sum(int num) {                // no cost
        int partialsum = 0;                  // costs 1
        for (int i = 1;                      // costs 1
             i <= num;                       // costs N+1
             i++) {                          // costs 2N
            for (int j = 1;                  // costs N*1
                 j <= num;                   // costs N*(N+1)
                 j++) {                      // costs N*2N
                partialsum += i * j;         // costs N*N*3
            }
        }
        return partialsum;                   // no cost
    }
Final tally: 1 + 1 + (N+1) + 2N + N*1 + N*(N+1) + N*2N + N*N*3 = 6N^2 + 5N + 3
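The nested tally can be checked the same way. The sketch below is an illustration only; NestedLoopCost and countOps are hypothetical names, and the per-statement costs are the ones assumed on the slide.

```java
// Hypothetical helper (not from the slides): counts basic operations for the
// nested-loop sum using the slide's cost model, to compare with 6N^2 + 5N + 3.
class NestedLoopCost {
    static long countOps(int num) {
        long ops = 0;
        int partialsum = 0; ops += 1;           // init partialsum
        int i = 1;          ops += 1;           // init i
        while (true) {
            ops += 1;                           // test i <= num
            if (i > num) break;
            int j = 1; ops += 1;                // init j, once per outer iteration
            while (true) {
                ops += 1;                       // test j <= num
                if (j > num) break;
                partialsum += i * j; ops += 3;  // 1 multiplication, 1 addition, 1 assignment
                j++;                 ops += 2;  // inner increment
            }
            i++; ops += 2;                      // outer increment
        }
        return ops;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 10, 50}) {
            long formula = 6L * n * n + 5L * n + 3;
            System.out.println("N=" + n + " counted=" + countOps(n) + " formula=" + formula);
        }
    }
}
```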

But This is Overkill!
    public int sum(int num) {
        int partialsum = 0;
        for (int i = 1; i <= num; i++) {
            for (int j = 1; j <= num; j++) {
                partialsum += i * j;
            }
        }
        return partialsum;
    }
Really only one operation (partialsum += i * j), and it happens N^2 times.
So we say "order of N^2", or O(N^2).

Likewise, More Overkill
Calculating $\sum_{i=1}^{N} i^3$.
    public int sum(int num) {
        int partialsum = 0;
        for (int i = 1; i <= num; i++) {
            partialsum += i * i * i;
        }
        return partialsum;
    }
Really only one operation (partialsum += i * i * i), and it happens N times.
So we say "order of N", or O(N).

Another "Order Of" Example
    public void cool(int n) {
        for (int i = 2; i <= n; i++) {
            int j = (1 + i * i % 3 % i) / (i + 2);
        }
    }
The heart of the code is the line inside the loop, and it happens N - 1 times (about N times).
So order of N. Or say O(N).
Also, run time T(N) = 10N - 8. Can you show me?
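Here is one way to arrive at 10N - 8. It is a sketch that assumes the same cost model as the earlier examples, and it counts the body as 7 operations (two additions, one multiplication, two remainders, one division, one assignment); the slides do not spell this breakdown out.

```latex
\begin{aligned}
T(N) &= \underbrace{1}_{\text{init } i = 2}
      + \underbrace{N}_{\text{tests of } i \le n \ (i = 2, \dots, N+1)}
      + \underbrace{2(N-1)}_{i{+}{+}}
      + \underbrace{7(N-1)}_{\text{loop body}} \\
     &= 1 + N + 9(N-1) \\
     &= 10N - 8
\end{aligned}
```

The count holds for N of at least 1; the loop body runs N - 1 times because i starts at 2.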

Ah, Back to the Big-O
Call T(N) the run time.
Definition: T(N) = O(f(N)) if there are positive constants c and n_0 such that T(N) ≤ c f(N) when N > n_0.
What does it mean? For big enough N, the run time is never more than a constant multiple of f(N). (Only the highest-order term matters! And constants don't matter.)
Note: f(N) should be the smallest function for which such c and n_0 exist.

Example Using Big-O Definition
In the last example, T(N) = 10N - 8.
So, let's guess T(N) = 10N - 8 = O(N^3).
To show that, we must show 10N - 8 ≤ c N^3 for all big enough N.
Let c = 1. Then 10N - 8 ≤ c N^3 is true for all N > 10. In fact, it is true for all N > 4.
» i.e., in the definition, let n_0 = 4.
So by definition, 10N - 8 = O(N^3).
But that's not as good as we can do! Let's try O(N).

Another Example Using Definition
Let's guess T(N) = 10N - 8 = O(N).
To show that, we must show 10N - 8 ≤ c N for all big enough N.
Let c = 10. Then 10N - 8 ≤ c N is true for all N > 0.
» i.e., in the definition, let n_0 = 1.
So by definition, 10N - 8 = O(N).
No matter how hard you try, that's the smallest exponent on N that will work, i.e., O(N) is the best we can do.
And O(N) matches our intuition from the example code!
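The choices of c and n_0 above can be spot-checked numerically. The sketch below is not a proof (it only tests a finite range of N), and the class BigOWitness and its methods are hypothetical names introduced for illustration. It takes f(N) = N^k and T(N) = 10N - 8.

```java
// Hypothetical checker (not from the slides): tests T(N) <= c * N^k for every
// N with n0 < N <= limit. A finite check like this can refute a bad choice of
// c and n0, but it cannot prove the bound for all N.
class BigOWitness {
    static long t(long n) {
        return 10 * n - 8;                     // run time from the example
    }

    static long pow(long n, int k) {
        long result = 1;
        for (int i = 0; i < k; i++) result *= n;
        return result;
    }

    static boolean holds(long c, int k, long n0, long limit) {
        for (long n = n0 + 1; n <= limit; n++) {
            if (t(n) > c * pow(n, k)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("O(N^3), c=1,    n0=4: " + holds(1, 3, 4, 1000));
        System.out.println("O(N^3), c=1,    n0=1: " + holds(1, 3, 1, 1000));
        System.out.println("O(N),   c=10,   n0=1: " + holds(10, 1, 1, 1000));
        System.out.println("O(1),   c=1000, n0=1: " + holds(1000, 0, 1, 1000));
    }
}
```

The second call shows why n_0 matters (the bound fails at N = 2), and the last call shows a constant bound eventually failing, which fits the claim that no exponent smaller than 1 will work.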

Example: O Constants
If T(N) = 23N^2 - 562, then T(N) = O(N^2).
Which means: we guarantee that T(N) grows at a rate no faster than N^2.
We say c N^2 is an upper bound on T(N), for c > 23.

Wait, you say...
OK, T(N) = 23N^2 - 562 = O(N^2).
But 23N^2 grows faster than N^2. What's up with that? Shouldn't it be O(23N^2)?
NO! We are concerned with the rate of growth as N increases.

Wait, you say... (Part 2)
Consider 23N^2 and N^2. If N doubles in size, how much longer does it take to run?
23(2N)^2 = 4 * (23N^2) and (2N)^2 = 4 * (N^2)
In both cases, it takes 4 times as long. The rate of growth is just N^2.
The constant didn't matter!!!

Another Example
Consider T(N) = 5N^3 versus N^3. If we triple the number of inputs, how much longer does it take to run?
5(3N)^3 = 27 * (5N^3) and (3N)^3 = 27 * (N^3)
In both cases, it takes 27 times as long. The rate of growth is just N^3.
The constant didn't matter!!!
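The same "scale the input, compare the run times" experiment can be written as a few lines of code. This is a sketch with hypothetical names (GrowthRatio, cost, ratio); it models run time as c * N^k, exactly the shape used in the two examples above.

```java
// Hypothetical illustration (not from the slides): for T(N) = c * N^k, the
// slowdown from scaling the input by some factor is factor^k, whatever c is.
class GrowthRatio {
    static double cost(double c, int k, double n) {
        return c * Math.pow(n, k);
    }

    // How many times longer the run takes when the input grows from n to factor * n.
    static double ratio(double c, int k, double n, double factor) {
        return cost(c, k, factor * n) / cost(c, k, n);
    }

    public static void main(String[] args) {
        System.out.println("23N^2, doubled: " + ratio(23, 2, 100, 2));
        System.out.println("  N^2, doubled: " + ratio(1, 2, 100, 2));
        System.out.println(" 5N^3, tripled: " + ratio(5, 3, 100, 3));
        System.out.println("  N^3, tripled: " + ratio(1, 3, 100, 3));
    }
}
```

The constant c cancels in the division, which is why it never shows up in the growth rate.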

Yet Another Example
If T(N) = 7N^3 - N + 56, then T(N) = O(N^3).
Which means: we guarantee that T(N) grows at a rate no faster than N^3.

Wait a cotton-pickin'...
T(N) = 7N^3 - N + 56 = O(N^3)
You mean to say the N doesn't matter?
Yup! We are concerned with the asymptotic behavior for big N. (Remember those limits in calculus?)

Wait a cotton-pickin'... (Part 2)
As N gets huge, N^3 dwarfs N.
For N = 1000, N^3 = 1000^3 = 1,000,000,000, which is a lot bigger than N = 1000. (1 part in a million!)
For even bigger values, it's quickly 1 part in a billion billion. (Then 1 part in a billion billion billion, yada, yada, yada.)
Only the biggest exponent matters. (Called asymptotic analysis.)

Wait a cotton-pickin'... (Part 3)
Consider T(N) = 7N^3 - N + 56 versus N^3. If we take 1000 times the number of inputs, how much longer does it take to run?
7(1000N)^3 - 1000N + 56 = 1,000,000,000 * (7N^3) - 1000N + 56
and (1000N)^3 = 1,000,000,000 * (N^3)
The first term is MUCH bigger than the other terms. For any value of N, like 10, the smaller terms subtract an insignificant amount from the total.
Smaller terms don't matter.
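To see how little the smaller terms contribute, one can divide T(N) by N^3 and watch the result settle toward the leading constant 7. This is a sketch only: the names LowerOrderTerms, t, and ratioToCube are hypothetical, and it assumes T(N) = 7N^3 - N + 56 (the sign in front of N is an assumption about the original slide).

```java
// Hypothetical illustration (not from the slides). Assumes
// T(N) = 7N^3 - N + 56; the minus sign is an assumption.
class LowerOrderTerms {
    static double t(double n) {
        return 7 * n * n * n - n + 56;
    }

    // T(N) / N^3: approaches 7 as the lower-order terms become negligible.
    static double ratioToCube(double n) {
        return t(n) / (n * n * n);
    }

    public static void main(String[] args) {
        for (double n : new double[] {10, 100, 1000, 10000}) {
            System.out.println("N=" + n + "  T(N)/N^3=" + ratioToCube(n));
        }
    }
}
```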

Review: The Difference Between T(N) and O(N)
T(N) is the total run time.
O(N) is the approximation to the run time where we ignore constants and lower-order terms. We call it the growth rate. It is also called the asymptotic approximation.
Example:
if T(N) = 3N^2 + N + 1, then T(N) = O(N^2)
if T(N) = 3N log(N) + 2, then T(N) = O(N log(N))

Review: The Difference Between T(N) and O(N)
What do we mean by the equal sign?
T(N) = O(N^2) says the growth rate for T(N) is N^2.
T(N) = O(N log(N)) says T(N) has a growth rate of N log(N).

Big-O Is Worst Case
Remember, Big-O says nothing about specific inputs, or specific input sizes.
Suppose I give Bubble Sort the list 1, 2, 3, 4, 5.
» Stops right away. Fast!
Suppose I give Bubble Sort the list 5, 4, 3, 2, 1.
» Worst case scenario! Slow.
So we need to calculate the maximum number of times the code goes through a loop.
e.g., if the code uses a while loop, then the number of iterations should be calculated for the worst case.
» By the way, a while loop is just like a for loop when calculating run time and growth rates.
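The two Bubble Sort inputs can be compared by counting comparisons. The sketch below assumes the common "stop when a pass makes no swaps" variant of Bubble Sort, since the slide says the sorted list stops right away; the class and method names are made up for illustration.

```java
// Hypothetical illustration (not from the slides): bubble sort with an
// early exit, instrumented to count element comparisons.
class BubbleSortCases {
    // Sorts a in place and returns the number of comparisons performed.
    static long bubbleSort(int[] a) {
        long comparisons = 0;
        boolean swapped = true;
        for (int pass = 0; pass < a.length - 1 && swapped; pass++) {
            swapped = false;
            for (int j = 0; j < a.length - 1 - pass; j++) {
                comparisons++;
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                    swapped = true;          // another pass is needed
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println("already sorted: " + bubbleSort(new int[] {1, 2, 3, 4, 5}));
        System.out.println("reversed:       " + bubbleSort(new int[] {5, 4, 3, 2, 1}));
    }
}
```

On the sorted list a single pass of N - 1 comparisons is enough; on the reversed list every pass is needed, for N(N-1)/2 comparisons in total. Big-O analysis uses the second number.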

Predicting How Long To Run: A Cool Application of Big-O
We can predict how long it will take to run a program with a very large data set.
Do a test with a small practice data set. Then use big-O to predict how long it will take with a real (large) data set. Cool!
Suppose we are using a program that is O(N^2). Our test shows that it takes 1 minute to run 100 inputs. How long will it take to run 1000 inputs?
Set it up this way:
    $\frac{(100\ \text{inputs})^2}{1\ \text{min}} = \frac{(1000\ \text{inputs})^2}{x\ \text{min}}$
The left side is the growth rate: 100^2 inputs per minute.
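Solving the proportion for x gives 100 minutes. The same calculation can be sketched in code; RunTimePredictor and predictMinutes are hypothetical names, and the method assumes the run time is proportional to N^k with nothing else going on (no lower-order terms, no caching effects).

```java
// Hypothetical illustration (not from the slides): scales a measured run time
// to a new input size, assuming run time is proportional to N^k.
class RunTimePredictor {
    static double predictMinutes(double testInputs, double testMinutes,
                                 double newInputs, int k) {
        // testInputs^k / testMinutes = newInputs^k / x, solved for x
        return testMinutes * Math.pow(newInputs / testInputs, k);
    }

    public static void main(String[] args) {
        // 100 inputs took 1 minute; the program is O(N^2); predict 1000 inputs.
        System.out.println(predictMinutes(100, 1, 1000, 2) + " minutes");
    }
}
```

Ten times the input costs 10^2 = 100 times the run time, so the 1-minute test becomes a 100-minute run.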

Predicting How Much Data Will Run: A Cool Application of Big-O
We can predict how much data can be processed in a fixed amount of time.
Do a test with a small data set. Then predict with big-O.
Suppose we are using a program that is O(N^2). Our test shows that it takes 1 minute to run 100 inputs. How many inputs will run in 60 minutes?
Set it up this way:
    $\frac{(100\ \text{inputs})^2}{1\ \text{min}} = \frac{(N\ \text{inputs})^2}{60\ \text{min}}$
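Worked out, the proportion gives a bit under 775 inputs. The steps below are a sketch under the same assumption as the setup, namely that run time is exactly proportional to N^2.

```latex
\begin{aligned}
\frac{100^2}{1} &= \frac{N^2}{60} \\
N^2 &= 60 \cdot 100^2 \\
N &= 100\sqrt{60} \approx 774.6
\end{aligned}
```

So about 774 inputs fit in 60 minutes. Note that 60 times the time buys only about 7.7 times the data, because the growth rate is quadratic.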

General O Rules: Rule 1
Rule 1: if T_1(N) = O(f(N)) and T_2(N) = O(g(N)), then
(a) T_1(N) + T_2(N) = max(O(f(N)), O(g(N))).
(b) T_1(N) * T_2(N) = O(f(N) * g(N)).
These are big-O rules, not run-time rules. They apply equally well in any other math class!

Example
    for (int i = 1; i <= nnum; i++) {            // T_1(N) = O(N)
        npartialsum += i * i * i;
    }
    for (int i = 1; i <= nnum; i++) {            // T_2(N) = O(N^2)
        for (int j = 1; j <= nnum; j++) {
            npartialsum += i * j;
        }
    }
So the total run time is T_1(N) + T_2(N) = max(O(N), O(N^2)) = O(N^2)

General O Rules: Rule 2
Rule 2: (log N)^k = O(N) for any constant k.
What? Remember: T(N) = O(f(N)) when T(N) ≤ c f(N). So, we're just saying that (log N)^k grows more slowly than N.
In other words: logarithms grow VERY slowly.
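A quick numerical look at Rule 2: the ratio (log N)^k / N shrinks toward zero as N grows, even though it can be bigger than 1 for small N (which is what the constants c and n_0 in the definition are for). This is only a sketch; LogVersusLinear and ratio are hypothetical names, and base-2 logs are an arbitrary choice that does not affect the growth rate.

```java
// Hypothetical illustration (not from the slides): compares (log2 N)^k with N.
class LogVersusLinear {
    static double ratio(double n, int k) {
        double log2 = Math.log(n) / Math.log(2);   // log base 2 of n
        return Math.pow(log2, k) / n;
    }

    public static void main(String[] args) {
        int k = 3;
        for (int exp : new int[] {4, 10, 20, 40, 80}) {
            double n = Math.pow(2, exp);
            System.out.println("N=2^" + exp + "  (log N)^" + k + " / N = " + ratio(n, k));
        }
    }
}
```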

[Figure: Comparison of Growth Rates. Run time T(N) versus N (number of inputs) for LogN, (LogN)^2, N, NLogN, N^2, N^3, and 2^N. Slowly growing curves are labeled "good"; rapidly growing curves are labeled "bad".]

Rules For Programs
The previous rules were general math rules.
The following rules apply to computer programs. We will still need to use the math rules!

Rules For Calculating Growth Rate: Rule 0
Rule 0: declarations, method calls, returns
Zero cost.
Example:
    int myvariable;
    return 0;
No cost. O(1). Constant growth rate. In other words, if we double the number of inputs, it takes the same amount of time to run.
We describe growth rates in terms of N. So, T(N) = 0 = 0 * N^0 = O(N^0) = O(1).

Rules For Calculating Growth Rate: Rule 1
Rule 1: for loops
The growth rate of a for loop is at most the growth rate of the statements inside the for loop times the number of iterations.
Example: O(N)
    for (i = 0; i < n; i++) {
        i++;    // one addition and one assignment operation
    }
The body happens N/2 times! (i++ is happening in two places. Sneaky!)
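The N/2 claim is easy to confirm by counting iterations. A minimal sketch, with the hypothetical names SneakyLoop and iterations:

```java
// Hypothetical illustration (not from the slides): counts how many times the
// body of the "sneaky" loop runs when i is incremented in two places.
class SneakyLoop {
    static int iterations(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            i++;          // the extra increment inside the body
            count++;      // instrumentation only
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {7, 10, 1000}) {
            System.out.println("n=" + n + " iterations=" + iterations(n));
        }
    }
}
```

The body runs n/2 times (rounded up when n is odd). Half of N is still O(N), because the constant 1/2 drops out.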

Rules For Calculating Growth Rate: Rule 2
Rule 2: Nested loops
Analyze inside out. The growth rate of a statement inside nested loops is the growth rate of the statement multiplied by the product of the sizes of the loops.
Example:
    for (int i = 1; i <= num1; i++) {          // num1 iterations
        for (int j = 1; j <= num2; j++) {      // num2 iterations
            a[i] = i * a[j];                   // costs 4
        }
    }
So total run time = 4 * num2 * num1 = O(N^2), taking num1 and num2 to both grow with N.

Rules For Calculating Growth Rate: Rule 3
Rule 3: Consecutive statements
These just add (which means the maximum one counts).
Example:
    for (int i = 1; i < 3; i++) a[i] = i;      // 2 iterations
    for (int i = 1; i < n; i++) a[i] = i;      // n-1 iterations
T(n) = 2 + (n-1) = O(n) (just a big-O math rule!)

Rules For Calculating Growth Rate: Rule 4
Rule 4: if/else
The growth rate is never more than that of the most expensive branch of the if/else.
Example:
    if (happydog) {
        print("bow-wow");                      // costs 1
    } else if (happycat) {
        for (int i = 1; i < 35; i++) {
            print("meow");                     // costs 34 in total
        }
    }
Assume print runs in 1 time unit.
T(N) = 34 = 34 N^0 = O(N^0) = O(1)

Example With Recursion
    public long factorial(long n) {
        if (n <= 1)
            return 1;                          // costs 0
        else
            return n * factorial(n - 1);       // 1 for multiplication, plus 1 for subtraction,
                                               // plus the cost of the evaluation of factorial(n-1)
    }
T(n) = 2 + Cost(factorial(n-1))
- or -
T(n) = 2 + T(n-1)
(Roughly. We are ignoring the n <= 1 test in the run time.)

Example With Recursion 2
T(n) = 2 + Cost(factorial(n-1))
     = 2 + (2 + Cost(factorial(n-2)))
     = 2 + (2 + (2 + Cost(factorial(n-3))))
     = 2 + 2 + 2 + ... + 2 + 0
     = 2(n-1)
     = O(n)
The last case, factorial(1), doesn't cost anything. It just returns.

Example With Recursion 3
Same thing, but different notation.
T(n) = 2 + T(n-1)
     = 2 + (2 + T(n-2))
     = 2 + (2 + (2 + T(n-3)))
     = 2 + 2 + 2 + ... + 2 + 0
     = 2(n-1)
     = O(n)

Example With Recursion 4
(In fact, it was just a for loop in disguise.)
    public long factorial(long n) {
        if (n <= 1) return 1;
        else return n * factorial(n - 1);      // O(N)
    }
    public long factorial(long n) {
        long factorial = 1;                    // costs 1
        for (int i = 1; i <= n; i++)
            factorial = i * factorial;         // costs N
        return factorial;                      // costs 0
    }
T(N) = 1 + N + 0 = O(N)
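To see the "for loop in disguise" claim in action, both versions can be run side by side, with a counter on the recursive one. This is a sketch; the class FactorialCompare, the method names, and the counter field are hypothetical additions for illustration.

```java
// Hypothetical illustration (not from the slides): recursive and iterative
// factorial, with the recursive version counting its multiplications.
class FactorialCompare {
    static long multiplications;               // instrumentation only

    static long recursive(long n) {
        if (n <= 1) return 1;                  // base case: just returns
        multiplications++;
        return n * recursive(n - 1);
    }

    static long iterative(long n) {
        long result = 1;
        for (long i = 1; i <= n; i++) {
            result = i * result;
        }
        return result;
    }

    public static void main(String[] args) {
        multiplications = 0;
        long r = recursive(20);
        System.out.println("recursive(20) = " + r + " using " + multiplications + " multiplications");
        System.out.println("iterative(20) = " + iterative(20));
    }
}
```

The recursive version performs n - 1 multiplications, one per level of recursion, which matches T(n) = 2(n-1) at 2 operations per level. Inputs above 20 overflow a long, so the sketch stays at or below that.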

Another Recursion Example (Fibonacci numbers)
    public long fib(int n) {
        if (n <= 1)
            return 1;                          // costs 0
        else
            return fib(n - 1) + fib(n - 2);    // costs 3 (two subtractions and one addition),
                                               // and also the cost of fib(n-1) and the cost of fib(n-2)
    }
T(n) = 3 + Cost(fib(n-1)) + Cost(fib(n-2))
- or -
T(n) = 3 + T(n-1) + T(n-2)
(A job interviewer once asked me if this was a good way to program Fibonacci numbers!)

Another Recursion Example 2
T(n) = 3 + T(n-1) + T(n-2)
Each T term on the right expands into two more:
    T(n-1) = 3 + T(n-2) + T(n-3)
    T(n-2) = 3 + T(n-3) + T(n-4)
and each of those expands again, etc., etc., etc.
The cost keeps doubling in size!!!
Exponential: O(2^n). Bad, bad, bad, bad, bad, bad.
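Counting calls makes the blow-up concrete. The sketch below uses the same fib definition as the example (fib(0) = fib(1) = 1) and adds a hypothetical counter; FibCalls and countCalls are names introduced for illustration.

```java
// Hypothetical illustration (not from the slides): counts how many times
// the naive recursive fib is called.
class FibCalls {
    static long calls;                         // instrumentation only

    static long fib(int n) {
        calls++;
        if (n <= 1) return 1;
        return fib(n - 1) + fib(n - 2);
    }

    static long countCalls(int n) {
        calls = 0;
        fib(n);
        return calls;
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 30; n += 5) {
            System.out.println("n=" + n + " calls=" + countCalls(n));
        }
    }
}
```

The number of calls works out to 2 * fib(n) - 1, so it grows as fast as the Fibonacci numbers themselves. Strictly, each step multiplies the work by about 1.6 rather than 2, so O(2^n) is a valid upper bound rather than the tightest one, but it is exponential either way.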

Tree
Doubling: this is called a binary tree. It keeps doubling. Exponential (2^n) growth. Bad growth rate!
But we'll do some problems that traverse the tree in the other direction. That keeps halving. Opposite of exponential.
And what's the inverse ("opposite") of exponential? Logs! Logarithmic log(n) growth. Great growth rate!
Stay tuned for logarithmic growth...

Recursion NOT Always a For Loop in Disguise
The Fibonacci recursion is O(2^N). Bad!
Most for loops are O(N). Good!
But not all... Stay tuned.
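As a small preview of a for loop that is not O(N), here is a loop that halves its counter instead of incrementing it. This example is not from the slides; HalvingLoop and steps are hypothetical names, and it is only meant to hint at the logarithmic growth promised above.

```java
// Hypothetical illustration (not from the slides): a for loop whose counter
// is halved each time, so it runs about log2(n) times rather than n times.
class HalvingLoop {
    static int steps(int n) {
        int count = 0;
        for (int i = n; i > 1; i /= 2) {
            count++;                           // one iteration per halving
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 1000, 1024, 1000000}) {
            System.out.println("n=" + n + " steps=" + steps(n));
        }
    }
}
```

Going from 1,000 inputs to 1,000,000 inputs only takes the loop from 9 steps to 19, which is the "great growth rate" of logarithms.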