Basic Algorithm Analysis


Basic Algorithm Analysis

There are often many ways to solve the same problem, and we want to choose an efficient algorithm. How do we measure efficiency? Time? Space? We don't want to redo our analysis every time we reimplement our algorithm or run it on a different machine.

Basic Algorithm Analysis

To analyze the amount of time required, we count the number of operations needed. But not all operations require the same amount of time, and a different compiler or machine may translate the same C++ code into a different number and type of instructions. Instead, we pick one representative operation and count that (e.g. comparisons, or variable assignments). The idea is that the total amount of time is proportional to the number of representative operations; the constant of proportionality depends on the compiler, machine, etc. That way, the analysis depends only on the algorithm, not on the implementation.

Big-O Notation

We express the number of operations in big-O notation. The number of operations depends on the size of the input (e.g. the number of array elements), usually denoted n. Roughly, big-O means "proportional to" for large enough inputs. For example, O(n) means the run time is proportional to the input size: doubling the input roughly doubles the running time. O(n^2) means the run time is proportional to the square of the input size: doubling the input roughly quadruples the running time. We want an algorithm with a small big-O. Run time is usually called the (time) complexity.
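
As a toy illustration (mine, not from the slides), here are two functions that count one representative operation per loop iteration; the counts grow like n and n^2 respectively:

    long count_linear(int n)      // O(n): doubling n doubles the count
    {
        long ops = 0;
        for (int i = 0; i < n; i++)
            ops++;                // one representative operation
        return ops;
    }

    long count_quadratic(int n)   // O(n^2): doubling n quadruples the count
    {
        long ops = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                ops++;
        return ops;
    }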

Worst Case vs. Average Case

Some algorithms behave differently depending on the input. For example, some sorting algorithms will be very fast if the input is already sorted, but much slower in other cases. We can talk about the worst case complexity: how long can the algorithm take, no matter what input it receives? We can also talk about average case complexity: how does it do on average? Average case is hard: we need to know what "average" means. We will concentrate on worst case complexity.

Searching: Linear Search

    int find(int A[], int n, int key)
    {
        for (int i = 0; i < n; i++)
            if (A[i] == key)
                return i;
        return -1;  // not found
    }

In the worst case (the key is in the last position, or not found at all), we need n comparisons. The algorithm has O(n) complexity.

Searching: Binary Search

    int find(int A[], int n, int key)
    {
        int lo = 0, hi = n-1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (A[mid] == key)
                return mid;
            else if (key < A[mid])
                hi = mid-1;
            else
                lo = mid+1;
        }
        return -1;  // not found
    }

Searching: Binary Search

The array must be sorted. The worst case happens when the key is not found. Each iteration reduces the size of the range [lo,hi] by at least half, so it takes approximately log_2 n iterations to reduce [0,n-1] to the empty range. Each iteration does a constant amount of work, so the algorithm has O(log_2 n) complexity. We usually write O(log n) because logarithms of different bases differ only by a constant factor: log_2 n = log n / log 2.

Complexity: Does It Matter?

For large n, the difference between O(n) and O(log_2 n) is big. The number of iterations:

          n        O(n)   O(log_2 n)
         32          32            5
       1024        1024           10
    1048576     1048576           20

Sorting: Selection Sort

    for (int i = 0; i < n; i++) {
        int index = find_min(A, i, n);
        swap(A[i], A[index]);
    }

Finding the minimum of n elements requires O(n) comparisons. With n iterations this becomes O(n^2). More precisely, the i-th iteration requires about n - i operations, so the total is:

    n + (n-1) + (n-2) + ... + 2 + 1 = n(n+1)/2 = O(n^2)

O(n^2) is slow. Sorting 10^6 entries requires roughly 10^12 operations!
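
The helper find_min is not shown on the slide; here is a minimal sketch, assuming it returns the index of the smallest element in A[start..end):

    int find_min(int A[], int start, int end)
    {
        int index = start;               // index of the smallest element seen so far
        for (int i = start+1; i < end; i++)
            if (A[i] < A[index])
                index = i;
        return index;
    }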

Other Slow Sorting Algorithms

The following common sorting algorithms are also O(n^2):
Bubble sort
Insertion sort
But these algorithms can be O(n) if the array is already sorted. Selection sort is always O(n^2), even if the array is already sorted.
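
To see why insertion sort can be O(n) on sorted input, here is a minimal sketch (not from the slides): the inner loop exits immediately when nothing has to move, so each of the n-1 outer iterations does constant work.

    void insertion_sort(int A[], int n)
    {
        for (int i = 1; i < n; i++) {
            int key = A[i];
            int j = i - 1;
            // shift larger elements to the right;
            // on already-sorted input this loop body never runs
            while (j >= 0 && A[j] > key) {
                A[j+1] = A[j];
                j--;
            }
            A[j+1] = key;
        }
    }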

Merge Sort

This is our first fast sorting algorithm. The basic idea is this:
If the array has only one element, it is easy.
Otherwise, split the array into two halves.
Recursively sort each half.
Merge the two sorted lists together.
The only real work is in the merging.

Merge Sort

    void mergesort(int A[], int start, int end)
    {
        if (end - start > 1) {
            int mid = (start + end) / 2;
            mergesort(A, start, mid);
            mergesort(A, mid, end);
            merge(A, start, mid, end);
        }
    }

Merging

Merging is done with a "marching" algorithm:

    void merge(int A[], int start, int mid, int end)
    {
        int R[SIZE], i1, i2, j;
        i1 = start;
        i2 = mid;
        j = 0;
        while (i1 < mid && i2 < end) {
            if (A[i1] < A[i2])       // A[i1] comes next
                R[j++] = A[i1++];
            else
                R[j++] = A[i2++];
        }
        copy(A+i1, A+mid, R+j);            // leftovers from the left half
        copy(A+i2, A+end, R+j+(mid-i1));   // leftovers from the right half
        copy(R, R+(end-start), A+start);   // copy the result back into A
    }

Merging

Notice that each loop iteration copies one element from one of the two halves. The two calls to copy() merge the leftovers. Each element is copied once into R and once back into A, so merging has complexity O(end - start). Notice that we need an auxiliary array.

Merge Sort: Complexity

At the top level, merging takes O(n) operations. At the next level, merging takes O(n/2) operations on each half, but there are two halves: O(n) in total. The next level works on quarters: O(n/4) for merging, but there are 4 quarters, again O(n) in total. So every level requires O(n) work. How many levels are there? Each level cuts the size in half, so there are O(log_2 n) levels. Total complexity: O(n log n).
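
Equivalently (not stated on the slide), the running time satisfies the recurrence T(n) = 2T(n/2) + cn for some constant c; unrolling it gives cn work per level over log_2 n levels, hence O(n log n).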

Quicksort

A very common fast sorting algorithm, commonly considered to be one of the fastest. It has the same complexity as merge sort on average, but the constant of proportionality is much smaller, and it requires relatively little extra space. Like mergesort, quicksort is recursive.

Quicksort

The idea:
If there is one element or fewer, the array is sorted.
Otherwise, choose a pivot element.
Partition the array so that all elements to the left of the pivot are no bigger than the pivot, and all elements to the right are larger.
Recurse on the subarrays to the left and right of the pivot.
The only real work is done in the partitioning.

Quicksort

    void quicksort(int A[], int start, int end)
    {
        if (end - start > 1) {
            int pivot = partition(A, start, end);
            quicksort(A, start, pivot);
            quicksort(A, pivot+1, end);
        }
    }

Partitioning

The pivot can be any element in the array; it is easiest to choose the first one (A[start]). One way: scan from left to right, maintaining two indices i and j so that:

the elements of A[start+1..i) are <= A[start]
the elements of A[i..j) are > A[start]

Initially, i = j = start+1. We run j from start+1 to end. If A[j] > A[start], just increment j. Otherwise, swap A[j] and A[i] and increment both i and j. At the end, swap A[start] and A[i-1].

Partitioning

    int partition(int A[], int start, int end)
    {
        int i, j;
        i = start+1;
        for (j = start+1; j < end; j++)
            if (A[j] <= A[start])
                swap(A[i++], A[j]);
        swap(A[start], A[i-1]);
        return i-1;  // return where the pivot is
    }

Complexity

The sizes of the two subarrays depend on the choice of pivot. If we are lucky, the two subarrays are roughly half the size of the original. Since partition takes O(n) operations, quicksort then has O(n log n) complexity (same reasoning as merge sort). If we are unlucky, the pivot is the smallest or largest element. Since the size only shrinks by 1, we need O(n) levels and the complexity is O(n^2). The worst case of quicksort happens when the array is already sorted (or sorted in reverse)! On average, the complexity is O(n log n).
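
In recurrence form (not on the slide): the lucky case is T(n) = 2T(n/2) + cn = O(n log n), while the unlucky case is T(n) = T(n-1) + cn = c(n + (n-1) + ... + 1) = O(n^2).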

Another Partitioning Algorithm

We can move in from both ends:

    int partition(int A[], int start, int end)
    {
        int pivot = A[start];
        int i = start+1, j = end-1;
        while (i <= j) {
            while (A[i] <= pivot) i++;
            while (pivot < A[j]) j--;
            if (i < j)
                swap(A[i], A[j]);
        }
        swap(A[start], A[j]);
        return j;
    }

This is slightly more efficient but harder to get right (still O(n)).

Another Partitioning Algorithm

Did you see the bug(s)?

Optimizations

Instead of choosing the first element as the pivot, we can choose a random element; it is unlikely to be the smallest or largest. Another common approach: pick three elements and use the middle one as the pivot. "Fat pivot": partition the array into three subarrays: less than the pivot, equal to the pivot, and greater than the pivot. If there are many equal elements, the left and right subarrays are smaller. Recursion has overhead (function calls), so it is worthwhile only for large arrays. When the size of a subarray is small enough, switch to a different sorting algorithm (e.g. insertion sort). The cutoff point between the algorithms depends on the compiler, machine, etc., and needs to be tuned experimentally.
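
A minimal sketch of the "pick three elements" idea (an illustration with my own naming, not the lecture's code): return the index of the median of the first, middle, and last elements, and let quicksort swap it into A[start] before partitioning as before.

    int median_of_three(int A[], int start, int end)
    {
        int mid = (start + end) / 2;
        int a = A[start], b = A[mid], c = A[end-1];
        // return the index of the middle value of the three
        if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
        if ((b <= a && a <= c) || (c <= a && a <= b)) return start;
        return end-1;
    }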

Generic Sorting

The STL sorting algorithms are generic: they work as long as operator< is defined on the elements (or a comparison function is supplied). They also use standard STL iterators to specify the range (random access iterators). Our code can easily be adapted for templates: instead of comparing with <=, use the OR of < and == (or use less_equal from <functional>). Since the start and end iterators are random access, we can use iterator arithmetic to jump to particular elements in the range.
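
A small usage illustration of the STL interface described above:

    #include <algorithm>
    #include <functional>
    #include <vector>

    int main()
    {
        std::vector<int> v = {5, 2, 9, 1};
        std::sort(v.begin(), v.end());                       // ascending: uses operator<
        std::sort(v.begin(), v.end(), std::greater<int>());  // descending: supplied comparison
    }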

Data Structure: Heaps

A heap is a binary tree with the following properties:
each element in the heap is >= its two children (if they exist);
the tree is completely filled on all levels except possibly the lowest, which is filled from the left.
Because of this shape, we can represent a heap of n elements in an array of size n, such that:
The two children of node i are in positions 2i+1 and 2i+2.
The parent of node i is in position (i-1)/2.
Note: the word "heap" is also used to describe dynamically allocated memory. That is a completely different use of the word.
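
The index arithmetic as code (a tiny illustration; the helper names are mine):

    int left_child(int i)  { return 2*i + 1; }
    int right_child(int i) { return 2*i + 2; }
    int parent(int i)      { return (i - 1) / 2; }  // integer division rounds down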

Heaps: Sifting Up

Suppose we have a heap A of size n-1 and we want to add an extra element at A[n-1]. We can fix up the heap by "sifting up":

    void sift_up(int A[], int n)
    {
        int i;
        for (i = n-1; i > 0 && A[i] > A[(i-1)/2]; i = (i-1)/2)
            swap(A[i], A[(i-1)/2]);
    }

In other words, if A[i] is larger than its parent, we swap A[i] with its parent and repeat.

Heaps: Sifting Up

The sifting up procedure is correct, because:
We assume that the array has the heap property except possibly between A[i] and its parent (true at the beginning).
At each iteration, if the heap property is violated, we swap A[i] with its parent; otherwise, we are done.
If A[i] > A[(i-1)/2], then after swapping, A[(i-1)/2] is still >= its children. We can also show that after swapping, A[i] is still >= its children (we won't prove this formally).

Heaps: Sifting Up

The complexity is the height of the tree, which is O(log n). To build a heap, we start from a heap of size 1 and repeatedly sift up additional elements:

    for (i = 2; i <= n; i++)
        sift_up(A, i);

This has complexity O(n log n). More precisely, it is O(log 2 + log 3 + ... + log n), which simplifies to O(n log n), but some math is needed to see this...

Heaps: Sifting Down

Suppose that we change the first element A[0] of a heap. Then the heap property is preserved except perhaps between A[0] and its children (i.e. A[0] is too small). We can fix it by "sifting down": swap it with the larger of its two children. Because we choose the larger child, the heap property is restored except within the subheap rooted at the swapped child. Move down and repeat. Again, the complexity of each sifting down operation is O(log n).

Heaps: Sifting Down

    void sift_down(int A[], int n)
    {
        int i = 0;
        while (2*i+1 < n) {          // there is a child
            int child = 2*i+1;       // left child
            if (child+1 < n && A[child] < A[child+1])
                child++;             // check the right child, if it exists
            if (A[i] >= A[child])
                break;               // heap property is good
            swap(A[i], A[child]);
            i = child;
        }
    }

Heaps: Extracting the Maximum

The maximum is at the root of the heap. To remove the maximum, just put A[n-1] into the root and sift down (after decrementing n). The complexity is O(log n).
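
As code (a minimal sketch built on the sift_down above; the name extract_max is mine):

    int extract_max(int A[], int& n)
    {
        int max = A[0];    // the maximum is at the root
        n--;               // shrink the heap
        A[0] = A[n];       // move the last element into the root
        sift_down(A, n);   // restore the heap property: O(log n)
        return max;
    }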

Priority Queues

Heaps can be used to implement priority queues:
Insertion has O(log n) complexity.
Deleting the maximum has O(log n) complexity.
Finding the maximum element (without deleting it) has O(1) complexity (i.e. constant time).
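
The C++ STL provides this directly as std::priority_queue; a small usage illustration:

    #include <queue>

    int main()
    {
        std::priority_queue<int> pq;
        pq.push(3);               // insertion: O(log n)
        pq.push(7);
        pq.push(5);
        int max = pq.top();       // find the maximum: O(1), here 7
        pq.pop();                 // delete the maximum: O(log n)
    }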

Heap Sort

A heap can be used to sort an array of n elements as follows:
Build a heap with the n elements: O(n log n).
Repeatedly extract the maximum of the heap, put it in the correct spot, and decrease the size of the heap by 1. At O(log n) per iteration, this gives O(n log n).
At any time, the first portion of the array is a heap and the second portion is the sorted list. The total complexity is O(n log n).

Heap Sort

    void heapsort(int A[], int n)
    {
        // build the heap
        for (int i = 2; i <= n; i++)
            sift_up(A, i);

        // repeatedly move the maximum to its final position
        for (int i = n-1; i >= 1; i--) {
            swap(A[0], A[i]);
            sift_down(A, i);
        }
    }

Sorting: Can we do better?

The fastest sorting algorithm we have seen has O(n log n) complexity. If the data to be sorted is special, we can do better. For example, if we are sorting 0s and 1s, we can just count the number of 0s and the number of 1s; this has O(n) complexity. Sorting by counting is fast as long as the number of different values is small. We cannot do better than O(n), because we have to look at every array element.
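
A minimal sketch of the counting idea for 0s and 1s (my illustration, not the lecture's code):

    void sort01(int A[], int n)
    {
        int zeros = 0;
        for (int i = 0; i < n; i++)    // count the 0s: O(n)
            if (A[i] == 0)
                zeros++;
        for (int i = 0; i < n; i++)    // rewrite the array: O(n)
            A[i] = (i < zeros) ? 0 : 1;
    }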

Sorting: Can we do better?

It can be proved that if you sort by comparing elements, it is impossible to have an algorithm with a better worst case complexity than O(n log n) (take CS 3620 if you want a precise statement). It does not matter how smart you are: impossible means impossible. If you have any sorting algorithm that does better than O(n log n) in the worst case, then either:
the complexity analysis is wrong, or
there is at least one input array on which your algorithm gives the wrong answer.