Introduction to Programming (in C++) Sorting. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC

Similar documents
Introduction to Programming (in C++) Multi-dimensional vectors. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC

Introduction to Programming (in C++) Loops. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC

Example. Introduction to Programming (in C++) Loops. The while statement. Write the numbers 1 N. Assume the following specification:

Analysis of Binary Search algorithm and Selection Sort algorithm

Binary Heaps * * * * * * * / / \ / \ / \ / \ / \ * * * * * * * * * * * / / \ / \ / / \ / \ * * * * * * * * * *

6. Standard Algorithms

Data Structures and Algorithms Lists

The Tower of Hanoi. Recursion Solution. Recursive Function. Time Complexity. Recursive Thinking. Why Recursion? n! = n* (n-1)!

Binary Heap Algorithms

Lecture Notes on Linear Search

Sorting Algorithms. Nelson Padua-Perez Bill Pugh. Department of Computer Science University of Maryland, College Park

A binary search tree or BST is a binary tree that is either empty or in which the data element of each node has a key, and:

4.2 Sorting and Searching

Zabin Visram Room CS115 CS126 Searching. Binary Search

recursion, O(n), linked lists 6/14

if and if-else: Part 1

Ordered Lists and Binary Trees

Binary search algorithm

Heaps & Priority Queues in the C++ STL 2-3 Trees

Data Structures and Algorithms

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Linda Shapiro Winter 2015

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

Node-Based Structures Linked Lists: Implementation

Data Structures and Algorithms

Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

Biostatistics 615/815

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Recursion. Slides. Programming in C++ Computer Science Dept Va Tech Aug., Barnette ND, McQuain WD

CSC148 Lecture 8. Algorithm Analysis Binary Search Sorting

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team

Data Structures and Data Manipulation

Algorithms. Margaret M. Fleck. 18 October 2010

UIL Computer Science for Dummies by Jake Warren and works from Mr. Fleming

Data Structures. Algorithm Performance and Big O Analysis

AP Computer Science AB Syllabus 1

CSC 180 H1F Algorithm Runtime Analysis Lecture Notes Fall 2015

Pseudo code Tutorial and Exercises Teacher s Version

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Computer Science 210: Data Structures. Searching

Outline. Conditional Statements. Logical Data in C. Logical Expressions. Relational Examples. Relational Operators

Questions 1 through 25 are worth 2 points each. Choose one best answer for each.

Lecture Notes on Binary Search Trees

Chapter 8: Bags and Sets

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13

Simple sorting algorithms and their complexity. Bubble sort. Complexity of bubble sort. Complexity of bubble sort. Bubble sort of lists

Algorithm Analysis [2]: if-else statements, recursive algorithms. COSC 2011, Winter 2004, Section N Instructor: N. Vlajic

Arrays. number: Motivation. Prof. Stewart Weiss. Software Design Lecture Notes Arrays

Outline BST Operations Worst case Average case Balancing AVL Red-black B-trees. Binary Search Trees. Lecturer: Georgy Gimel farb

Lecture Notes on Binary Search Trees

Sequences in the C++ STL

Loop Invariants and Binary Search

THIS CHAPTER studies several important methods for sorting lists, both contiguous

Why you shouldn't use set (and what you should use instead) Matt Austern

CS 2112 Spring Instructions. Assignment 3 Data Structures and Web Filtering. 0.1 Grading. 0.2 Partners. 0.3 Restrictions

Binary Multiplication

Linked Lists: Implementation Sequences in the C++ STL

Moving from CS 61A Scheme to CS 61B Java

Data Structures. Topic #12

Section IV.1: Recursive Algorithms and Recursion Trees

ESCI 386 Scientific Programming, Analysis and Visualization with Python. Lesson 5 Program Control

Converting a Number from Decimal to Binary

COMP 110 Prasun Dewan 1

2.3 WINDOW-TO-VIEWPORT COORDINATE TRANSFORMATION

Sequential Data Structures

Introduction to Programming System Design. CSCI 455x (4 Units)

language 1 (source) compiler language 2 (target) Figure 1: Compiling a program

Data Structures and Algorithms Written Examination

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

Chapter 7: Sequential Data Structures

Sample Questions Csci 1112 A. Bellaachia

Boolean Expressions, Conditions, Loops, and Enumerations. Precedence Rules (from highest to lowest priority)

Functions Recursion. C++ functions. Declare/prototype. Define. Call. int myfunction (int ); int myfunction (int x){ int y = x*x; return y; }

Computational Mathematics with Python

Analysis of a Search Algorithm

ECE 250 Data Structures and Algorithms MIDTERM EXAMINATION /5:15-6:45 REC-200, EVI-350, RCH-106, HH-139

Data Structures Fibonacci Heaps, Amortized Analysis

Symbol Tables. Introduction

Lecture 3: Finding integer solutions to systems of linear equations

Data Structures, Practice Homework 3, with Solutions (not to be handed in)

Data Structures. Jaehyun Park. CS 97SI Stanford University. June 29, 2015

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new

Practical Session 4 Java Collections

Binary Search Trees. basic implementations randomized BSTs deletion in BSTs

TREE BASIC TERMINOLOGIES

Data Structure and Algorithm I Midterm Examination 120 points Time: 9:10am-12:10pm (180 minutes), Friday, November 12, 2010

Outline. Introduction Linear Search. Transpose sequential search Interpolation search Binary search Fibonacci search Other search techniques

General Sampling Methods

Gravity Forms: Creating a Form

Computational Mathematics with Python

Object-Oriented Design Lecture 4 CSU 370 Fall 2007 (Pucella) Tuesday, Sep 18, 2007

Chapter Objectives. Chapter 9. Sequential Search. Search Algorithms. Search Algorithms. Binary Search

IVR Studio 3.0 Guide. May Knowlarity Product Team

Basic Programming and PC Skills: Basic Programming and PC Skills:

Transcription:

Introduction to Programming (in C++) Sorting Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC

Sorting Let elem be a type with a operation, which is a total order A vector<elem> v is (increasingly) sorted if for all i with 0 i v.size()-1, v[i] v[i+1] Equivalently: if i j then v[i] v[j] A fundamental, very common problem: sort v Order the elements in v and leave the result in v Introduction to Programming Dept. CS, UPC 2

Sorting 9-7 0 1-3 4 3 8-6 8 6 2-7 -6-3 0 1 2 3 4 6 8 8 9 Another common task: sort v[a..b] a b 9-7 0 1-3 4 3 8-6 8 6 2 a 9-7 0-3 1 3 4 8-6 8 6 2 b Introduction to Programming Dept. CS, UPC 3

Sorting We will look at four sorting algorithms: Selection Sort Insertion Sort Bubble Sort Merge Sort Let us consider a vector v of n elems (n = v.size()) Insertion, Selection and Bubble Sort make a number of operations on elems proportional to n 2 Merge Sort is proportional to n log 2 n: faster except for very small vectors Introduction to Programming Dept. CS, UPC 4

Selection Sort Observation: in the sorted vector, v[0] is the smallest element in v The second smallest element in v must go to v[1] and so on At the i-th iteration, select the i-th smallest element and place it in v[i] Introduction to Programming Dept. CS, UPC 5

Selection Sort From http://en.wikipedia.org/wiki/selection_sort Introduction to Programming Dept. CS, UPC 6

Selection Sort Selection sort keeps this invariant: i-1-7 -3 0 1 4 9?????? i this is sorted and contains the i-1 smallest elements this may not be sorted but all elements here are larger than or equal to the elements in the sorted part Introduction to Programming Dept. CS, UPC 7

Selection Sort // Pre: -- // Post: v is now increasingly sorted void selection_sort(vector<elem>& v) { int last = v.size() - 1; for (int i = 0; i < last; ++i) { int k = pos_min(v, i, last); swap(v[k], v[i]); // Invariant: v[0..i-1] is sorted and // if a < i <= b then v[a] <= v[b] Note: when i=v.size()-1, v[i] is necessarily the largest element. Nothing to do. Introduction to Programming Dept. CS, UPC 8

Selection Sort // Pre: 0 <= left <= right < v.size() // Returns pos such that left <= pos <= right // and v[pos] is smallest in v[left..right] int pos_min(const vector<elem>& v, int left, int right) { int pos = left; for (int i = left + 1; i <= right; ++i) { if (v[i] < v[pos]) pos = i; return pos; Introduction to Programming Dept. CS, UPC 9

Selection Sort At the i-th iteration, Selection Sort makes up to v.size()-1-i comparisons among elems 1 swap (=3 elem assignments) per iteration The total number of comparisons for a vector of size n is: (n-1)+(n-2)+ +1= n(n-1)/2 n 2 /2 The total number of assignments is 3(n-1). Introduction to Programming Dept. CS, UPC 10

Let us use induction: Insertion Sort If we know how to sort arrays of size n-1, do we know how to sort arrays of size n? 0 n-2 n-1 9-7 0 1-3 4 3 8-6 8 6 2-7 -6-3 0 1 3 4 6 8 8 9 2-7 -6-3 0 1 2 3 4 6 8 8 9 Introduction to Programming Dept. CS, UPC 11

Insertion Sort Insert x=v[n-1] in the right place in v[0..n-1] Two ways: - Find the right place, then shift the elements - Shift the elements to the right until one x is found Introduction to Programming Dept. CS, UPC 12

Insertion Sort Insertion sort keeps this invariant: i-1-7 -3 0 1 4 9?????? i This is sorted This may not be sorted and we have no idea of what may be here Introduction to Programming Dept. CS, UPC 13

Insertion Sort From http://en.wikipedia.org/wiki/insertion_sort Introduction to Programming Dept. CS, UPC 14

Insertion Sort // Pre: -- // Post: v is now increasingly sorted void insertion_sort(vector<elem>& v) { for (int i = 1; i < v.size(); ++i) { elem x = v[i]; int j = i; while (j > 0 and v[j - 1] > x) { v[j] = v[j - 1]; --j; v[j] = x; // Invariant: v[0..i-1] is sorted in ascending order Introduction to Programming Dept. CS, UPC 15

Insertion Sort At the i-th iteration, Insertion Sort makes up to i comparisons and up to i+2 assignments of type elem The total number of comparisons for a vector of size n is, at most: 1 + 2 + + (n-1) = n(n-1)/2 n 2 /2 At the most, n 2 /2 assignments But about n 2 /4 in typical cases Introduction to Programming Dept. CS, UPC 16

Selection Sort vs. Insertion Sort 2-1 5 0-3 9 4-3 -1 5 0 2 9 4-3 -1 5 0 2 9 4-3 -1 0 5 2 9 4-3 -1 0 2 5 9 4-3 -1 0 2 4 9 5-3 -1 0 2 4 5 9 2-1 5 0-3 9 4-1 2 5 0-3 9 4-1 2 5 0-3 9 4-1 0 2 5-3 9 4-3 -1 0 2 5 9 4-3 -1 0 2 5 9 4-3 -1 0 2 4 5 9 Introduction to Programming Dept. CS, UPC 17

Selection Sort vs. Insertion Sort Introduction to Programming Dept. CS, UPC 18

Evaluation of complex conditions void insertion_sort(vector<elem>& v) { for (int i = 1; i < v.size(); ++i) { elem x = v[i]; int j = i; while (j > 0 and v[j - 1] > x) { v[j] = v[j - 1]; --j; v[j] = x; How about: while (v[j 1] > x and j > 0)? Consider the case for j = 0 evaluation of v[-1] (error!) How are complex conditions really evaluated? Introduction to Programming Dept. CS, UPC 19

Evaluation of complex conditions Many languages (C, C++, Java, PHP, Python) use the short-circuit evaluation (also called minimal or lazy evaluation) for Boolean operators. For the evaluation of the Boolean expression expr1 op expr2 expr2 is only evaluated if expr1 does not suffice to determine the value of the expression. Example: (j > 0 and v[j-1] > x) v[j-1] is only evaluated when j>0 Introduction to Programming Dept. CS, UPC 20

Evaluation of complex conditions In the following examples: n!= 0 and sum/n > avg n == 0 or sum/n > avg sum/n will never execute a division by zero. Not all languages have short-circuit evaluation. Some of them have eager evaluation (all the operands are evaluated) and some of them have both. The previous examples could potentially generate a runtime error (division by zero) when eager evaluation is used. Tip: short-circuit evaluation helps us to write more efficient programs, but cannot be used in all programming languages. Introduction to Programming Dept. CS, UPC 21

Bubble Sort A simple idea: traverse the vector many times, swapping adjacent elements when they are in the wrong order. The algorithm terminates when no changes occur in one of the traversals. Introduction to Programming Dept. CS, UPC 22

Bubble Sort Introduction to Programming Dept. CS, UPC 23 3 0 5 1 4 2 0 3 5 1 4 2 0 3 5 1 4 2 0 3 1 5 4 2 0 3 1 4 5 2 0 3 1 4 2 5 0 3 1 4 2 5 0 1 3 4 2 5 0 1 3 4 2 5 0 1 3 2 4 5 0 1 3 2 4 5 0 1 3 2 4 5 0 1 2 3 4 5 0 1 2 3 4 5 0 1 2 3 4 5 The largest element is well-positioned after the first iteration. The second largest element is well-positioned after the second iteration. The vector is sorted when no changes occur during one of the iterations.

Bubble Sort From http://en.wikipedia.org/wiki/bubble_sort Introduction to Programming Dept. CS, UPC 24

Bubble Sort void bubble_sort(vector<elem>& v) { bool sorted = false; int last = v.size() 1; while (not sorted) { // Stop when no changes sorted = true; for (int i = 0; i < last; ++i) { if (v[i] > v[i + 1]) { swap(v[i], v[i + 1]); sorted = false; // The largest element falls to the bottom --last; Observation: at each pass of the algorithm, all elements after the last swap are sorted. Introduction to Programming Dept. CS, UPC 25

Bubble Sort void bubble_sort(vector<elem>& v) { int last = v.size() 1; while (last > 0) { int last_swap = 0; // Last swap at each iteration for (int i = 0; i < last; ++i) { if (v[i] > v[i + 1]) { swap(v[i], v[i + 1]); last_swap = i; last = last_swap; // Skip the sorted tail Introduction to Programming Dept. CS, UPC 26

Bubble Sort Worst-case analysis: The first pass makes n-1 swaps The second pass makes n-2 swaps The last pass makes 1 swap The worst number of swaps: 1 + 2 + + (n-1) = n(n-1)/2 n 2 /2 It may be efficient for nearly-sorted vectors. In general, bubble sort is one of the least efficient algorithms. It is not practical when the vector is large. Introduction to Programming Dept. CS, UPC 27

Merge Sort Recall our induction for Insertion Sort: suppose we can sort vectors of size n-1, can we now sort vectors of size n? What about the following: suppose we can sort vectors of size n/2, can we now sort vectors of size n? Introduction to Programming Dept. CS, UPC 28

Merge Sort 9-7 0 1-3 4 3 8-6 8 6 2 Induction! -7-3 0 1 4 9 3 8-6 8 6 2 Induction! -7-3 0 1 4 9-6 2 3 6 8 8 How do we do this? -7-6 -3 0 1 2 3 4 6 8 8 9 Introduction to Programming Dept. CS, UPC 29

Merge Sort From http://en.wikipedia.org/wiki/merge_sort Introduction to Programming Dept. CS, UPC 30

Merge Sort We have seen almost what we need! // Pre: A and B are sorted in ascending order // Returns the sorted fusion of A and B vector<elem> merge(const vector<elem>& A, const vector<elem>& B); Now, v[0..n/2-1] and v[n/2..n-1] are sorted in ascending order. Merge them into an auxiliary vector of size n, then copy back to v. Introduction to Programming Dept. CS, UPC 31

Merge Sort 9-7 0 1 4-3 3 8 9-7 0 1 Split 4-3 3 8 Merge Sort Merge Sort -7 0 1 9 Merge -3 3 4 8-7 -3 0 1 3 4 8 9 Introduction to Programming Dept. CS, UPC 32

Merge Sort // Pre: 0 <= left <= right < v.size() // Post: v[left..right] has been sorted increasingly void merge_sort(vector<elem>& v, int left, int right) { if (left < right) { int m = (left + right)/2; merge_sort(v, left, m); merge_sort(v, m + 1, right); merge(v, left, m, right); Introduction to Programming Dept. CS, UPC 33

Merge Sort merge procedure // Pre: 0 <= left <= mid < right < v.size(), and // v[left..mid], v[mid+1..right] are both sorted increasingly // Post: v[left..right] is now sorted void merge(vector<elem>& v, int left, int mid, int right) { int n = right - left + 1; vector<elem> aux(n); int i = left; int j = mid + 1; int k = 0; while (i <= mid and j <= right) { if (v[i] <= v[j]) { aux[k] = v[i]; ++i; else { aux[k] = v[j]; ++j; ++k; while (i <= mid) { aux[k] = v[i]; ++k; ++i; while (j <= right) { aux[k] = v[j]; ++k; ++j; for (k = 0; k < n; ++k) v[left+k] = aux[k]; Introduction to Programming Dept. CS, UPC 34

Merge Sort : merge_sort : merge 9-7 0 1 4-3 3 8 9-7 0 1 4-3 3 8 9-7 0 1 4-3 3 8 9-7 0 1 4-3 3 8-7 9 0 1-3 4 3 8-7 0 1 9-3 3 4 8-7 -3 0 1 3 4 8 9 Introduction to Programming Dept. CS, UPC 35

Merge Sort How many elem comparisons does Merge Sort do? Say v.size() is n, a power of 2 merge(v,l,m,r) makes k comparisons if k=r-l+1 We call merge n 2i times with R-L=2i The total number of comparisons is log 2 n n i=1 2 i 2i = n log 2 n The total number of elem assignments is 2n log 2 n Introduction to Programming Dept. CS, UPC 36

Comparison of sorting algorithms Selection Insertion Bubble Merge Introduction to Programming Dept. CS, UPC 37

Comparison of sorting algorithms Approximate number of comparisons: n = v.size() 10 100 1,000 10,000 100,000 Insertion, Selection and Bubble Sort ( n 2 /2) Merge Sort ( n log 2 n) 50 5,000 500,000 50,000,000 5,000,000,000 67 1,350 20,000 266,000 3,322,000 Note: it is known that every general sorting algorithm must do at least n log 2 n comparisons. Introduction to Programming Dept. CS, UPC 38

Comparison of sorting algorithms 100 Execution time (µs) 80 60 Insertion Sort Selection Sort Bubble Sort Merge Sort 40 20 For small vectors 0 20 40 60 80 100 120 140 160 180 200 Vector size Introduction to Programming Dept. CS, UPC 39

Thousands Comparison of sorting algorithms 2.5 Execution time (ms) 2 1.5 Insertion Sort Selection Sort Bubble Sort Merge Sort 1 0.5 For medium vectors 0 100 200 300 400 500 600 700 800 900 1000 Vector size Introduction to Programming Dept. CS, UPC 40

Comparison of sorting algorithms 80 70 60 50 40 30 20 Execution time (secs) Insertion Sort Selection Sort Bubble Sort Merge Sort For large vectors 10 0 10K 20K 30K 40K 50K 60K 70K 80K 90K 100K Vector size Introduction to Programming Dept. CS, UPC 41

Other sorting algorithms There are many other sorting algorithms. The most efficient algorithm for general sorting is quick sort (C.A.R. Hoare). The worst case is proportional to n 2 The average case is proportional to n log 2 n, but it usually runs faster than all the other algorithms It does not use any auxiliary vectors Quick sort will not be covered in this course. Introduction to Programming Dept. CS, UPC 42

Sorting with the C++ library A sorting procedure is available in the C++ library It probably uses a quicksort algorithm To use it, include: #include <algorithm> To increasingly sort a vector v (of int s, double s, string s, etc.), call: sort(v.begin(), v.end()); Introduction to Programming Dept. CS, UPC 43

Sorting with the C++ library To sort with a different comparison criteria, call sort(v.begin(), v.end(), comp); For example, to sort int s decreasingly, define: bool comp(int a, int b) { return a > b; To sort people by age, then by name: bool comp(const Person& a, const Person& b) { if (a.age == b.age) return a.name < b.name; else return a.age < b.age; Introduction to Programming Dept. CS, UPC 44

Sorting is not always a good idea Example: to find the min value of a vector min = v[0]; for (int i=1; i < v.size(); ++i) if (v[i] < min) min = v[i]; sort(v); min = v[0]; (1) (2) Efficiency analysis: Option (1): n iterations (visit all elements). Option (2): 2n log 2 n moves with a good sorting algorithm (e.g., merge sort) Introduction to Programming Dept. CS, UPC 45