Biostatistics 615/815



Similar documents
Sorting Algorithms. Nelson Padua-Perez Bill Pugh. Department of Computer Science University of Maryland, College Park

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

The Tower of Hanoi. Recursion Solution. Recursive Function. Time Complexity. Recursive Thinking. Why Recursion? n! = n* (n-1)!

6. Standard Algorithms

Analysis of Binary Search algorithm and Selection Sort algorithm

Binary Heap Algorithms

Recursion and Dynamic Programming. Biostatistics 615/815 Lecture 5

Questions 1 through 25 are worth 2 points each. Choose one best answer for each.

CSC148 Lecture 8. Algorithm Analysis Binary Search Sorting

HW4: Merge Sort. 1 Assignment Goal. 2 Description of the Merging Problem. Course: ENEE759K/CMSC751 Title:

2. (a) Explain the strassen s matrix multiplication. (b) Write deletion algorithm, of Binary search tree. [8+8]

TREE BASIC TERMINOLOGIES

Outline BST Operations Worst case Average case Balancing AVL Red-black B-trees. Binary Search Trees. Lecturer: Georgy Gimel farb

Data Structures and Data Manipulation

Converting a Number from Decimal to Binary

Algorithms. Margaret M. Fleck. 18 October 2010

Analysis of a Search Algorithm

What Is Recursion? Recursion. Binary search example postponed to end of lecture

Algorithm Analysis [2]: if-else statements, recursive algorithms. COSC 2011, Winter 2004, Section N Instructor: N. Vlajic

Basic Programming and PC Skills: Basic Programming and PC Skills:

4.2 Sorting and Searching

Closest Pair Problem

Introduction to Programming (in C++) Sorting. Jordi Cortadella, Ricard Gavaldà, Fernando Orejas Dept. of Computer Science, UPC

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D.

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

Symbol Tables. Introduction

Figure 1: Graphical example of a mergesort 1.

Recursion. Slides. Programming in C++ Computer Science Dept Va Tech Aug., Barnette ND, McQuain WD

Introduction to Algorithms March 10, 2004 Massachusetts Institute of Technology Professors Erik Demaine and Shafi Goldwasser Quiz 1.

A binary search tree or BST is a binary tree that is either empty or in which the data element of each node has a key, and:

Class : MAC 286. Data Structure. Research Paper on Sorting Algorithms

CS104: Data Structures and Object-Oriented Design (Fall 2013) October 24, 2013: Priority Queues Scribes: CS 104 Teaching Team

Many algorithms, particularly divide and conquer algorithms, have time complexities which are naturally

CS473 - Algorithms I

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

Dynamic Programming. Lecture Overview Introduction

Heaps & Priority Queues in the C++ STL 2-3 Trees

Data Structures and Algorithms Written Examination

Partitioning and Divide and Conquer Strategies

Memory Allocation. Static Allocation. Dynamic Allocation. Memory Management. Dynamic Allocation. Dynamic Storage Allocation

Now is the time. For all good men PERMUTATION GENERATION. Princeton University. Robert Sedgewick METHODS

Introduction to Parallel Programming and MapReduce

Pseudo code Tutorial and Exercises Teacher s Version

A Static Analyzer for Large Safety-Critical Software. Considered Programs and Semantics. Automatic Program Verification by Abstract Interpretation

28 Closest-Point Problems

Introduction to Data Structures

Data Structures. Algorithm Performance and Big O Analysis

To My Parents -Laxmi and Modaiah. To My Family Members. To My Friends. To IIT Bombay. To All Hard Workers

1. What are Data Structures? Introduction to Data Structures. 2. What will we Study? CITS2200 Data Structures and Algorithms

A COMPARATIVE STUDY OF LINKED LIST SORTING ALGORITHMS

THIS CHAPTER studies several important methods for sorting lists, both contiguous

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

Stacks. Linear data structures

Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles

Union-Find Algorithms. network connectivity quick find quick union improvements applications

You are to simulate the process by making a record of the balls chosen, in the sequence in which they are chosen. Typical output for a run would be:

Recursion vs. Iteration Eliminating Recursion

Persistent Binary Search Trees

Testing, Debugging, and Verification

Binary Trees and Huffman Encoding Binary Search Trees

Binary Search Trees CMPSC 122

Introduction to Programming System Design. CSCI 455x (4 Units)

External Sorting. Chapter 13. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Ordered Lists and Binary Trees

Chapter 14 The Binary Search Tree

Zabin Visram Room CS115 CS126 Searching. Binary Search

A Note on Maximum Independent Sets in Rectangle Intersection Graphs

Data Structures. Level 6 C Module Descriptor

1 Abstract Data Types Information Hiding

Data Structure with C

Lecture Notes on Linear Search

Big O and Limits Abstract Data Types Data Structure Grand Tour.

Shortest Path Algorithms

External Sorting. Why Sort? 2-Way Sort: Requires 3 Buffers. Chapter 13

CSE 326: Data Structures B-Trees and B+ Trees

Module 2 Stacks and Queues: Abstract Data Types

Section IV.1: Recursive Algorithms and Recursion Trees

Binary Heaps. CSE 373 Data Structures

Course: Programming II - Abstract Data Types. The ADT Stack. A stack. The ADT Stack and Recursion Slide Number 1

Binary search algorithm

Data Structures. Jaehyun Park. CS 97SI Stanford University. June 29, 2015

Motivation Suppose we have a database of people We want to gure out who is related to whom Initially, we only have a list of people, and information a

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Linda Shapiro Winter 2015

Data Structures and Algorithms

COMPUTER SCIENCE (5651) Test at a Glance

Sample Questions Csci 1112 A. Bellaachia

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

DATA STRUCTURES USING C

2) Write in detail the issues in the design of code generator.

IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS

In-Memory Database: Query Optimisation. S S Kausik ( ) Aamod Kore ( ) Mehul Goyal ( ) Nisheeth Lahoti ( )

Computer Science 210: Data Structures. Searching

Data Structure [Question Bank]

Transcription:

Merge Sort Biostatistics 615/815 Lecture 8

Notes on Problem Set 2 Union Find algorithms Dynamic Programming Results were very ypositive! You should be gradually becoming g y g comfortable compiling, debugging and executing C code

Question 1 How many random pairs of connections are required to connect 1,000 objects? Answer: ~3,740 Useful notes: Number of non-redundant links to controls loop Repeat simulation to get a better estimates

Question 2 Path lengths in the saturated tree ~1.8 nodes on average ~5 nodes for maximum path Random data is far from worst case Worst case would be paths of log 2 N (10) nodes Path lengths can be calculated using weights[]

Question 3 Using top-down dynamic programming, evaluate the beta-binomial distribution Like other recursive functions, this one can be very costly to evaluate for non-trivial cases Did you contrast results with nondynamic programming solution?

Last Lecture: Quick Sort Choose a partitioning element Organize array such that: All elements to the right are greater All elements to the left are smaller Sort right and left sub-arrays independently

Quick Sort Summary Divide and Conquer Algorithm Recursive calls can be hidden Optimizations Choice of median Threshold for brute-force methods Limiting depth of recursion Do you think quick sort is a stable sort?

C Code: QuickSort void quicksort(item a[], int start, int stop) { int i; if (stop <= start) return; i = partition(a, start, stop); quicksort(a, start, i - 1); quicksort(a, i + 1, stop); }

C Code: Partitioning int partition(item a[], int start, int stop) { int up = start, down = stop 1, part = a[stop]; if (stop <= start) return start; while (true) { while (isless(a[up], part)) up++; while (isless(part, a[down]) && (up < down)) down--; if (up >= down) break; Exchange(a[up], a[down]); up++; down--; } Exchange(a[up], a[stop]); return up; }

Non-Recursive Quick Sort void quicksort(item a[], int start, int stop) { int i, s = 0, stack[64]; stack[s++] = start; stack[s++] = stop; while (s > 0) { stop = stack[--s]; start = stack[--s]; if (start >= stop) continue; } i = partition(a, start, stop); if (i start > stop i) { stack[s++] = start; stack[s++] = i - 1; stack[s++] = i + 1; stack[s++] = stop; } else { stack[s++] = i + 1; stack[s++] = stop; stack[s++] = start; stack[s++] = i - 1; } }

Selection Problem of finding the k th smallest value in an array Simple solution would involve sorting the array Time proportional to Nl log N with Quick Sort Possible to improve by taking into account that Possible to improve by taking into account that only one element must fall into place Time proportional to N

C Code: Selection // Places k th smallest element in the k th position // within array. Could move other elements. void select(item * a, int start, int stop, int k) { int i; if (start <= stop) return; i = partition(a, start, stop); if (i > k) select(a, start, i - 1); if (i < k) select(a, i + 1, stop); }

Merge Sort Divide-And-Conquer Algorithm Divides a file in two halves Merges sorted halves The opposite of quick sort Requires additional storage

C Code: Merge Sort void mergesort(item a[], int start, int stop) { int m = (start + stop)/2; ; if (stop <= start) return; mergesort(a, start, m); mergesort(a, m + 1, stop); merge(a, start, m, stop); }

Merge Pattern N = 21

Merging Sorted Arrays Consider two arrays Assume they are both in order Can you think of a merging strategy?

Merging Two Sorted Arrays void merge_arrays(item merged[], Item a[], int N, Item b[], int M) { int i = 0, j = 0, k; for (k = 0; k < M + N; k++) { if (i == N) { merged[k] = b[j++]; continue; } if (j == M) { merged[k] = a[i++]; continue; } } if (isless(b[j],(b[j] a[i])) { merged[k] = b[j++]; } else { merged[k] = a[i++]; } }

In-Place Merge For sorting, we would like to: Starting with sorted halves a[start m] a[m + 1 end] Generate a sorted stretch a[start end] We would like an in-place merge, but A true in-place merge is quite complicated

Abstract In-Place Merge For caller,,performs like in-place merge Creates copies two sub-arrays Replaces contents with merged results

C Code: Abstract In-place Merge (First Attempt) void merge(item a[], int start, int m, int stop) { static Item extra1[max_n]; static Item extra2[max_n]; for (int i = start; i <= m; i++) extra1[i - start] = a[i]; for (int i = m + 1; i <= stop; i++) extra2[i m - 1] = a[i]; merge_arrays(a + start, extra1, m start + 1, extra2, stop m); }

C Code: Abstract In-place Merge (Second Attempt) void merge(item a[], int start, int m, int stop) { static Item extra[max_n]; for (int i = start; i <= stop; i++) extra[i] = a[i]; for (int i = start, k = start, j = m + 1; k <= stop; k++) for (int i start, k start, j m + 1; k < stop; k++) if (j<=stop && isless(extra[j], extra[i]) i>m) a[k] = extra[j++]; else a[k] = extra[i++]; }

Avoiding End-of-Input Check First Array Second Array a[min] a[max] b[max] b[min] i j At each point, compare elements i and j. Then select the smallest element. Move i or j towards the middle, as appropriate.

C Code: Abstract In-place Merge (Third Attempt!) void merge(item a[], int start, int m, int stop) { int i, j, k; for (int i = start; i <= m; i++) extra[i] = a[i]; for (int j = m + 1; j <= stop; j++) extra[m + 1 + stop j] = a[j]; for (int i = start, k = start, j = stop; k <= stop; k++) if (isless(extra[j], extra[i])) a[k] = extra[j--]; else a[k] = extra[i++]; }

Merge Sort in Action

Merge Sort Notes Order N log N Number of comparisons independent of data Exactly log N rounds Each requires N comparisons Merge sort is stable Insertion sort for small arrays is helpful

Sedgewick s Timings (secs) N QuickSort MergeSort MergeSort* 100,000 24 53 43 200,000000 52 111 92 400,000 109 237 198 800,000 241 524 426 Array of floating point numbers; * using insertion for small arrays

Non-Recursive Merge Sort First sort all sub-arrays of 1 element Perform successive merges Merge results into sub-arrays of 2 elements g y Merge results into sub-arrays of 4 elements

Bottom-Up Merge Sort Item min(item a, Item b) { return isless(a,b)? a : b; } void mergesort(item a[], int start, int stop) { int i, m; for (m = 1; m < stop start; m += m) for (i = start; i < stop - m; i += m + m) { int from = i; int mid = i + m 1; int to = min(i ( + m + m 1,, stop); } merge(a, from, mid, to); }

Merging Pattern for N = 21

Sedgewick s Timings (secs) N QuickSort Top-Down MergeSort Bottom-Up MergeSort 100,000 24 53 59 200,000 52 111 127 400,000000 109 237 267 800,000 241 524 568 Array of floating point numbers

Automatic Memory Allocation Defining large static arrays is not efficient Often, program will run on smaller datasets and the arrays will just waste memory A better way is to allocate and free memory as needed Create a wrapper function that takes care of memory allocation and freeing

Merge Sort, With Automatic Memory Allocation Item * extra; void sort(item a[], int start, int stop) { // Nothing to do with less than one element if (stop <= start) return; // Allocate the required extra storage extra = malloc(sizeof(item) ( i ) * (stop start + 1)); // Merge and sort the data mergesort(a, start, stop); // Free memory once we are done with it free(extra); }

Today Contrasting approaches to divide id and conquer Quick Sort Merge Sort Abstraction in functions Some functions look simple for caller but are more complex under-the-hood Unraveled Recursive Sorts

Sorting Summary Simple O(N 2 ) sorts for very small datasets Insertion, Selection and Bubblesort Improved, but more complex sort Shell sort Very efficient N log N sorts Quick Sort (requires no additional storage) Merge Sort (requires a bit of additional memory)

Sorting Indexes Generating an index is an alternative to sorting the raw data Allows us to keep track of many different orders Can be faster when items are large How it works: Leaves the array containing i the data unchanged Generates an array where position i records position of the i th smallest item in the original data

Example: Indexing with Insertion Sort void makeindex(int index[], Item a[], int start, int stop) { for (int i = start; i <= stop; i++) index[i] [ ] = i; for (int i = start + 1; i <= stop; i++) for (int j = i; j > start; j--) if (isless(a[index[j]], L ( d [j]] a[index[j-1]])) [j Exchange(index[j-1], index[j]) else break; }

Next Lecture: An Alternative to Sorting We ll see how to organize data so that it can be searched And so the complexity of searching and organizing i the data is less than Nl log N Cost: Doing this will require additional memory

Recommended Reading For QuickSort Sedgewick, Chapter 7 Hoare (1962) Computer Journal 5:10-15. For MergeSort Sedgewick, Chapter 8