Learning Outcomes. COMP202 Complexity of Algorithms. Binary Search Trees and Other Search Trees




[See relevant sections in chapters 2 and 3 in Goodrich and Tamassia.]

At the conclusion of this set of lecture notes, you should:

1. Recall the "binary search" method on sorted arrays.
2. Comprehend how binary search trees are used to mimic the binary search method for arrays.
3. Understand the concept of AVL trees and the mechanism used to maintain binary search trees that satisfy the "height-balance" property.
4. Understand the methods for using (2, 4)-trees to store and maintain sorted data.

Data Structures

Algorithmic computations require data (information) upon which to operate, and the speed of algorithmic computations is (partially) dependent on efficient use of this data. Data structures are specific methods for storing and accessing information. We study different kinds of data structures because one kind may be more appropriate than another, depending on the type of data stored and how we need to use it for a particular algorithm.

Many modern high-level programming languages, such as C++, Java, and Perl, have implemented various data structures in some form. Specialized data structures that aren't implemented in these languages can typically be constructed using their more general features (e.g. pointers in C++, references in Perl, etc.).

Data Structures: Arrays

Arrays are available in some manner in every high-level programming language, and you have encountered this data structure in a previous programming course. Abstractly, an array is simply a collection of items, each of which has its own unique index used to access it. So if A is an array holding n elements, we (typically) access its members as A[0], A[1], ..., A[n-1].

One possible difficulty with arrays in real-life programs is that we must (usually) specify the size of the array as it is created. If we later need to store more items than the array can currently hold, then we (usually) have to create a new array and copy the old array elements into the newly created one. So we have quick access to individual array members via indexing, but we need to know beforehand how many elements we want to store.

Another possible disadvantage is that it is difficult to insert new items into the middle of an array: we have to copy (shift) items to make space for the new item in the desired position. So it can be difficult (i.e. time-consuming) to maintain sorted data using an array.

Data Structures: Linked Lists

You have previously seen linked lists in COMP 108. In contrast to arrays, data can be easily inserted anywhere in a linked list by inserting a new node into the list and reassigning pointers. A list abstract data type (ADT) supports referring, update (both insert and delete), and searching methods. We can implement the list ADT as either a singly- or doubly-linked list.

A node in a singly-linked list stores a next link pointing to the next element in the list (null if the element is the last one). A node in a doubly-linked list stores two links: a next link, pointing to the next element in the list, and a prev link, pointing to the previous element. From now on, we will concentrate on doubly-linked lists.

List ADT: Update Methods

A list ADT supports the following update methods:

  replaceElement(p, e): p - position, e - element
  swapElements(p, q): p, q - positions
  insertFirst(e): e - element
  insertLast(e): e - element
  insertBefore(p, e): p - position, e - element
  insertAfter(p, e): p - position, e - element
  remove(p): p - position

List Update: Element Insertion

Pseudo-code for insertAfter(p, e):

  INSERTAFTER(p, e)
   1  Create a new node v
   2  v.element <- e
   3  {Link v to its predecessor}
   4  v.prev <- p
   5  {Link v to its successor}
   6  v.next <- p.next
   7  {Link p's old successor to v}
   8  (p.next).prev <- v
   9  {Link p to its new successor v}
  10  p.next <- v
  11  return v

List Update: Examples

[Figures: an insertAfter operation and a remove operation illustrated on a small list of elements (USA, Greece, Spain), showing the affected nodes' prev/next links before and after they are reassigned.]
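As a quick illustration, the INSERTAFTER pseudo-code translates almost line-for-line into a runnable sketch (the Node class and the insert_after name are mine, not from the notes):

```python
# A minimal sketch of insertAfter on a doubly-linked list, mirroring the
# INSERTAFTER pseudo-code. Illustrative only.

class Node:
    def __init__(self, element):
        self.element = element
        self.prev = None
        self.next = None

def insert_after(p, e):
    """Insert element e into a new node placed just after node p; O(1)."""
    v = Node(e)            # create a new node v
    v.prev = p             # link v to its predecessor
    v.next = p.next        # link v to its successor
    if p.next is not None:
        p.next.prev = v    # link p's old successor back to v
    p.next = v             # link p to its new successor v
    return v

# Build the list 1 <-> 3, then insert 2 after the first node.
head = Node(1)
tail = insert_after(head, 3)
insert_after(head, 2)

# Walk the list from the head to check the order.
out, n = [], head
while n is not None:
    out.append(n.element)
    n = n.next
print(out)  # [1, 2, 3]
```

Note that only a constant number of links change, which is why the update costs O(1) once p is known.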

List Update: Complexity

What is the cost of the insertion and removal methods? If the address of element p is known, then the cost is O(1). If only the address of the head of the list is known, then the cost of an update is O(p), since we need to traverse the list through positions 0, ..., p to first find p.

Lists: Implementation in Java

Java implements a List class (and related classes that vary in functionality), so you need not create your own. Method names may differ from the ones given here for our list ADT, and users should check carefully how Java defines its supporting methods to avoid unexpected behavior when using the Java List classes.

Ordered Data

We often wish to store data that is ordered in some fashion (typically numeric or alphabetical order, but there may be other orderings). Arrays and lists are obvious structures that may be used to store this type of data. Based on our previous discussion, arrays aren't generally efficient for maintaining sorted data when we must add and delete items. Linked lists may provide a better method in that case, but it is generally harder to search for an item in a list.

Ordered Dictionary

A dictionary is a set of elements and keys, supported by the dictionary operations:

  findElement(k): k - key
  insertElement(k, e): k - key, e - element
  removeElement(k): k - key

An ordered dictionary maintains an order relation for its elements, where the items are ordered by their keys. There are times when the keys and elements are the same, i.e. k = e for each pair.

Sorted Tables

If a dictionary D is ordered, we can store its items in a vector or array S in non-decreasing order of keys. (This generally assumes that we are not going to add more items into the dictionary.) Storing the dictionary in an array in this fashion allows faster searching than if S is stored using a linked list. An ordered array implementation is typically referred to as a lookup table.

Binary Search

Assumptions:
- S is an array-based representation of our data, having n elements.
- Accessing an element of S by its index takes O(1) time.
- The item at index i has a key no smaller than the keys of the items at ranks 1, ..., i-1 and no larger than the keys of the items at ranks i+1, i+2, ..., n.
- key(i) denotes the key at index i; elem(i) denotes the element at index i.
- Keys are unique.

Binary Search - Algorithm

Here's a recursive search algorithm.

  BINARYSEARCH(S, k, low, high)
  Input: an ordered array of elements.
  Output: the element with key k if it exists, otherwise an error.
  1  if low > high
  2    then return NO_SUCH_KEY
  3    else
  4      mid <- floor((low + high)/2)
  5      if k = key(mid)
  6        then return elem(mid)
  7      elseif k < key(mid)
  8        then return BINARYSEARCH(S, k, low, mid - 1)
  9      else return BINARYSEARCH(S, k, mid + 1, high)

Binary Search - Example

BINARYSEARCH(S, 19, 1, size(S)) on

  S = 2 3 5 7 8 9 12 14 17 19 22 25 27 28 33 37

  Step 1: low = 1, mid = 8, high = 16; key(8) = 14 < 19, so search the right half.
  Step 2: low = 9, mid = 12, high = 16; key(12) = 25 > 19, so search the left half.
  Step 3: low = 9, mid = 10, high = 11; key(10) = 19, and the search succeeds.
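For concreteness, here is a runnable sketch of BINARYSEARCH in Python; the function name, and returning None in place of NO_SUCH_KEY, are my choices, and the 1-based ranks of the notes are mapped onto Python's 0-based indexing inside the function:

```python
# A runnable version of the recursive BINARYSEARCH pseudo-code.
# Returns the 1-based rank of key k in the sorted list S, or None.

def binary_search(S, k, low, high):
    if low > high:
        return None                      # NO_SUCH_KEY
    mid = (low + high) // 2              # floor((low + high)/2)
    if k == S[mid - 1]:                  # S is 0-indexed; ranks are 1-based
        return mid
    elif k < S[mid - 1]:
        return binary_search(S, k, low, mid - 1)
    else:
        return binary_search(S, k, mid + 1, high)

S = [2, 3, 5, 7, 8, 9, 12, 14, 17, 19, 22, 25, 27, 28, 33, 37]
print(binary_search(S, 19, 1, len(S)))   # 10  (19 is the 10th key)
print(binary_search(S, 20, 1, len(S)))   # None (20 is absent)
```

Tracing binary_search(S, 19, 1, 16) reproduces the three steps of the worked example above (mid = 8, then 12, then 10).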

Complexity of Binary Search

Let the function T(n) denote the running time of the binary search method. We can characterize the running time of the recursive binary search algorithm as follows:

  T(n) = b               if n < 2
  T(n) = T(n/2) + b      otherwise

where b is a constant that denotes the time for updating low and high and other overhead. As you did in COMP 108, it can be shown that binary search runs in time O(log n) on a list with n keys.

Comparison of a linked list vs. a lookup table (sorted array):

  Method                                    Linked list   Lookup table
  findElement                               O(n)          O(log n)
  insertItem (having located its position)  O(1)          O(n)
  removeElement                             O(n)          O(n)
  closestKeyBefore                          O(n)          O(log n)

Another Option

We see some of the contrast between linked lists and arrays. Lists are somewhat costly (in terms of time) for finding an item, but it is easy (i.e. quick) to add an item once we know where to insert it. Arrays are efficient when we search for items, but it is expensive to update the collection while maintaining items in sorted order.

Can we find a data structure that might let us mimic the efficiency of binary search on arrays (O(log n)), and lets us insert/delete items more efficiently than arrays or lists (where updates may cost as much as O(n)) while maintaining an ordered collection? One data structure that seems to have this possibility is the binary search tree. We pause first to review rooted trees before we get to binary search trees. You have previously encountered binary search trees in COMP 108.
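To see the O(log n) bound concretely, this small illustrative script (mine, not from the notes) counts the worst-case number of probes binary search makes by always recursing into the right half, which is never the smaller one; the count matches floor(log2 n) + 1:

```python
# Empirical check of the recurrence T(n) = T(n/2) + b: count worst-case
# probes of binary search on n sorted keys. Illustrative only.

import math

def probes(n):
    """Worst-case number of probes binary search makes on n keys."""
    count, low, high = 0, 1, n
    while low <= high:        # simulate searching for an absent large key
        count += 1
        mid = (low + high) // 2
        low = mid + 1         # always continue in the right half
    return count

for n in (15, 16, 1000, 10**6):
    print(n, probes(n), math.floor(math.log2(n)) + 1)
```

Each probe halves the remaining range, so about log2 n probes suffice, in line with the recurrence above.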

Data Structures: Rooted Trees

A rooted tree T is a set of nodes which store elements in a parent-child relationship. T has a special node r, called the root of T, and each node of T (excluding the root node r) has a parent node.

[Figure: an example rooted tree with nodes BILLY, TOM, BOB, PETE, JOHN, ALEX, PAUL, STEVE, NATHAN.]

Rooted Trees: Terminology

- If node u is the parent of node v, then v is a child of u.
- Two nodes that are children of the same parent are called siblings.
- A node is a leaf (external) if it has no children, and internal otherwise.
- A tree is ordered if there is a linear ordering defined for the children of each internal node (i.e. an internal node has a distinguished first child, second child, etc.).

Binary Trees

A binary tree is a rooted ordered tree in which every node has at most two children. A binary tree is proper if each internal node has exactly two children. Each child in a binary tree is labelled as either a left child or a right child.

Depth of a Node in a Tree

The depth of a node v is the number of ancestors of v, excluding v itself. This is easily computed by a recursive function.

  DEPTH(T, v)
  1  if T.isRoot(v)
  2    then return 0
  3    else return 1 + DEPTH(T, T.parent(v))

The Height of a Tree

The height of a tree is equal to the maximum depth of an external node in it. The following pseudo-code computes the height of the subtree rooted at v.

  HEIGHT(T, v)
  1  if ISEXTERNAL(v)
  2    then return 0
  3    else
  4      h = 0
  5      for each w in T.CHILDREN(v)
  6        do
  7          h = MAX(h, HEIGHT(T, w))
  8      return 1 + h

Traversal of Trees

As a brief reminder, there are essentially three ways that trees are generally explored/traversed:

1. Pre-order traversal
2. Post-order traversal
3. In-order traversal

I will not give formal pseudo-code for these methods, and refer you to the COMP 108 notes for a fuller description.

Postorder Traversal of Trees

[Figure: a tree with root A; A's children are B and G; B's children are C and F; C's children are D and E; G's children are H and K; H's children are I and J.]

A post-order traversal of this tree (printing out the vertex label after recursively visiting the left and right children) would produce: D, E, C, F, B, I, J, H, K, G, A.

Data Structures for Trees

Linked structure: each node v of T is represented by an object with references to the element stored at v and the positions of its parent and children.
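The post-order example can be checked with a short sketch; the children-map representation is my choice, and the tree shape is the one inferred from the stated output:

```python
# Post-order traversal: visit children left-to-right, then the node itself.

def postorder(tree, label, out):
    for child in tree.get(label, []):   # leaves have no entry in the map
        postorder(tree, child, out)
    out.append(label)

# Children map for the example tree rooted at A.
tree = {
    "A": ["B", "G"],
    "B": ["C", "F"],
    "C": ["D", "E"],
    "G": ["H", "K"],
    "H": ["I", "J"],
}

out = []
postorder(tree, "A", out)
print(",".join(out))  # D,E,C,F,B,I,J,H,K,G,A
```

Pre-order differs only in appending the label before the recursive calls, and in-order (for binary trees) appends it between the two children.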

Data Structures for Rooted t-ary Trees

For rooted trees where each node has at most t children, and which are of bounded depth, you can store the tree in an array A. This is most useful when you are working with (almost) complete t-ary trees.

Consider, for example, a binary tree. The root is stored in A[0]. The (possibly) two children of the root are stored in A[1] and A[2]. The two children of A[1] are stored in A[3] and A[4], the two children of A[2] are in A[5] and A[6], and so forth. In general, the two children of node A[i] are in A[2i + 1] and A[2i + 2], and the parent of node A[i] (except the root) is the node in position A[floor((i - 1)/2)].

This can be generalized to the case where each node has at most t children, and we will see this implementation for rooted trees later when we discuss heaps.

Binary Search Tree (BST)

A binary search tree (BST) applies the motivation of binary search to a tree-based data structure. In a BST, each internal node stores an element e (or, more generally, a key k which defines the ordering, together with some element e). A BST has the property that, given a node v storing the element e, all elements in the left subtree of v are less than or equal to e, and those in the right subtree of v are greater than or equal to e. Consequently, an inorder traversal of a BST visits the nodes in non-decreasing order.

[Figure: an example binary search tree.]
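The index arithmetic above can be written out directly (the function names are mine):

```python
# Index arithmetic for the array representation of a binary tree,
# with 0-based indices as in the text.

def left(i):   return 2 * i + 1
def right(i):  return 2 * i + 2
def parent(i): return (i - 1) // 2   # undefined for the root (i = 0)

# A complete binary tree stored level by level:
#        r
#      /   \
#     a     b
#    / \   / \
#   c   d e   f
A = ["r", "a", "b", "c", "d", "e", "f"]
print(A[left(0)], A[right(0)])   # a b
print(A[parent(5)])              # b   (A[5] = "e" is a child of A[2] = "b")
```

No pointers are stored at all; the shape of the tree is encoded entirely in the positions, which is why this layout suits (almost) complete trees.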

Searching in a BST

Here's a recursive searching method:

  TREESEARCH(k, v)
  Input: a key k and a node v of a binary search tree.
  Output: a node w in the subtree T(v); either w is an internal node with key k, or w is the external node where the key k would belong if it existed.
  1  if ISEXTERNAL(v)
  2    then return v
  3  elseif k < key(v)
  4    then return TREESEARCH(k, T.LEFTCHILD(v))
  5  elseif k = key(v)
  6    then return v
  7  else return TREESEARCH(k, T.RIGHTCHILD(v))

Complexity of Searching in a BST

The search follows a single root-to-leaf path, spending O(1) time per level, so the total time is O(h), where h is the height of the tree T.

Insertion in a BST

[Figure: inserting 78 into an example BST; the search for 78 ends at an external node, and the new key is attached there as a leaf.] Insertion of 78 is performed in time O(h).

Deletion in a BST

[Figure: removing a node (38) with one external (empty) child; the node is spliced out by linking its parent directly to its non-empty child.] Removal of a node with one external (non-empty) child can be performed in time O(h).
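Here is a runnable sketch of BST search and insertion. It uses None for external nodes rather than explicit external-node objects, which simplifies the notes' model slightly; the class and function names are mine:

```python
# A dictionary-style BST: search and insert, both O(h).

class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def tree_search(v, k):
    """Return the node with key k in the subtree at v, or None."""
    if v is None:
        return None                    # reached an external node
    if k < v.key:
        return tree_search(v.left, k)
    elif k > v.key:
        return tree_search(v.right, k)
    return v                           # k = key(v)

def insert(v, k):
    """Insert k below v by searching, then attaching a new leaf."""
    if v is None:
        return BSTNode(k)
    if k < v.key:
        v.left = insert(v.left, k)
    else:
        v.right = insert(v.right, k)
    return v

root = None
for k in [50, 84, 65, 97, 79, 78]:
    root = insert(root, k)
print(tree_search(root, 78) is not None)  # True
print(tree_search(root, 40) is not None)  # False
```

Both operations walk one root-to-leaf path, so on this unbalanced structure their cost is O(h), exactly as the complexity analysis above states.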

Inefficiency of General BSTs

All operations on a BST are performed in O(h) time, where h is the height of the tree. Unfortunately, h may be as large as n, e.g. for a degenerate tree. The main advantage of binary searching (i.e. the O(log n) search time) may be lost if the BST is not balanced: in the worst case, the search time may be O(n). Can we do better?

AVL Trees

Height-balance property: for every internal node v of T, the heights of the children of v can differ by at most 1.

[Figure: an example height-balanced tree.]

An AVL tree is a tree that has the height-balance property.

Theorem: The height of an AVL tree T storing n items is O(log n).

Consequence 1: a search in an AVL tree can be performed in time O(log n).
Consequence 2: insertions and removals in an AVL tree need more careful implementations (using rotations to maintain the height-balance property).

Insertion in AVL Trees

An insertion in an AVL tree begins as an insertion in a general BST, i.e. attaching a new external node (leaf) to the tree. This action may result in a tree that violates the height-balance property, because the heights of some nodes increase by 1. A bottom-up mechanism (based on rotations) is applied to rebalance the unbalanced subtrees.

AVL: Rebalancing After Insertion

[Figure: inserting the number 54 in the previous example requires a rebalancing action (a right rotation).]

Single Rotations (zig) in AVL Trees

(a) Single "left" rotation: [Figure: the subtrees T0, T1, T2, T3 are redistributed around the two rotated nodes.]
(b) Single "right" rotation: [Figure: the mirror image of (a).]

Double Rotations (zig-zag) in AVL Trees

(a) Double "right-left" rotation: [Figure.]
(b) Double "left-right" rotation: [Figure.]

Deletion in AVL Trees

A deletion in an AVL tree begins as a removal in a general BST. This action may violate the height-balance property, and a bottom-up mechanism (based on rotations) is applied to rebalance the tree.
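A minimal sketch of the single "right" rotation, assuming a hypothetical node class and omitting the height bookkeeping that a full AVL implementation needs:

```python
# Single right rotation: the left child y of z becomes the new subtree
# root, and y's old right subtree moves under z. Illustrative only.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(z):
    """Rotate the subtree rooted at z to the right; return the new root."""
    y = z.left
    z.left = y.right       # y's right subtree changes parent, keeping order
    y.right = z            # z becomes y's right child
    return y

# A left-heavy chain 3 -> 2 -> 1 becomes a balanced tree rooted at 2.
z = Node(3, left=Node(2, left=Node(1)))
root = rotate_right(z)
print(root.key, root.left.key, root.right.key)  # 2 1 3
```

Note the rotation changes only a constant number of links (so it costs O(1)) and preserves the BST ordering, which is why a bottom-up pass of rotations can rebalance the tree cheaply.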

Rebalancing After a Deletion in an AVL Tree

[Figures, stages 1 and 2: removing 32 from an AVL tree, then rebalancing the tree with a left rotation.]

AVL Performance

All operations (search, insertion, and removal) on an AVL tree with n elements can be performed in O(log n) time: both the down phase and the up phase of an operation spend O(1) time per level on a tree of height O(log n).

AVL Trees (a Demonstration)

Visit the website

  http://webdiis.unizar.es/asignaturas/eda/avltree/avltree.html

for a Java applet in which you can view insertion and deletion operations on AVL trees. You can control the animation speed, and insert and delete items (either random elements or ones that you specify).

(2, 4) Trees

- Every node in a (2, 4) tree has at least 2 and at most 4 children.
- Each internal node v in a (2, 4) tree contains 1, 2, or 3 keys, defining the ranges of keys stored in its subtrees.
- All external nodes (leaves) in a (2, 4) tree have the same depth.

Theorem: The height of a (2, 4) tree storing n items is Θ(log n).

[Figure: an example (2, 4) tree.]

(2, 4) Trees - Search

A search for a key k in a (2, 4) tree T is done by tracing a path in T, starting at the root, in a top-down manner. Visiting a node v, we compare k with the keys k_i stored at v:

- If k = k_i, the search is completed.
- If k_i < k < k_(i+1), the (i+1)-st subtree of v is searched recursively.

(2, 4) Trees - Insertion

An insertion of k into a (2, 4) tree T begins with a search for an internal node on the lowest level that could accommodate k without violating the range. This action may overflow the node-size of the node v acquiring the new key k. A bottom-up mechanism (based on the split operation) is applied to fix overflows.
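The per-node comparison step of the search can be sketched as follows; the function name and the return convention are mine, not from the notes:

```python
# One step of (2, 4)-tree search: compare k against a node's sorted keys
# to either report a hit or pick which subtree to descend into.

def child_index(keys, k):
    """Return ('found', i) if k equals the i-th key (0-based), else
    ('descend', i) where i is the index of the subtree to search next."""
    for i, ki in enumerate(keys):
        if k == ki:
            return ("found", i)
        if k < ki:
            return ("descend", i)      # k lies before the i-th key
    return ("descend", len(keys))      # k is larger than every key here

print(child_index([5, 10, 15], 10))    # ('found', 1)
print(child_index([5, 10, 15], 12))    # ('descend', 2)
print(child_index([5, 10, 15], 99))    # ('descend', 3)
```

Since a node holds at most 3 keys, this step takes O(1) time, and the whole search visits one node per level, i.e. O(log n) nodes.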

(2, 4) Trees - Split Operation

[Figure: an example of a split operation. An overflowing node with keys k1, k2, k3, k4 and subtrees v1, ..., v5 is split into a node holding k1, k2 and a node holding k4, while the middle key k3 moves up into the parent.]

Sequence of Insertions

[Figures, parts 1 and 2: a sequence of insertions into a (2, 4) tree, with nodes splitting as they overflow.]

(2, 4) Trees - Deletion

Deletion of a key k from a (2, 4) tree T begins with a search for the node v possessing a key k_i = k. The key k_i is then replaced by the largest key in the i-th subtree of v. A bottom-up mechanism (based on the transfer and fusion operations) fixes any underflows.

(2, 4) Tree Performance

- The height of a (2, 4) tree storing n elements is O(log n).
- The split, transfer, and fusion operations take O(1) time.
- Search, insertion, and removal of an element each visit O(log n) nodes.