Binary Trees and Huffman Encoding Binary Search Trees



Similar documents
root node level: internal node edge leaf node Data Structures & Algorithms McQuain

Questions 1 through 25 are worth 2 points each. Choose one best answer for each.

Binary Heap Algorithms

Analysis of Algorithms I: Binary Search Trees

Converting a Number from Decimal to Binary

Binary Search Trees (BST)

Learning Outcomes. COMP202 Complexity of Algorithms. Binary Search Trees and Other Search Trees

The following themes form the major topics of this chapter: The terms and concepts related to trees (Section 5.2).

Previous Lectures. B-Trees. External storage. Two types of memory. B-trees. Main principles

How To Create A Tree From A Tree In Runtime (For A Tree)

Binary Search Trees. A Generic Tree. Binary Trees. Nodes in a binary search tree ( B-S-T) are of the form. P parent. Key. Satellite data L R

Ordered Lists and Binary Trees

DATA STRUCTURES USING C

Class Notes CS Creating and Using a Huffman Code. Ref: Weiss, page 433

Data Structure [Question Bank]

1) The postfix expression for the infix expression A+B*(C+D)/F+D*E is ABCD+*F/DE*++

Data Structure with C

A binary search tree is a binary tree with a special property called the BST-property, which is given as follows:

Chapter 14 The Binary Search Tree

Operations: search;; min;; max;; predecessor;; successor. Time O(h) with h height of the tree (more on later).

Binary Search Trees CMPSC 122

Data Structures and Algorithms

TREE BASIC TERMINOLOGIES

B-Trees. Algorithms and data structures for external memory as opposed to the main memory B-Trees. B -trees

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

A binary search tree or BST is a binary tree that is either empty or in which the data element of each node has a key, and:

Binary Search Trees 3/20/14

Binary Search Trees. Data in each node. Larger than the data in its left child Smaller than the data in its right child

Binary Heaps * * * * * * * / / \ / \ / \ / \ / \ * * * * * * * * * * * / / \ / \ / / \ / \ * * * * * * * * * *

Algorithms and Data Structures

Hash Tables. Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited

Heaps & Priority Queues in the C++ STL 2-3 Trees

Chapter 8: Binary Trees

Lecture 6: Binary Search Trees CSCI Algorithms I. Andrew Rosenberg

DNS LOOKUP SYSTEM DATA STRUCTURES AND ALGORITHMS PROJECT REPORT

Information, Entropy, and Coding

Algorithms Chapter 12 Binary Search Trees

10CS35: Data Structures Using C

Output: struct treenode{ int data; struct treenode *left, *right; } struct treenode *tree_ptr;

From Last Time: Remove (Delete) Operation

Analysis of Algorithms I: Optimal Binary Search Trees

Parallelization: Binary Tree Traversal

Outline BST Operations Worst case Average case Balancing AVL Red-black B-trees. Binary Search Trees. Lecturer: Georgy Gimel farb

Binary Heaps. CSE 373 Data Structures

Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit

ER E P M A S S I CONSTRUCTING A BINARY TREE EFFICIENTLYFROM ITS TRAVERSALS DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF TAMPERE REPORT A

Data Structures. Level 6 C Module Descriptor

1. The memory address of the first element of an array is called A. floor address B. foundation addressc. first address D.

PES Institute of Technology-BSC QUESTION BANK

Big Data and Scripting. Part 4: Memory Hierarchies

The Union-Find Problem Kruskal s algorithm for finding an MST presented us with a problem in data-structure design. As we looked at each edge,

Image Compression through DCT and Huffman Coding Technique

Symbol Tables. Introduction

2. (a) Explain the strassen s matrix multiplication. (b) Write deletion algorithm, of Binary search tree. [8+8]

Data Structures, Practice Homework 3, with Solutions (not to be handed in)

OPTIMAL BINARY SEARCH TREES

Cpt S 223. School of EECS, WSU

B+ Tree Properties B+ Tree Searching B+ Tree Insertion B+ Tree Deletion Static Hashing Extendable Hashing Questions in pass papers

GRAPH THEORY LECTURE 4: TREES

International Journal of Advanced Research in Computer Science and Software Engineering

A binary heap is a complete binary tree, where each node has a higher priority than its children. This is called heap-order property

Algorithms and Data Structures Written Exam Proposed SOLUTION

DATABASE DESIGN - 1DL400

THE SECURITY AND PRIVACY ISSUES OF RFID SYSTEM

Krishna Institute of Engineering & Technology, Ghaziabad Department of Computer Application MCA-213 : DATA STRUCTURES USING C

Binary search algorithm

AP Computer Science AB Syllabus 1

Binary Search Trees. Ric Glassey

Binary Coded Web Access Pattern Tree in Education Domain

External Memory Geometric Data Structures

CS711008Z Algorithm Design and Analysis

A TOOL FOR DATA STRUCTURE VISUALIZATION AND USER-DEFINED ALGORITHM ANIMATION

Algorithms and Data Structures

Tables so far. set() get() delete() BST Average O(lg n) O(lg n) O(lg n) Worst O(n) O(n) O(n) RB Tree Average O(lg n) O(lg n) O(lg n)

Data Structures. Jaehyun Park. CS 97SI Stanford University. June 29, 2015

MAX = 5 Current = 0 'This will declare an array with 5 elements. Inserting a Value onto the Stack (Push)

Exercises Software Development I. 11 Recursion, Binary (Search) Trees. Towers of Hanoi // Tree Traversal. January 16, 2013

6 March Array Implementation of Binary Trees

6 Creating the Animation

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

Rotation Operation for Binary Search Trees Idea:

Binary Trees (1) Outline and Required Reading: Binary Trees ( 6.3) Data Structures for Representing Trees ( 6.4)

Atmiya Infotech Pvt. Ltd. Data Structure. By Ajay Raiyani. Yogidham, Kalawad Road, Rajkot. Ph : ,

Load Balancing. Load Balancing 1 / 24

GUJARAT TECHNOLOGICAL UNIVERSITY, AHMEDABAD, GUJARAT. Course Curriculum. DATA STRUCTURES (Code: )

Arithmetic Coding: Introduction

Exam study sheet for CS2711. List of topics

CSE 326: Data Structures B-Trees and B+ Trees

Lecture 10: Regression Trees

EE602 Algorithms GEOMETRIC INTERSECTION CHAPTER 27

Quiz 4 Solutions EECS 211: FUNDAMENTALS OF COMPUTER PROGRAMMING II. 1 Q u i z 4 S o l u t i o n s

Physical Data Organization

Data Structures Fibonacci Heaps, Amortized Analysis

UIL Computer Science for Dummies by Jake Warren and works from Mr. Fleming

The ADT Binary Search Tree

Any two nodes which are connected by an edge in a graph are called adjacent node.

Binary Search Trees. basic implementations randomized BSTs deletion in BSTs

Sample Questions Csci 1112 A. Bellaachia

schema binary search tree schema binary search trees data structures and algorithms lecture 7 AVL-trees material

Lecture Notes on Binary Search Trees

Transcription:

Binary Trees and Huffman Encoding Binary Search Trees Computer Science E119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Motivation: Maintaining a Sorted Collection of Data A data dictionary is a sorted collection of data with the following key operations: search for an item (and possibly delete it) insert a new item If we use a list to implement a data dictionary, efficiency = O(n). data structure searching for an item inserting an item a list implemented using an array a list implemented using a linked list O(log n) using binary search O(n) using linear search (binary search in a linked list is O(n log n)) In the next few lectures, we ll look at data structures (trees and hash tables) that can be used for a more efficient data dictionary. We ll also look at other applications of trees. O(n) because we need to shift items over O(n) (O(1) to do the actual insertion, but O(n) to find where it belongs)

What Is a Tree? root edge node A tree consists of: a set of nodes a set of edges, each of which connects a pair of nodes Each node may have one or more data items. each data item consists of one or more fields key field = the field used when searching for a data item multiple data items with the same key are referred to as duplicates The node at the top of the tree is called the root of the tree. Relationships Between Nodes 1 2 3 4 6 8 9 10 11 12 If a node N is connected to other nodes that are directly below it in the tree, N is referred to as their parent and they are referred to as its children. example: node is the parent of nodes 10, 11, and 12 Each node is the child of at most one parent. Other familyrelated terms are also used: nodes with the same parent are siblings a node s ancestors are its parent, its parent s parent, etc. example: node 9 s ancestors are 3 and 1 a node s descendants are its children, their children, etc. example: node 1 s descendants are all of the other nodes

Types of Nodes 1 2 3 4 6 8 9 10 11 12 13 A leaf node is a node without children. An interior node is a node with one or more children. A Tree is a Recursive Data Structure 1 2 3 4 6 8 9 10 11 12 13 Each node in the tree is the root of a smaller tree! refer to such trees as subtrees to distinguish them from the tree as a whole example: node 2 is the root of the subtree circled above example: node 6 is the root of a subtree with only one node We ll see that tree algorithms often lend themselves to recursive implementations.

Path, Depth, Level, and Height level 0 level 1 depth = 2 level 2 There is exactly one path (one sequence of edges) connecting each node to the root. depth of a node = # of edges on the path from it to the root Nodes with the same depth form a level of the tree. The height of a tree is the maximum depth of its nodes. example: the tree above has a height of 2 Binary Trees In a binary tree, nodes have at most two children. Recursive definition: a binary tree is either: 1) empty, or 2) a node (the root of the tree) that has one or more data items a left child, which is itself the root of a binary tree a right child, which is itself the root of a binary tree Example: s left child s right child 4 18 38 s left subtree s right subtree How are the edges of the tree represented?

Representing a Binary Tree Using Linked Nodes public class LinkedTree { private class Node { private int key; private LLList data; private Node left; private Node right; private Node root; 4 18 38 root // list of data items // reference to left child // reference to right child 4 null 12 null null 18 32 null null null 38 null null see ~cscie119/examples/trees/linkedtree.java Traversing a Binary Tree Traversing a tree involves visiting all of the nodes in the tree. visiting a node = processing its data in some way example: print the key We will look at four types of traversals. Each of them visits the nodes in a different order. To understand traversals, it helps to remember the recursive definition of a binary tree, in which every node is the root of a subtree. 12 is the root of s left subtree 4 is the root of 12 s left subtree 4 18 38 32 is the root of s right subtree

Preorder Traversal preorder traversal of the tree whose root is N: 1) visit the root, N 2) recursively perform a preorder traversal of N s left subtree 3) recursively perform a preorder traversal of N s right subtree 9 2 6 8 4 Preorder traversal of the tree above: 2 4 6 9 8 Which statespace search strategy visits nodes in this order? Implementing Preorder Traversal public class LinkedTree { private Node root; public void preorderprint() { if (root!= null) preorderprinttree(root); private static void preorderprinttree(node root) { System.out.print(root.key + ); if (root.left!= null) preorderprinttree(root.left); if (root.right!= null) preorderprinttree(root.right); Not always the same as the root of the entire tree. preorderprinttree() is a static, recursive method that takes as a parameter the root of the tree/subtree that you want to print. preorderprint() is a nonstatic method that makes the initial call. It passes in the root of the entire tree as the parameter.

Tracing Preorder Traversal void preorderprinttree(node root) { System.out.print(root.key + ); if (root.left!= null) preorderprinttree(root.left); if (root.right!= null) preorderprinttree(root.right); 2 6 8 9 4 print print 4 print 4 2 print 2 2 2 6 print 6 time... Postorder Traversal postorder traversal of the tree whose root is N: 1) recursively perform a postorder traversal of N s left subtree 2) recursively perform a postorder traversal of N s right subtree 3) visit the root, N 9 2 6 8 Postorder traversal of the tree above: 4 2 6 8 9 4

Implementing Postorder Traversal public class LinkedTree { private Node root; public void postorderprint() { if (root!= null) postorderprinttree(root); private static void postorderprinttree(node root) { if (root.left!= null) postorderprinttree(root.left); if (root.right!= null) postorderprinttree(root.right); System.out.print(root.key + ); Note that the root is printed after the two recursive calls. Tracing Postorder Traversal void postorderprinttree(node root) { if (root.left!= null) postorderprinttree(root.left); if (root.right!= null) postorderprinttree(root.right); System.out.print(root.key + ); 2 6 8 9 4 4 print 4 2 2 2 6 print 2 print 6... time

Inorder Traversal inorder traversal of the tree whose root is N: 1) recursively perform an inorder traversal of N s left subtree 2) visit the root, N 3) recursively perform an inorder traversal of N s right subtree 9 2 6 8 Inorder traversal of the tree above: 2 4 6 8 9 4 Implementing Inorder Traversal public class LinkedTree { private Node root; public void inorderprint() { if (root!= null) inorderprinttree(root); private static void inorderprinttree(node root) { if (root.left!= null) inorderprinttree(root.left); System.out.print(root.key + ); if (root.right!= null) inorderprinttree(root.right); Note that the root is printed between the two recursive calls.

Tracing Inorder Traversal void inorderprinttree(node root) { if (root.left!= null) inorderprinttree(root.left); System.out.print(root.key + ); if (root.right!= null) inorderprinttree(root.right); 2 6 8 9 4 4 print 4 2 2 2 6 print 2 print 6 print... time LevelOrder Traversal Visit the nodes one level at a time, from top to bottom and left to right. 9 2 6 8 4 Levelorder traversal of the tree above: 9 2 6 8 4 Which statespace search strategy visits nodes in this order? How could we implement this type of traversal?

TreeTraversal Summary preorder: root, left subtree, right subtree postorder: left subtree, right subtree, root inorder: left subtree, root, right subtree levelorder: top to bottom, left to right Perform each type of traversal on the tree below: 9 13 3 8 10 18 2 6 1 Using a Binary Tree for an Algebraic Expression We ll restrict ourselves to fully parenthesized expressions and to the following binary operators: +,, *, / Example expression: ((a + (b * c)) (d / e)) Tree representation: + / a * d e b c Leaf nodes are variables or constants; interior nodes are operators. Because the operators are binary, either a node has two children or it has none.

Traversing an AlgebraicExpression Tree Inorder gives conventional algebraic notation. print ( before the recursive call on the left subtree print ) after the recursive call on the right subtree + a * b c d / e for tree at right: ((a + (b * c)) (d / e)) Preorder gives functional notation. print ( s and ) s as for inorder, and commas after the recursive call on the left subtree for tree above: subtr(add(a, mult(b, c)), divide(d, e)) Postorder gives the order in which the computation must be carried out on a stack/rpn calculator. for tree above: push a, push b, push c, multiply, add, see ~cscie119/examples/trees/exprtree.java FixedLength Character Encodings A character encoding maps each character to a number. Computers usually use fixedlength character encodings. ASCII (American Standard Code for Information Interchange) uses 8 bits per character. char dec binary a 9 01100001 b 98 01100010 c 99 01100011 example: bat is stored in a text file as the following sequence of bits: 01100010 01100001 01110100 Unicode uses 16 bits per character to accommodate foreignlanguage characters. (ASCII codes are a subset.) Fixedlength encodings are simple, because all character encodings have the same length a given character always has the same encoding

VariableLength Character Encodings Problem: fixedlength encodings waste space. Solution: use a variablelength encoding. use encodings of different lengths for different characters assign shorter encodings to frequently occurring characters Example: e 01 test would be encoded as 00 01 111 00 000111100 o 100 s 111 t 00 Challenge: when decoding/decompressing an encoded document, how do we determine the boundaries between characters? example: for the above encoding, how do we know whether the next character is 2 bits or 3 bits? One requirement: no character s encoding can be the prefix of another character s encoding (e.g., couldn t have 00 and 001). Huffman Encoding Huffman encoding is a type of variablelength encoding that is based on the actual character frequencies in a given document. Huffman encoding uses a binary tree: to determine the encoding of each character to decode an encoded file i.e., to decompress a compressed file, putting it back into ASCII Example of a Huffman tree (for a text with only six chars): Leaf nodes are characters. 0 1 Left branches are labeled with a 0, and right branches 0 1 0 1 are labeled with a 1. t e If you follow a path from root 0 1 0 1 to leaf, you get the encoding o i a s of the character in the leaf example: 101 = i

Building a Huffman Tree 1) Begin by reading through the text to determine the frequencies. 2) Create a list of nodes that contain (character, frequency) pairs for each character that appears in the text. o 21 i 23 a 2 s t 2 e 40 3) Remove and merge the nodes with the two lowest frequencies, forming a new node that is their parent. left child = lowest frequency node right child = the other node frequency of parent = sum of the frequencies of its children in this case, 21 + 23 = 44 o 21 44 i 23 Building a Huffman Tree (cont.) 4) Add the parent to the list of nodes: a 2 s t 2 e 40 44 o 21 i 23 ) Repeat steps 3 and 4 until there is only a single node in the list, which will be the root of the Huffman tree.

Completing the Huffman Tree Example I Merge the two remaining nodes with the lowest frequencies: a s t e 2 2 40 44 o 21 i 23 t 2 e 40 44 1 o 21 i 23 a 2 s Completing the Huffman Tree Example II Merge the next two nodes: t e 2 40 44 1 o 21 i 23 a 2 s 44 1 6 o 21 i 23 a 2 s t 2 e 40

Completing the Huffman Tree Example III Merge again: 44 1 6 o i a s t e 21 23 2 2 40 6 9 t 2 e 40 44 1 o i a s 21 23 2 Completing the Huffman Tree Example IV The next merge creates the final tree: 162 0 1 t 2 6 e 40 44 9 1 0 t 1 e 0 0 1 0 1 1 o i a s o i a s 21 23 2 Characters that appear more frequently end up higher in the tree, and thus their encodings are shorter.

Using Huffman Encoding to Compress a File 1) Read through the input file and build its Huffman tree. 2) Write a file header for the output file. include an array containing the frequencies so that the tree can be rebuilt when the file is decompressed. 3) Traverse the Huffman tree to create a table containing the encoding of each character: 0 t 1 e 0 1 0 1 0 1 0 1 o i a s a? e? i 101 o 100 s 111 t 00 4) Read through the input file a second time, and write the Huffman code for each character to the output file. Using Huffman Decoding to Decompress a File 1) Read the frequency table from the header and rebuild the tree. 2) Read one bit at a time and traverse the tree, starting from the when you read a bit of 1, go to the right child when you read a bit of 0, go to the left child when you reach a leaf node, record the character, return to the root, and continue reading bits The tree allows us to easily overcome the challenge of determining the character boundaries! 0 t 1 e 0 1 0 1 0 1 0 1 o i a s example: 101111110000111100 101 = right,left,right = i 111 = right,right,right= s 110 = right,right,left = a 00 = left,left = t 01 = left,right = e 111 = right,right,right= s 00 = left,left = t

Binary Search Trees Searchtree property: for each node k: all nodes in k s left subtree are < k all nodes in k s right subtree are >= k Our earlier binarytree example is a search tree: < k k k < 12 4 18 38 < 12 Searching for an Item in a Binary Search Tree Algorithm for searching for an item with a key k: if k == the root node s key, you re done else if k < the root node s key, search the left subtree else search the right subtree Example: search for 4 18 38

Implementing BinaryTree Search public class LinkedTree { // Nodes have keys that are ints private Node root; public LLList search(int key) { Node n = searchtree(root, key); return (n == null? null : n.data); private static Node searchtree(node root, int key) { // write together If we find a node that has the specified key, we return its data field, which holds a list of the data items for that key. Inserting an Item in a Binary Search Tree We want to insert an item whose key is k. We traverse the tree as if we were searching for k. If we find a node with key k, we add the data item to the list of items for that node. If we don t find it, the last node we encounter will be the parent P of the new node. if k < P s key, make the new node P s left child else make the node P s right child Special case: if the tree is empty, make the new node the root of the tree. The resulting tree is still a search tree. example: insert 3 4 18 38 3 P

Implementing BinaryTree Insertion We'll implement part of the insert() method together. We'll use iteration rather than recursion. Our method will use two references/pointers: trav: performs the traversal down to the point of insertion parent: stays one behind trav like the trail reference that we sometimes use when traversing a linked list parent 4 18 38 trav Implementing BinaryTree Insertion public void insert(int key, Object data) { Node parent = null; Node trav = root; while (trav!= null) { if (trav.key == key) { trav.data.additem(data, 0); return; 4 18 38 Node newnode = new Node(key, data); if (parent == null) // the tree was empty root = newnode; else if (key < parent.key) parent.left = newnode; else parent.right = newnode;

Deleting Items from a Binary Search Tree Three cases for deleting a node x Case 1: x has no children. Remove x from the tree by setting its parent s reference to null. ex: delete 4 4 18 38 18 38 Case 2: x has one child. Take the parent s reference to x and make it refer to x s child. ex: delete 12 18 32 18 38 38 Deleting Items from a Binary Search Tree (cont.) Case 3: x has two children we can't just delete x. why? instead, we replace x with a node from elsewhere in the tree to maintain the searchtree property, we must choose the replacement carefully example: what nodes could replace below? 4 18 38 3

Deleting Items from a Binary Search Tree (cont.) Case 3: x has two children (continued): replace x with the smallest node in x s right subtree call it y y will either be a leaf node or will have one right child. why? After copying y s item into x, we delete y using case 1 or 2. ex: delete x 30 x 30 18 4 18 4 18 4 30 y 30 y 3 3 3 Implementing BinaryTree Deletion public LLList delete(int key) { // Find the node and its parent. Node parent = null; Node trav = root; while (trav!= null && trav.key!= key) { parent parent = trav; if (key < trav.key) trav = trav.left; trav else trav = trav.right; 4 18 38 // Delete the node (if any) and return the removed items. if (trav == null) // no such key return null; else { LLList removeddata = trav.data; deletenode(trav, parent); return removeddata; This method uses a helper method to delete the node.

Implementing Case 3 private void deletenode(node todelete, Node parent) { if (todelete.left!= null && todelete.right!= null) { // Find a replacement and // the replacement's parent. Node replaceparent = todelete; // Get the smallest item // in the right subtree. Node replace = todelete.right; // What should go here? 18 4 todelete // Replace todelete's key and data // with those of the replacement item. todelete.key = replace.key; todelete.data = replace.data; // Recursively delete the replacement // item's old node. It has at most one // child, so we don't have to // worry about infinite recursion. deletenode(replace, replaceparent); else {... 30 3 Implementing Cases 1 and 2 private void deletenode(node todelete, Node parent) { if (todelete.left!= null && todelete.right!= null) {... else { Node todeletechild; if (todelete.left!= null) todeletechild = todelete.left; 30 else todeletechild = todelete.right; // Note: in case 1, todeletechild // will have a value of null. 18 4 if (todelete == root) root = todeletechild; else if (todelete.key < parent.key) parent.left = todeletechild; else parent.right = todeletechild; 30 3 parent todelete todeletechild

Efficiency of a Binary Search Tree The three key operations (search, insert, and delete) all have the same time complexity. insert and delete both involve a search followed by a constant number of additional operations Time complexity of searching a binary search tree: best case: O(1) worst case: O(h), where h is the height of the tree average case: O(h) What is the height of a tree containing n items? it depends! why? Balanced Trees A tree is balanced if, for each node, the node s subtrees have the same height or have heights that differ by 1. For a balanced tree with n nodes: height = O(log 2 n). gives a worstcase time complexity that is logarithmic (O(log 2 n)) the best worstcase time complexity for a binary tree 4 30 38

What If the Tree Isn't Balanced? Extreme case: the tree is equivalent to a linked list height = n 1 4 worstcase time complexity = O(n) 12 We ll look next at searchtree variants that take special measures to ensure balance. 32 36 38