CSE 100: HUFFMAN CODES


READING QUIZ (no talking, no notes)

Q1: What do the symbol frequencies used in designing optimal codes represent?
A. The frequency of occurrence of symbols in the source file to be encoded
B. The probability of searching for a symbol in the source file containing the symbols
C. The inverse of the code length for each symbol
D. The lower bound on the average code length

Q2: (True or False) Given a set of symbols, the best possible average code length is minimum when the frequency of occurrence of all symbols is uniformly distributed.
A. True
B. False

Q3: (True or False) Prefix codes can always be uniquely decoded.
A. True
B. False

Q4: Which of the following indicates that a code is NOT a prefix code?
A. The binary tree representation of the code is not balanced
B. In the binary tree representation of the code, all the symbols appear as leaf nodes
C. In the binary tree representation of the code, one or more symbols appear as intermediate nodes (nodes with at least one child)

Symbol  Code A  Code B  Code C
S       00      0       0
P       01      1       10
A       10      10      110
M       11      11      111

[Figure: corresponding binary tree]
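One way to compare the three codes is to test the prefix property mechanically. The following is a minimal sketch, not part of the lecture code; the function name `isPrefixCode` is a hypothetical helper introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the lecture): returns true when no codeword
// is a prefix of a different codeword in the list.
bool isPrefixCode(const std::vector<std::string>& codewords) {
    for (std::size_t i = 0; i < codewords.size(); ++i) {
        for (std::size_t j = 0; j < codewords.size(); ++j) {
            if (i == j) continue;
            const std::string& a = codewords[i];
            const std::string& b = codewords[j];
            // a is a prefix of b when b starts with all of a's bits.
            if (a.size() <= b.size() && b.compare(0, a.size(), a) == 0) {
                return false;
            }
        }
    }
    return true;
}
```

Calling `isPrefixCode` on the codewords of Code A or Code C returns true. For Code B it returns false, because the codeword 1 (for P) is a prefix of 10 (for A) and of 11 (for M).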

Problem Definition (revisited)

Input: The frequency (p_i) of occurrence of each symbol (S_i)
Output: Binary tree T that minimizes the following objective function:

$$L(T) = \sum_{i=1}^{n} p_i \cdot \mathrm{Depth}(S_i \text{ in } T)$$

Solution: Huffman Codes

The David Huffman Story!

TEXT FILE: map smppam ssampamsmam

Letter  freq
s       0.6
p       0.2
a       0.1
m       0.1

"Huffman coding is one of the fundamental ideas that people in computer science and data communications are using all the time" - Donald Knuth
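As a worked check of the objective L(T), the letter frequencies above can be plugged into the codeword lengths of Code A (fixed length) and Code C (variable length) from the earlier code table. Pairing these frequencies with those two codes is an assumption made here for illustration; the slides do not state it.

```latex
\begin{align*}
L(\text{Code A}) &= 0.6 \cdot 2 + 0.2 \cdot 2 + 0.1 \cdot 2 + 0.1 \cdot 2 = 2.0 \\
L(\text{Code C}) &= 0.6 \cdot 1 + 0.2 \cdot 2 + 0.1 \cdot 3 + 0.1 \cdot 3 = 1.6
\end{align*}
```

Under that assumption, giving the most frequent symbol the shortest codeword lowers the average from 2.0 to 1.6 bits per symbol.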

Not quite Huffman's algorithm

The basic idea is to put the frequent items near the root (short codes) and the less frequent at the leaves. A simple idea is the top-down approach.

Message: AAAAABBAHHBCBGCCC
Counts: A: 6; B: 4; C: 4; D: 0; E: 0; F: 0; G: 1; H: 2

[Figure: top-down tree with leaves A, B, C, G, H]

Not quite Huffman's algorithm (continued)

The top-down tree for the counts A: 6; B: 4; C: 4; D: 0; E: 0; F: 0; G: 1; H: 2 is pretty good, but NOT optimal!

Huffman's algorithm: Bottom up construction

Build the tree from the bottom up!
Start with a forest of trees, all with just one node.
Merge trees in the forest two at a time to get a single tree.
What should be the merge criterion?

Initial forest, one node per symbol:
Count:   6  4  4  0  0  0  1  2
Symbol:  A  B  C  D  E  F  G  H

Forest without the zero-count symbols:
Count:   6  4  4  1  2
Symbol:  A  B  C  G  H

Huffman's algorithm: Bottom up construction

[Figure: forest with A (6), B (4), C (4), and a new tree T1 whose children are G (1) and H (2)]

T1 now represents the meta symbol GH. What is the count associated with T1?
A. Max(count(G), count(H))
B. (count(G) + count(H)) / 2
C. count(G) + count(H)

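Huffman's algorithm: Bottom up construction

Choose the two smallest trees in the forest and merge them.
Repeat until all nodes are in the tree.

[Figure: forest after the second merge. T2 (count 7) has children T1 and C; T1 has children G and H; A (6) and B (4) are still single-node trees.]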

Huffman's algorithm: Bottom up construction

Build the tree from the bottom up!
Start with a forest of trees, all with just one node.
Choose the two smallest trees in the forest and merge them.
Repeat until all nodes are in the tree.

[Figure: final tree. The root T4 has children T3 and T2; T3 has children A and B; T2 has children T1 and C; T1 has children G and H.]

You Try It!

Letter  Count
u       40
c       20
s       15
d       15
y       6
a       4

Build the tree and write down the codes for each of the symbols.
Then encode the string cya using this code.

Rules for building the tree in a deterministic way:

Huffman's algorithm: Building the Huffman Tree

0. Determine the count of each symbol in the input message.
1. Create a forest of single-node trees containing symbols and counts for each non-zero-count symbol.
2. Loop while there is more than 1 tree in the forest:
   2a. Remove the two lowest count trees.
   2b. Combine these two trees into a new tree (summing their counts).
   2c. Insert this new tree in the forest, and go to 2.
3. Return the one tree in the forest as the Huffman code tree.
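The steps above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the course's reference implementation: the `Node` struct, the index-based tree storage, and the function names `buildHuffmanTree`, `assignCodes`, and `huffmanCodes` are all hypothetical. Ties are broken by node index, which is only one of several possible deterministic rules, and step 0 (counting) is assumed to have been done by the caller.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for this sketch (not the lecture's HCNode).
struct Node {
    long count;
    char symbol;  // meaningful only for leaves
    int child0;   // index of the "0" child, or -1 for a leaf
    int child1;   // index of the "1" child, or -1 for a leaf
};

// Steps 1 to 3. The returned vector owns every node; the root is the last one.
std::vector<Node> buildHuffmanTree(const std::map<char, long>& counts) {
    std::vector<Node> nodes;
    using Entry = std::pair<long, int>;  // (count, node index)
    // std::greater turns the default max-heap into a min-heap on (count, index).
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> forest;

    // Step 1: one single-node tree per non-zero-count symbol.
    for (const auto& kv : counts) {
        if (kv.second > 0) {
            nodes.push_back({kv.second, kv.first, -1, -1});
            forest.push({kv.second, static_cast<int>(nodes.size()) - 1});
        }
    }
    // Step 2: remove the two lowest-count trees, combine them, reinsert.
    while (forest.size() > 1) {
        Entry a = forest.top(); forest.pop();
        Entry b = forest.top(); forest.pop();
        nodes.push_back({a.first + b.first, '\0', a.second, b.second});
        forest.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }
    return nodes;  // Step 3: the single remaining tree, rooted at nodes.back()
}

// Walks the tree from `index` and records the codeword of every leaf.
void assignCodes(const std::vector<Node>& nodes, int index,
                 const std::string& prefix, std::map<char, std::string>& codes) {
    const Node& n = nodes[index];
    if (n.child0 < 0) {  // leaf
        codes[n.symbol] = prefix;
        return;
    }
    assignCodes(nodes, n.child0, prefix + "0", codes);
    assignCodes(nodes, n.child1, prefix + "1", codes);
}

// Convenience wrapper: symbol counts in, codewords out.
// Note: with a single distinct symbol the codeword is the empty string.
std::map<char, std::string> huffmanCodes(const std::map<char, long>& counts) {
    std::map<char, std::string> codes;
    std::vector<Node> nodes = buildHuffmanTree(counts);
    if (!nodes.empty()) {
        assignCodes(nodes, static_cast<int>(nodes.size()) - 1, "", codes);
    }
    return codes;
}
```

For the counts used in the slides (A: 6, B: 4, C: 4, G: 1, H: 2, with D, E, F at zero), `huffmanCodes` returns five codewords whose total encoded length for the 17-symbol message is 37 bits. The tree shape depends on how ties are broken, so it may differ from the tree in the figures while having the same total cost.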

Huffman Algorithm: Forest of Trees

[Figure: forest with A (6), B (4), C (4), and T1 with children G (1) and H (2)]

What is a good data structure to use to hold the forest of trees?
A. BST
B. Sorted array
C. Linked list
D. Something else

Huffman Algorithm: Forest of Trees

[Figure: forest with A (6), B (4), C (4), and T1 with children G (1) and H (2)]

What is a good data structure to use to hold the forest of trees?
A. BST: Supports min, insert and delete in O(log N) when the tree is balanced
B. Sorted array: Not good for dynamic data
C. Linked list: If unordered, insert is constant time but min would be O(N). If ordered, delete and min are constant time but insert would be O(N)
D. Something else: Heap (new data structure?)

What is a Heap?

Think of a Heap as a binary tree that is as complete as possible and satisfies the following property:

At every node x: Key[x] <= Key[children of x]

So the root has the value
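The ordering property can be checked directly on a small example. The sketch below assumes the tree is stored level by level in an array, with the children of index i at 2i+1 and 2i+2. That array layout is a common convention and is an assumption here, as the slide only describes the tree form. The function name `isMinHeap` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: checks Key[x] <= Key[children of x] for a complete
// binary tree stored level by level (children of index i at 2i+1 and 2i+2).
bool isMinHeap(const std::vector<int>& keys) {
    for (std::size_t i = 1; i < keys.size(); ++i) {
        std::size_t parent = (i - 1) / 2;
        // Every non-root node must be at least as large as its parent.
        if (keys[parent] > keys[i]) return false;
    }
    return true;
}
```

For example, the array {1, 2, 4, 6, 4} satisfies the property, while {6, 4, 4, 1, 2} does not, because the key 6 at the root is larger than its child 4.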

Heap vs. BST vs. Sorted Array

Operation                    BST (Balanced)  Sorted Array  Heap
Search                       O(log N)        O(log N)
Selection                    O(log N)        O(1)
Min and Max                  O(log N)        O(1)
Min or Max                   O(log N)        O(1)
Predecessor/Successor        O(log N)        O(1)
Rank                         O(log N)        O(log N)
Output in sorted order       O(N)            O(N)
Insert                       O(log N)        O(N)
Delete                       O(log N)        O(N)
Extract min or extract max

Ref: Tim Roughgarden (Stanford)

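The suitability of Heap for our problem

In the Huffman problem we are doing repeated inserts and extract-min! This is a perfect setting for a Heap data structure.

The C++ STL container class priority_queue has a Heap implementation. The terms Priority Queue and Heap are often used interchangeably, although strictly a priority queue is the abstract data type and a heap is the usual way to implement it.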

Priority Queues in C++

A C++ priority_queue is a generic container, and can hold any kind of thing as specified with a template parameter when it is created: for example HCNodes, or pointers to HCNodes, etc.

    #include <queue>
    std::priority_queue<HCNode> p;

You can extract the object of highest priority in O(log N).

To determine priority, objects in a priority queue must be comparable to each other.

By default, a priority_queue<T> uses operator< defined for objects of type T: if a < b, b is taken to have higher priority than a.

Priority Queues in C++

The C++ priority_queue is synonymous to which of the following Heap data structures:
A. Max-Heap
B. Min-Heap
C. BST
D. Sorted Array
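The default ordering is easy to check on plain integers. The following is a minimal sketch using only `std::priority_queue`, `std::greater`, and `std::vector` from the standard library; the helper names `drain`, `popOrderDefault`, and `popOrderSmallestFirst` are hypothetical and introduced here for illustration.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: pops every element and returns them in pop order.
template <typename Queue>
std::vector<int> drain(Queue q) {
    std::vector<int> out;
    while (!q.empty()) {
        out.push_back(q.top());
        q.pop();
    }
    return out;
}

// Default comparison (operator<): the largest value has the highest priority.
std::vector<int> popOrderDefault(const std::vector<int>& values) {
    std::priority_queue<int> q;
    for (int v : values) q.push(v);
    return drain(q);
}

// With std::greater<int> the comparison is reversed: the smallest value is popped first.
std::vector<int> popOrderSmallestFirst(const std::vector<int>& values) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> q;
    for (int v : values) q.push(v);
    return drain(q);
}
```

For the values 6, 4, 4, 1, 2, `popOrderDefault` yields 6, 4, 4, 2, 1 and `popOrderSmallestFirst` yields 1, 2, 4, 4, 6. Huffman's algorithm needs the second behavior, which is why the comparison for HCNode has to be written with care.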

Priority Queues in C++

    #ifndef HCNODE_HPP
    #define HCNODE_HPP

    class HCNode {
    public:
        HCNode* parent;      // pointer to parent; null if root
        HCNode* child0;      // pointer to "0" child; null if leaf
        HCNode* child1;      // pointer to "1" child; null if leaf
        unsigned char symb;  // symbol
        int count;           // count/frequency of symbols in subtree

        // for less-than comparisons between HCNodes
        bool operator<(HCNode const &) const;
    };

    #endif

In HCNode.cpp:

    #include "HCNode.hpp"  // header file name inferred from the HCNODE_HPP include guard

    /** Compare this HCNode and other for priority ordering.
     *  Smaller count means higher priority.
     */
    bool HCNode::operator<(HCNode const & other) const {
        // if counts are different, just compare counts
        return count > other.count;
    }

What is wrong with this implementation?
A. Nothing
B. It is non-deterministic (in our algorithm)
C. It returns the opposite of the desired value for our purpose

In HCNode.cpp:

    #include "HCNode.hpp"  // header file name inferred from the HCNODE_HPP include guard

    /** Compare this HCNode and other for priority ordering.
     *  Smaller count means higher priority.
     *  Use node symbol for deterministic tiebreaking.
     */
    bool HCNode::operator<(HCNode const & other) const {
        // if counts are different, just compare counts
        if (count != other.count) return count > other.count;
        // counts are equal. use symbol value to break tie.
        // (for this to work, internal HCNodes
        // must have symb set.)
        return symb < other.symb;
    }
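To see what the tie-breaking comparison does inside a priority_queue, the sketch below uses a reduced stand-in for HCNode. The struct `MiniNode` and the function `popOrder` are hypothetical and exist only for this illustration; the body of `operator<` mirrors the one shown above.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical reduced stand-in for HCNode: only the fields the comparison uses.
struct MiniNode {
    unsigned char symb;
    int count;

    // Same logic as the HCNode comparison above: smaller count means higher
    // priority; equal counts are ordered by symbol value.
    bool operator<(MiniNode const& other) const {
        if (count != other.count) return count > other.count;
        return symb < other.symb;
    }
};

// Pushes all nodes into a priority_queue and returns the symbols in pop order.
std::string popOrder(const std::vector<MiniNode>& nodes) {
    std::priority_queue<MiniNode> forest;
    for (const MiniNode& n : nodes) forest.push(n);
    std::string order;
    while (!forest.empty()) {
        order += static_cast<char>(forest.top().symb);
        forest.pop();
    }
    return order;
}
```

With the counts from the slides (A: 6, B: 4, C: 4, G: 1, H: 2) the pop order is G, H, C, B, A, whatever order the nodes were inserted in. Among the equal-count nodes B and C, the node with the larger symbol value is popped first, because `a < b` marks `b` as the higher-priority element.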