4.8 Huffman Codes. These lecture slides are supplied by Mathijs de Weerd


Data Compression

Q. Given a text that uses 32 symbols (26 different letters, space, and some punctuation characters), how can we encode this text in bits?

Q. Some symbols (e, t, a, o, i, n) are used far more often than others. How can we use this to reduce our encoding?

Q. How do we know when the next symbol begins?

Ex. c(a) = …, c(b) = …, c(e) = … What is …?

Data Compression

Q. Given a text that uses 32 symbols (26 different letters, space, and some punctuation characters), how can we encode this text in bits?
A. We can encode 2^5 = 32 different symbols using a fixed length of 5 bits per symbol. This is called fixed-length encoding.

Q. Some symbols (e, t, a, o, i, n) are used far more often than others. How can we use this to reduce our encoding?
A. Encode these characters with fewer bits, and the others with more bits.

Q. How do we know when the next symbol begins?
A. Use a separation symbol (like the pause in Morse code), or make sure that there is no ambiguity by ensuring that no code is a prefix of another one.

Ex. c(a) = …, c(b) = …, c(e) = … What is …?
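Fixed-length encoding can be sketched in a few lines (a minimal illustration, not from the slides; the six punctuation characters chosen to round the alphabet up to 32 symbols are arbitrary):

```python
import string

# Fixed-length encoding: with 2**5 = 32 symbols, every symbol gets exactly 5 bits.
SYMBOLS = list(string.ascii_lowercase) + [" ", ".", ",", "!", "?", "'"]  # 32 symbols
CODE5 = {s: format(i, "05b") for i, s in enumerate(SYMBOLS)}  # symbol -> 5-bit codeword

def encode5(text):
    return "".join(CODE5[ch] for ch in text)

def decode5(bits):
    # Fixed length means no ambiguity: just read 5 bits at a time.
    return "".join(SYMBOLS[int(bits[i:i+5], 2)] for i in range(0, len(bits), 5))
```

Here `decode5(encode5("huffman code"))` round-trips to `"huffman code"`, and the cost is always 5 bits per symbol regardless of how frequent the symbol is.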

Prefix Codes

Definition. A prefix code for a set S is a function c that maps each x ∈ S to a sequence of 0s and 1s in such a way that for x, y ∈ S with x ≠ y, c(x) is not a prefix of c(y).

Ex. c(a) = …, c(e) = …, c(k) = …, c(l) = …, c(u) = …

Q. What is the meaning of …?

Suppose frequencies are known in a text of 1G: f_a = .4, f_e = .2, f_k = .2, f_l = .1, f_u = .1

Q. What is the size of the encoded text?

Prefix Codes

Definition. A prefix code for a set S is a function c that maps each x ∈ S to a sequence of 0s and 1s in such a way that for x, y ∈ S with x ≠ y, c(x) is not a prefix of c(y).

Ex. c(a) = …, c(e) = …, c(k) = …, c(l) = …, c(u) = …

Q. What is the meaning of …?
A. leuk

Suppose frequencies are known in a text of 1G: f_a = .4, f_e = .2, f_k = .2, f_l = .1, f_u = .1

Q. What is the size of the encoded text?
A. 2*f_a + 2*f_e + 3*f_k + 2*f_l + 4*f_u = 2.4G
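The slide's actual codewords did not survive transcription. The code below is therefore hypothetical: the exact bits are a guess, but the codeword lengths (2, 2, 3, 2, 4 for a, e, k, l, u) match the size computation above, and it illustrates how prefix-freeness makes left-to-right decoding unambiguous:

```python
# Hypothetical prefix code; only the codeword lengths are taken from the slide.
PREFIX_CODE = {"a": "11", "e": "01", "l": "10", "k": "001", "u": "0000"}

def decode_prefix(bits, code):
    inv = {w: s for s, w in code.items()}  # codeword -> symbol
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inv:        # prefix-freeness: the first matching codeword is the right one
            out.append(inv[buf])
            buf = ""
    assert buf == "", "trailing bits do not form a complete codeword"
    return "".join(out)
```

With this particular choice of bits, `decode_prefix("10010000001", PREFIX_CODE)` returns `"leuk"`.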

Optimal Prefix Codes

Definition. The average bits per letter of a prefix code c is the sum over all symbols of its frequency times the number of bits of its encoding:

    ABL(c) = Σ_{x ∈ S} f_x * |c(x)|

We would like to find a prefix code that has the lowest possible average bits per letter.

Suppose we model a code in a binary tree...
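The ABL formula evaluated directly for the a, e, k, l, u example above (codeword lengths 2, 2, 3, 2, 4):

```python
# ABL(c) = sum over x in S of f_x * |c(x)|
freq   = {"a": 0.4, "e": 0.2, "k": 0.2, "l": 0.1, "u": 0.1}
length = {"a": 2,   "e": 2,   "k": 3,   "l": 2,   "u": 4}

def abl(freq, length):
    return sum(freq[x] * length[x] for x in freq)

print(round(abl(freq, length), 6))  # 2.4 bits per letter, i.e. 2.4G bits for a 1G text
```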

Representing Prefix Codes using Binary Trees

Ex. c(a) = …, c(e) = …, c(k) = …, c(l) = …, c(u) = …

[Binary tree figure with leaves e, l, a, u, k]

Q. How does the tree of a prefix code look?

Representing Prefix Codes using Binary Trees

Ex. c(a) = …, c(e) = …, c(k) = …, c(l) = …, c(u) = …

[Binary tree figure with leaves e, l, a, u, k]

Q. How does the tree of a prefix code look?
A. Only the leaves have a label.

Pf. An encoding of x is a prefix of an encoding of y if and only if the path of x is a prefix of the path of y.

Representing Prefix Codes using Binary Trees

Q. What is the meaning of …?

    ABL(T) = Σ_{x ∈ S} f_x * depth_T(x)

[Binary tree figure with leaves e, i, l, m, s, p]

Representing Prefix Codes using Binary Trees

Q. What is the meaning of …?
A. simpel

    ABL(T) = Σ_{x ∈ S} f_x * depth_T(x)

[Binary tree figure with leaves e, i, l, m, s, p]

Q. How can this prefix code be made more efficient?

Representing Prefix Codes using Binary Trees

Q. What is the meaning of …?
A. simpel

    ABL(T) = Σ_{x ∈ S} f_x * depth_T(x)

[Binary tree figure with leaves e, i, l, m, s, p]

Q. How can this prefix code be made more efficient?
A. Change the encoding of p and s to a shorter one. This tree is now full.

Representing Prefix Codes using Binary Trees

Definition. A tree is full if every node that is not a leaf has two children.

Claim. The binary tree corresponding to the optimal prefix code is full.

Pf.

[Figure: a node u with a single child v, and parent w]

Representing Prefix Codes using Binary Trees

Definition. A tree is full if every node that is not a leaf has two children.

Claim. The binary tree corresponding to the optimal prefix code is full.

Pf. (by contradiction) Suppose T is the binary tree of an optimal prefix code and is not full. This means there is a node u with only one child v.

Case 1: u is the root. Delete u and use v as the root.

Case 2: u is not the root. Let w be the parent of u; delete u and make v a child of w in place of u.

In both cases the number of bits needed to encode any leaf in the subtree of v is decreased. The rest of the tree is not affected. Clearly this new tree T' has a smaller ABL than T. Contradiction.
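A side check not on the slides: the code tree is full exactly when the Kraft sum Σ_x 2^(-|c(x)|) equals 1, so fullness can be tested from the codeword lengths alone.

```python
from fractions import Fraction

def kraft_sum(lengths):
    # Exact rational arithmetic so the "= 1" test is not a float comparison.
    return sum(Fraction(1, 2 ** L) for L in lengths)

print(kraft_sum([2, 2, 3, 2, 4]))  # 15/16 < 1: some node has one child, tree not full
print(kraft_sum([2, 2, 2, 3, 3]))  # 1: every internal node has two children, tree full
```

The first length profile is the a, e, k, l, u example above; its sum falls short of 1, which is the arithmetic shadow of the missing sibling leaf.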

Optimal Prefix Codes: False Start

Q. Where in the tree of an optimal prefix code should letters with a high frequency be placed?

Optimal Prefix Codes: False Start

Q. Where in the tree of an optimal prefix code should letters with a high frequency be placed?
A. Near the top.

Greedy template. Create the tree top-down: split S into two sets S_1 and S_2 with (almost) equal frequencies, then recursively build trees for S_1 and S_2. [Shannon-Fano, 1949]

f_a = .32, f_e = .25, f_k = .20, f_l = .18, f_u = .05

[Figure: the two trees produced by the top-down split over a, e, k, l, u, with subtree frequencies]
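The top-down greedy template can be sketched as follows (a hypothetical implementation: sort by frequency, split into two groups of nearly equal total frequency, and recurse, prefixing 0 and 1 for the two groups):

```python
def shannon_fano(items):
    """items: list of (symbol, freq) pairs -> dict symbol -> codeword."""
    if len(items) == 1:
        return {items[0][0]: ""}
    items = sorted(items, key=lambda p: -p[1])
    total, acc = sum(f for _, f in items), 0.0
    for split, (_, f) in enumerate(items[:-1], start=1):
        acc += f
        if acc >= total / 2:   # first point where the left group reaches half the mass
            break
    code = {s: "0" + w for s, w in shannon_fano(items[:split]).items()}
    code.update({s: "1" + w for s, w in shannon_fano(items[split:]).items()})
    return code

freqs = [("a", .32), ("e", .25), ("k", .20), ("l", .18), ("u", .05)]
code = shannon_fano(freqs)
```

On these five frequencies the split produces codeword lengths (2, 2, 3, 3, 2) for (a, e, k, l, u) and ABL 2.38, whereas the optimal code reaches 2.23; this gap is why the top-down greedy is a false start.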

Optimal Prefix Codes: Huffman Encoding

Observation. Lowest-frequency items should be at the lowest level in the tree of an optimal prefix code.

Observation. For n > 1, the lowest level always contains at least two leaves.

Observation. The order in which items appear in a level does not matter.

Claim. There is an optimal prefix code with tree T* in which the two lowest-frequency letters are assigned to leaves that are siblings in T*.

Greedy template. [Huffman, 1952] Create the tree bottom-up. Make two leaves for the two lowest-frequency letters y and z. Recursively build the tree for the rest, using a meta-letter for yz.

Optimal Prefix Codes: Huffman Encoding

Huffman(S) {
    if |S| = 2 {
        return tree with a root and 2 leaves
    } else {
        let y and z be the two lowest-frequency letters in S
        S' = S
        remove y and z from S'
        insert a new letter ω in S' with f_ω = f_y + f_z
        T' = Huffman(S')
        T = add two children y and z to leaf ω of T'
        return T
    }
}

Q. What is the time complexity?

Optimal Prefix Codes: Huffman Encoding

Huffman(S) {
    if |S| = 2 {
        return tree with a root and 2 leaves
    } else {
        let y and z be the two lowest-frequency letters in S
        S' = S
        remove y and z from S'
        insert a new letter ω in S' with f_ω = f_y + f_z
        T' = Huffman(S')
        T = add two children y and z to leaf ω of T'
        return T
    }
}

Q. What is the time complexity?
A. T(n) = T(n-1) + O(n), so O(n^2).

Q. How to implement finding the lowest-frequency letters efficiently?
A. Use a priority queue for S: T(n) = T(n-1) + O(log n), so O(n log n).
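A runnable sketch of this bottom-up greedy with the priority-queue speedup, using Python's heapq as the priority queue; the merged dictionary plays the role of the meta-letter ω:

```python
import heapq
from itertools import count

def huffman(freq):
    """freq: dict symbol -> frequency; returns dict symbol -> codeword."""
    tie = count()  # tiebreaker so heap entries compare even on equal frequencies
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fy, _, cy = heapq.heappop(heap)  # y and z: the two lowest-frequency "letters"
        fz, _, cz = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in cy.items()}
        merged.update({s: "1" + w for s, w in cz.items()})
        heapq.heappush(heap, (fy + fz, next(tie), merged))  # meta-letter ω, f_ω = f_y + f_z
    return heap[0][2]

code = huffman({"a": .32, "e": .25, "k": .20, "l": .18, "u": .05})
```

Each pop/push costs O(log n), matching the O(n log n) bound above. On the frequencies from the false-start example this yields codeword lengths (2, 2, 2, 3, 3) and ABL 2.23, beating the top-down split's 2.38.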

Huffman Encoding: Greedy Analysis

Claim. The Huffman code for S achieves the minimum ABL of any prefix code.

Pf. By induction, based on optimality of T' (y and z removed, ω added). (See next page.)

Claim. ABL(T') = ABL(T) - f_ω

Pf.

Huffman Encoding: Greedy Analysis

Claim. The Huffman code for S achieves the minimum ABL of any prefix code.

Pf. By induction, based on optimality of T' (y and z removed, ω added). (See next page.)

Claim. ABL(T') = ABL(T) - f_ω

Pf.

    ABL(T) = Σ_{x ∈ S} f_x * depth_T(x)
           = f_y * depth_T(y) + f_z * depth_T(z) + Σ_{x ∈ S, x ≠ y,z} f_x * depth_T(x)
           = (f_y + f_z) * (1 + depth_T'(ω)) + Σ_{x ∈ S, x ≠ y,z} f_x * depth_T(x)
           = f_ω * (1 + depth_T'(ω)) + Σ_{x ∈ S, x ≠ y,z} f_x * depth_T(x)
           = f_ω + Σ_{x ∈ S'} f_x * depth_T'(x)
           = f_ω + ABL(T')

Huffman Encoding: Greedy Analysis

Claim. The Huffman code for S achieves the minimum ABL of any prefix code.

Pf. (by induction over n = |S|)

Huffman Encoding: Greedy Analysis

Claim. The Huffman code for S achieves the minimum ABL of any prefix code.

Pf. (by induction over n = |S|)

Base: For n = 2 there is no shorter code than a root and two leaves.

Hypothesis: Suppose the Huffman tree T' for S' of size n-1, with ω instead of y and z, is optimal.

Step: (by contradiction)

Huffman Encoding: Greedy Analysis

Claim. The Huffman code for S achieves the minimum ABL of any prefix code.

Pf. (by induction)

Base: For n = 2 there is no shorter code than a root and two leaves.

Hypothesis: Suppose the Huffman tree T' for S' of size n-1, with ω instead of y and z, is optimal. (IH)

Step: (by contradiction) Idea of proof: Suppose some other tree Z of size n is better. Delete the lowest-frequency items y and z from Z, creating Z'. Z' cannot be better than T' by the IH.

Huffman Encoding: Greedy Analysis

Claim. The Huffman code for S achieves the minimum ABL of any prefix code.

Pf. (by induction)

Base: For n = 2 there is no shorter code than a root and two leaves.

Hypothesis: Suppose the Huffman tree T' for S', with ω instead of y and z, is optimal. (IH)

Step: (by contradiction) Suppose the Huffman tree T for S is not optimal. Then there is some tree Z such that ABL(Z) < ABL(T). Then there is also such a tree Z in which leaves y and z are siblings and have the lowest frequency (see observation). Let Z' be Z with y and z deleted and their former parent labeled ω. Similarly, T' is derived from S' in our algorithm. We know that ABL(Z') = ABL(Z) - f_ω, as well as ABL(T') = ABL(T) - f_ω. But also ABL(Z) < ABL(T), so ABL(Z') < ABL(T'). Contradiction with the IH.