CSE 143. Lecture 24: Priority Queues and Huffman Encoding


Prioritization problems

- ER scheduling: You are in charge of scheduling patients for treatment in the ER. A gunshot victim should probably get treatment sooner than that one guy with a sore neck, regardless of arrival time. How do we always choose the most urgent case when new patients continue to arrive?
- print jobs: The CSE lab printers constantly accept and complete jobs from all over the building. Suppose we want them to print faculty jobs before staff jobs before student jobs, and grad student jobs before undergraduate jobs, etc.

What would be the runtime of solutions to these problems using the data structures we know (list, sorted list, map, set, BST, etc.)?

Inefficient structures

- list: store jobs in a list; remove min/max by searching (O(N))
  problem: expensive to search
- sorted list: store in sorted list; binary search it in O(log N) time
  problem: expensive to add/remove (O(N))
- binary search tree: store in BST, go right for max in O(log N)
  problem: tree becomes unbalanced

Priority queue ADT

priority queue: a collection of ordered elements that provides fast access to the minimum (or maximum) element

priority queue operations:
- add: adds in order; O(log N) worst
- peek: returns minimum value; O(1) always
- remove: removes/returns minimum value; O(log N) worst
- isEmpty, clear, size, iterator: O(1) always

Java's PriorityQueue class

public class PriorityQueue<E> implements Queue<E>

Method/Constructor      Description                       Runtime
PriorityQueue<E>()      constructs new empty queue        O(1)
add(E value)            adds value in sorted order        O(log N)
clear()                 removes all elements              O(1)
iterator()              returns iterator over elements    O(1)
peek()                  returns minimum element           O(1)
remove()                removes/returns min element       O(log N)
size()                  number of elements in queue       O(1)

    Queue<String> pq = new PriorityQueue<String>();
    pq.add("Stuart");
    pq.add("Allison");
    ...
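A small usage sketch of the methods in the table may help here. It relies only on java.util.PriorityQueue and java.util.Queue; the class name PriorityQueueDemo, the helper drain, and the third name "Marty" are made up for illustration.

```java
import java.util.PriorityQueue;
import java.util.Queue;

class PriorityQueueDemo {
    // Removes every element and joins them with spaces.
    // Each remove() returns the current minimum, so the result is in sorted order.
    static String drain(Queue<String> pq) {
        StringBuilder sb = new StringBuilder();
        while (!pq.isEmpty()) {
            sb.append(pq.remove()).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Queue<String> pq = new PriorityQueue<String>();
        pq.add("Stuart");
        pq.add("Allison");
        pq.add("Marty");                 // hypothetical extra element
        System.out.println(pq.peek());   // minimum element, not removed
        System.out.println(drain(pq));   // all elements, smallest first
    }
}
```

Note that elements come out in sorted order only through repeated remove() calls; iterating over a PriorityQueue does not guarantee any particular order.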

Inside a priority queue

Usually implemented as a heap, a kind of binary tree.

- Instead of being sorted left → right, it's sorted top → bottom
- guarantee: each child is greater (lower priority) than its ancestors
- add/remove causes elements to "bubble" up/down the tree
- (take CSE 332 or 373 to learn about implementing heaps!)

Example heap, listed level by level from the root:

    10
    20 80
    40 60 85 90
    50 99 65
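The "bubble" up/down idea can be sketched with an array-backed min-heap. This is only an illustrative sketch, not how java.util.PriorityQueue is actually written; the class MinHeapSketch and its method names are hypothetical. It assumes the usual layout where the children of index i sit at 2i+1 and 2i+2.

```java
import java.util.ArrayList;
import java.util.List;

class MinHeapSketch {
    // items.get(0) is the root; children of index i are at 2i+1 and 2i+2.
    private final List<Integer> items = new ArrayList<>();

    // Adds at the bottom, then bubbles up while smaller than its parent.
    void add(int value) {
        items.add(value);
        int i = items.size() - 1;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (items.get(parent) <= items.get(i)) {
                break;
            }
            swap(i, parent);
            i = parent;
        }
    }

    // Removes the root, moves the last element to the top, then bubbles it down.
    // Assumes the heap is non-empty.
    int remove() {
        int min = items.get(0);
        int last = items.remove(items.size() - 1);
        if (!items.isEmpty()) {
            items.set(0, last);
            int i = 0;
            while (true) {
                int left = 2 * i + 1;
                int right = left + 1;
                int smallest = i;
                if (left < items.size() && items.get(left) < items.get(smallest)) {
                    smallest = left;
                }
                if (right < items.size() && items.get(right) < items.get(smallest)) {
                    smallest = right;
                }
                if (smallest == i) {
                    break;
                }
                swap(i, smallest);
                i = smallest;
            }
        }
        return min;
    }

    int peek() { return items.get(0); }

    int size() { return items.size(); }

    private void swap(int a, int b) {
        int tmp = items.get(a);
        items.set(a, items.get(b));
        items.set(b, tmp);
    }
}
```

Each add or remove touches at most one node per level of the tree, which is where the O(log N) bounds come from.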

Exercise: Fire the TAs

We have decided that novice TAs should all be fired.

- Write a class TAManager that reads a list of TAs from a file.
- Find all with ≤ 2 quarters of experience, and replace them.
- Print the final list of TAs to the console, sorted by experience.

Input format:

    name quarters
    Will 3
    Ying 2
    Andrew 1

Priority queue ordering

For a priority queue to work, elements must have an ordering. In Java, this means implementing the Comparable interface.

Reminder:

    public class Foo implements Comparable<Foo> {
        public int compareTo(Foo other) {
            // Return positive, zero, or negative integer
        }
    }
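One possible shape for the TA exercise, combining Comparable with a priority queue, is sketched below. It skips the file reading and works on an in-memory list. It assumes "novice" means at most a given cutoff of quarters and simply drops those TAs rather than replacing them. The class names TA and TAManagerSketch, the method experienced, and the extra TA "Kim" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class TA implements Comparable<TA> {
    final String name;
    final int quarters;

    TA(String name, int quarters) {
        this.name = name;
        this.quarters = quarters;
    }

    // Orders TAs by experience: fewer quarters means "smaller".
    @Override
    public int compareTo(TA other) {
        return Integer.compare(quarters, other.quarters);
    }

    @Override
    public String toString() {
        return name + " (" + quarters + ")";
    }
}

class TAManagerSketch {
    // Returns the TAs with more than cutoff quarters, least experienced first.
    static List<String> experienced(List<TA> all, int cutoff) {
        PriorityQueue<TA> pq = new PriorityQueue<>(all);
        // The least experienced TA is always at the front of the queue.
        while (!pq.isEmpty() && pq.peek().quarters <= cutoff) {
            pq.remove();
        }
        List<String> result = new ArrayList<>();
        while (!pq.isEmpty()) {
            result.add(pq.remove().toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<TA> tas = Arrays.asList(
            new TA("Will", 3), new TA("Ying", 2),
            new TA("Andrew", 1), new TA("Kim", 5)); // "Kim" is made up
        System.out.println(experienced(tas, 2));
    }
}
```

Because compareTo compares quarters, the queue hands back the least experienced TA first, so the novices can be removed from the front without scanning the whole collection.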

Homework 8 (Huffman Coding)

File compression

compression: Process of encoding information in fewer bits.

But isn't disk space cheap? Compression applies to many things:
- store photos without exhausting disk space
- reduce the size of an e-mail attachment
- make web pages smaller so they load faster
- reduce media sizes (MP3, DVD, Blu-Ray)
- make voice calls over a low-bandwidth connection (cell, Skype)

Common compression programs:
- WinZip or WinRAR for Windows
- Stuffit Expander for Mac

ASCII encoding

ASCII: Mapping from characters to integers (binary bits).
- Maps every possible character to a number ('A' → 65)
- uses one byte (8 bits) for each character
- most text files on your computer are in ASCII format

Char    ASCII value    ASCII (binary)
' '     32             00100000
'a'     97             01100001
'b'     98             01100010
'c'     99             01100011
'e'     101            01100101
'z'     122            01111010
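The rows of the ASCII table can be reproduced with a few lines of Java. This is a minimal sketch; the class AsciiDemo and the helper toBinary8 are hypothetical names, and the helper assumes the character's value fits in 8 bits.

```java
class AsciiDemo {
    // Returns the 8-bit binary string for a character whose value is below 256.
    static String toBinary8(char c) {
        String bits = Integer.toBinaryString(c);
        StringBuilder sb = new StringBuilder();
        // Left-pad with zeros so every character shows a full byte.
        for (int i = bits.length(); i < 8; i++) {
            sb.append('0');
        }
        return sb.append(bits).toString();
    }

    public static void main(String[] args) {
        for (char c : new char[] {' ', 'a', 'b', 'c', 'e', 'z'}) {
            System.out.println("'" + c + "' " + (int) c + " " + toBinary8(c));
        }
    }
}
```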

Huffman encoding

Huffman encoding: Uses variable lengths for different characters to take advantage of their relative frequencies.
- Some characters occur more often than others. If those characters use < 8 bits each, the file will be smaller.
- Other characters need > 8, but that's OK; they're rare.

Char    ASCII value    ASCII (binary)    Hypothetical Huffman
' '     32             00100000          10
'a'     97             01100001          0001
'b'     98             01100010          01110100
'c'     99             01100011          001100
'e'     101            01100101          1100
'z'     122            01111010          00100011110

Huffman's algorithm

The idea: Create a "Huffman Tree" that will tell us a good binary representation for each character.
- Left means 0, right means 1. example: 'b' is 10
- More frequent characters will be "higher" in the tree (have a shorter binary value).

To build this tree, we must do a few steps first:
- Count occurrences of each unique character in the file.
- Use a priority queue to order them from least to most frequent.

Huffman compression

1. Count the occurrences of each character in file
   {' '=2, 'a'=3, 'b'=3, 'c'=1, EOF=1}
2. Place characters and counts into priority queue
3. Use priority queue to create Huffman tree
4. Traverse tree to find (char → binary) map
   {' '=00, 'a'=11, 'b'=10, 'c'=010, EOF=011}
5. For each char in file, convert to compressed binary version
   11 10 00 11 | 10 00 010 1 | 1 10 011 00   (grouped into bytes)

1) Count characters

step 1: count occurrences of characters into a map

example input file contents: ab ab cab

byte    char    ASCII    binary
1       'a'     97       01100001
2       'b'     98       01100010
3       ' '     32       00100000
4       'a'     97       01100001
5       'b'     98       01100010
6       ' '     32       00100000
7       'c'     99       01100011
8       'a'     97       01100001
9       'b'     98       01100010

counts array: (in HW8, we do this part for you)
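The counting step can be sketched as follows. HW8 provides this part, so the code below is only an illustration and not the assignment's actual code; the class CharCounter and the 256-entry array size are assumptions.

```java
class CharCounter {
    // Returns an array where counts[c] is how many times character value c occurs.
    // Assumes one slot per possible byte value (0..255); other characters are skipped.
    static int[] count(String text) {
        int[] counts = new int[256];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < counts.length) {
                counts[c]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = count("ab ab cab");
        System.out.println("' '=" + counts[' '] + ", 'a'=" + counts['a']
            + ", 'b'=" + counts['b'] + ", 'c'=" + counts['c']);
    }
}
```

For the example file this gives 2 spaces, 3 of 'a', 3 of 'b', and 1 'c', matching the counts used in the rest of the lecture. The EOF marker is not in the file itself and is added separately with a count of 1.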

2) Create priority queue

step 2: place characters and counts into a priority queue
- store a single character and its count as a Huffman node object
- the priority queue will organize them into ascending order

3) Build Huffman tree

step 3: create "Huffman tree" from the node counts

algorithm:
- Put all node counts into a priority queue.
- while P.Q. size > 1:
  - Remove two rarest characters.
  - Combine into a single node with these two as its children.
  - Add the combined node (count = sum of the two counts) back into the queue.
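The loop above can be sketched in Java roughly as follows. This is not the HW8 solution; the class HuffmanBuildSketch, the nested Node class, and the eof parameter are hypothetical, and the sketch assumes the combined node is added back into the queue with the sum of its children's counts.

```java
import java.util.PriorityQueue;

class HuffmanBuildSketch {
    static class Node implements Comparable<Node> {
        final int ch;          // character value, or -1 for an internal node
        final int count;       // frequency of this character or subtree
        final Node left, right;

        Node(int ch, int count, Node left, Node right) {
            this.ch = ch;
            this.count = count;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }

        // Rarer nodes come out of the priority queue first.
        @Override
        public int compareTo(Node other) {
            return Integer.compare(count, other.count);
        }
    }

    // Builds a Huffman tree from per-character counts plus one EOF marker.
    static Node build(int[] counts, int eof) {
        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (int ch = 0; ch < counts.length; ch++) {
            if (counts[ch] > 0) {
                pq.add(new Node(ch, counts[ch], null, null));
            }
        }
        pq.add(new Node(eof, 1, null, null));
        while (pq.size() > 1) {
            Node first = pq.remove();    // rarest
            Node second = pq.remove();   // second rarest
            pq.add(new Node(-1, first.count + second.count, first, second));
        }
        return pq.remove();
    }

    // Total encoded length in bits: each leaf contributes count * depth.
    static int encodedBits(Node n, int depth) {
        if (n.isLeaf()) {
            return n.count * depth;
        }
        return encodedBits(n.left, depth + 1) + encodedBits(n.right, depth + 1);
    }
}
```

For the counts {' '=2, 'a'=3, 'b'=3, 'c'=1, EOF=1} the root ends up with count 10 and the total encoded length is 22 bits. When two nodes have equal counts, which one becomes the left child depends on how the queue breaks ties, so the exact 0/1 patterns can differ from the lecture's tree even though the code lengths match.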

Build tree example (figure omitted)

4) Tree to binary encodings

The Huffman tree tells you the binary encodings to use.
- left means 0, right means 1
- example: 'b' is 10

What are the binary encodings of: EOF, ' ', 'c', 'a'?

What is the relationship between tree branch height, binary representation length, character frequency, etc.?
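Reading the encodings off a tree is a short recursive traversal: append 0 when going left and 1 when going right, and record the path at each leaf. The sketch below hand-builds a tree whose shape is inferred from the lecture's code map {' '=00, 'a'=11, 'b'=10, 'c'=010, EOF=011}; the class HuffmanCodesSketch and its helpers are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

class HuffmanCodesSketch {
    static class Node {
        final String label;    // character (or "EOF") at a leaf; null for internal nodes
        final Node left, right;

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }
    }

    static Node leaf(String label) { return new Node(label, null, null); }

    static Node join(Node left, Node right) { return new Node(null, left, right); }

    // Tree shape inferred from the code map in the lecture.
    static Node slideTree() {
        return join(
            join(leaf(" "), join(leaf("c"), leaf("EOF"))),
            join(leaf("b"), leaf("a")));
    }

    // Walks the tree; path holds the 0/1 choices made so far.
    static void collect(Node n, String path, Map<String, String> codes) {
        if (n.left == null && n.right == null) {
            codes.put(n.label, path);
            return;
        }
        collect(n.left, path + "0", codes);
        collect(n.right, path + "1", codes);
    }

    static Map<String, String> codes(Node root) {
        Map<String, String> result = new TreeMap<>();
        collect(root, "", result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(codes(slideTree()));
    }
}
```

A leaf's depth equals the length of its code, which is why rarer characters, merged earlier and pushed deeper, get longer encodings.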

5) Compress the actual file

Based on the preceding tree, we have the following encodings:
{' '=00, 'a'=11, 'b'=10, 'c'=010, EOF=011}

Using this map, we can encode the file into a shorter binary representation. The text ab ab cab would be encoded as:

char      'a'  'b'  ' '  'a'  'b'  ' '  'c'  'a'  'b'  EOF
binary    11   10   00   11   10   00   010  11   10   011

Overall: 1110001110000101110011 (22 bits, ~3 bytes)

byte 1:  11 10 00 11   ('a' 'b' ' ' 'a')
byte 2:  10 00 010 1   ('b' ' ' 'c', first bit of 'a')
byte 3:  1 10 011 00   (second bit of 'a', 'b', EOF, 2 padding bits)

Encode.java does this for us using our codes file.

How would we go back in the opposite direction (decompress)?
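The encoding step itself is a lookup per character. The sketch below is not the course's Encode.java; it represents bits as a String of '0' and '1' characters for readability, and the class HuffmanEncodeSketch and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class HuffmanEncodeSketch {
    // The code map from the lecture, minus EOF (which is not a char in the file).
    static Map<Character, String> slideCodes() {
        Map<Character, String> codes = new HashMap<>();
        codes.put(' ', "00");
        codes.put('a', "11");
        codes.put('b', "10");
        codes.put('c', "010");
        return codes;
    }

    // Concatenates the code of every character, then the EOF code.
    // Assumes every character in text has an entry in codes.
    static String encode(String text, Map<Character, String> codes, String eofCode) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            bits.append(codes.get(c));
        }
        bits.append(eofCode);
        return bits.toString();
    }

    // Pads with trailing zeros so the bit count is a multiple of 8.
    static String padToBytes(String bits) {
        StringBuilder sb = new StringBuilder(bits);
        while (sb.length() % 8 != 0) {
            sb.append('0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bits = encode("ab ab cab", slideCodes(), "011");
        System.out.println(bits + " (" + bits.length() + " bits)");
        System.out.println(padToBytes(bits));
    }
}
```

The 9-character file takes 72 bits in ASCII but only 22 bits here, padded to 24 bits (3 bytes) when written out. The EOF code lets a decoder know where the real data stops and the padding begins.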

Decompressing

How do we decompress a file of Huffman-compressed bits?

useful "prefix property":
- No encoding A is the prefix of another encoding B
- I.e. never will have x → 011 and y → 011100110

the algorithm:
- Read each bit one at a time from the input.
- If the bit is 0, go left in the tree; if it is 1, go right.
- If you reach a leaf node, output the character at that leaf and go back to the tree root.

Decompressing

Use the tree to decompress a compressed file with these bits:

1011010001101011011

bits      10   11   010  00   11   010  11   011
char      b    a    c    _    a    c    a    EOF

- Read each bit one at a time.
- If it is 0, go left; if 1, go right.
- If you reach a leaf, output the character there and go back to the tree root.

Output: bac aca
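The decoding walk can be sketched as below. The bits are again a String of '0' and '1' characters, and the tree is hand-built from the lecture's code map; HuffmanDecodeSketch and its helpers are hypothetical names, and stopping at an "EOF" leaf is an assumption about how the end of data is detected.

```java
class HuffmanDecodeSketch {
    static class Node {
        final String label;    // character (or "EOF") at a leaf; null for internal nodes
        final Node left, right;

        Node(String label, Node left, Node right) {
            this.label = label;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }
    }

    static Node leaf(String label) { return new Node(label, null, null); }

    static Node join(Node left, Node right) { return new Node(null, left, right); }

    // Tree matching {' '=00, 'a'=11, 'b'=10, 'c'=010, EOF=011}.
    static Node slideTree() {
        return join(
            join(leaf(" "), join(leaf("c"), leaf("EOF"))),
            join(leaf("b"), leaf("a")));
    }

    // Follows one branch per bit; emits a character at each leaf and restarts at the root.
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node current = root;
        for (int i = 0; i < bits.length(); i++) {
            current = (bits.charAt(i) == '0') ? current.left : current.right;
            if (current.isLeaf()) {
                if (current.label.equals("EOF")) {
                    break;               // anything after EOF is padding
                }
                out.append(current.label);
                current = root;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(slideTree(), "1011010001101011011"));
    }
}
```

The prefix property is what makes this work without separators: since no code is a prefix of another, reaching a leaf is unambiguous.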

Public methods to write

- public HuffmanTree(int[] counts)
  Given character frequencies for a file, create Huffman tree (Steps 2-3)
- public void write(PrintStream output)
  Write mappings between characters and binary to a .code file (Step 4)
- public HuffmanTree(Scanner input)
  Reconstruct the tree from a .code file
- public void decode(BitInputStream in, PrintStream out, int eof)
  Use the Huffman tree to decode characters

Bit I/O streams

Java's input/output streams read/write 1 byte (8 bits) at a time. We want to read/write one single bit at a time.

BitInputStream: Reads one bit at a time from input.

public BitInputStream(String file)     Creates stream to read bits from given file
public int readBit()                   Reads a single 1 or 0
public void close()                    Stops reading from the stream

BitOutputStream: Writes one bit at a time to output.

public BitOutputStream(String file)    Creates stream to write bits to given file
public void writeBit(int bit)          Writes a single bit
public void close()                    Stops writing to the stream
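To give a feel for how bit-at-a-time reading can sit on top of a byte stream, here is a rough sketch. It is not the course's BitInputStream: the class BitReaderSketch is hypothetical, it wraps any java.io.InputStream instead of taking a file name, and it assumes bits are delivered most significant bit first, which may differ from the provided class.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BitReaderSketch {
    private final InputStream in;
    private int current;      // the byte currently being handed out bit by bit
    private int remaining;    // how many bits of current have not been returned yet

    BitReaderSketch(InputStream in) {
        this.in = in;
    }

    // Returns the next bit (0 or 1), or -1 when the input is exhausted.
    int readBit() throws IOException {
        if (remaining == 0) {
            current = in.read();          // fetch the next whole byte
            if (current == -1) {
                return -1;
            }
            remaining = 8;
        }
        remaining--;
        return (current >> remaining) & 1;   // most significant bit first (assumption)
    }

    // Convenience helper: all bits of a byte array as a string of 0s and 1s.
    static String bitsOf(byte[] data) {
        BitReaderSketch reader = new BitReaderSketch(new ByteArrayInputStream(data));
        StringBuilder sb = new StringBuilder();
        try {
            for (int bit = reader.readBit(); bit != -1; bit = reader.readBit()) {
                sb.append(bit);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three bytes of the compressed "ab ab cab" example.
        byte[] data = {(byte) 0xE3, (byte) 0x85, (byte) 0xCC};
        System.out.println(bitsOf(data));
    }
}
```

A writer would do the reverse: collect bits into a byte and flush it once eight have accumulated, padding the final byte when the stream is closed.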