# Class Notes CS 3137: Creating and Using a Huffman Code (Ref: Weiss, page 433)


1. FIXED LENGTH CODES: Codes are used to transmit characters over data links. You are probably aware of the ASCII code, a fixed-length 7-bit binary code that encodes 2^7 = 128 characters (you can type `man ascii` on a Unix terminal to see the ASCII code). Unicode, the code used in Java, is a 16-bit code.

2. Since transmitting data over data links can be expensive (e.g. satellite transmission, a personal phone line to use a modem, etc.), it makes sense to try to minimize the total number of bits sent. Minimizing the data also lets us send it faster (i.e. in real time). Images, sound, video, etc. have large data rates that can be reduced by proper coding techniques. A fixed-length code does a poor job of reducing the amount of data sent: an infrequent character appears far less often than a frequent one in a message, yet both require the same number of bits.

3. Suppose we have a 4-character alphabet (ABCD) and we want to encode the message ABACCDA. Using a fixed-length code with 3 bits per character, we need to transmit 7 * 3 = 21 bits in total. This is wasteful, since we only really need 2 bits to encode the 4 characters ABCD. So if we use a 2-bit code (A=00, B=01, C=10, D=11), then we only need to transmit 7 * 2 = 14 bits of data. We can reduce this even more if we use a code based upon the frequencies of each character. In our example message, A occurs 3 times, C twice, and B and D once each. If we generate a variable-length code where A=0, B=110, C=10, and D=111, our message above can be transmitted as the bit string

        0 110 0 10 10 111 0   = 13 bits
        A B   A C  C  D   A

   Notice that as we process the bits, each character's code is unique; there is no ambiguity. As soon as we recognize a character's code, we start the decoding process again.
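The encoding and decoding of ABACCDA can be sketched in Java. This is a minimal illustration of the fixed code table above, not part of the assignment; the class and method names are made up for this example.

```java
import java.util.Map;

public class VarLengthCodeDemo {
    // The variable-length code from the notes: A=0, B=110, C=10, D=111.
    static final Map<Character, String> CODE =
        Map.of('A', "0", 'B', "110", 'C', "10", 'D', "111");
    static final Map<String, Character> SYMBOL =
        Map.of("0", 'A', "110", 'B', "10", 'C', "111", 'D');

    static String encode(String message) {
        StringBuilder bits = new StringBuilder();
        for (char c : message.toCharArray()) bits.append(CODE.get(c));
        return bits.toString();
    }

    // Because the code is prefix-free, we can emit a character as soon as the
    // bits read so far match a codeword, then restart with the next bit.
    static String decode(String bits) {
        StringBuilder message = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            current.append(bit);
            Character sym = SYMBOL.get(current.toString());
            if (sym != null) {
                message.append(sym);
                current.setLength(0);
            }
        }
        return message.toString();
    }

    public static void main(String[] args) {
        String bits = encode("ABACCDA");
        System.out.println(bits + " (" + bits.length() + " bits)");
        System.out.println(decode(bits));
    }
}
```

Running this prints the 13-bit string from the example, and decoding recovers ABACCDA with no ambiguity.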
4. CREATING A HUFFMAN CODE: The code above is a Huffman code, and it is optimal in the sense that it encodes the most frequent characters in a message with the fewest bits and the least frequent characters with the most bits. There can be no code that encodes the message in fewer bits than this one.

5. It is also a prefix-free code: no codeword can be a prefix of another codeword. This makes decoding a Huffman-encoded message particularly easy: as the bits come by, as soon as you see a complete codeword you are done, since no other codeword can have it as a prefix. The general idea behind making a Huffman code is to find the frequencies of the characters being encoded, and then build a Huffman tree based upon these frequencies.

6. The method starts with a set of single nodes, each of which contains a character and its frequency in the text to be encoded. Each of these single nodes is a tree, and the set of all these nodes can be thought of as a forest, a set of trees. We put these single-node trees into a priority queue keyed on the frequency of each character's occurrence, with the lowest frequency given the highest priority. Hence the node containing the least frequent character will be at the root of the priority queue (a min-heap).

7. Algorithm to make a Huffman tree:

(a) Scan the message to be encoded, and count the frequency of every character.

(b) Create a single-node tree for each character and its frequency, and place it into a priority queue (heap) with the lowest frequency at the root.

(c) Until the forest contains only one tree, do the following:

- Remove the two nodes with the minimum frequencies from the priority queue (we now have the two least frequent trees in the forest).
- Make these two nodes children of a new combined node whose frequency is the sum of the two nodes' frequencies.
- Insert this new node into the priority queue.

8. Huffman coding is a good example of the separation of an Abstract Data Type from its implementation as a data structure in a programming language. Conceptually, the idea of a Huffman tree is clear. Its implementation in Java is a bit more challenging. It is always important to remember that programming is oftentimes creating a mapping from the ADT to an algorithm and data structures. Key Point: If you don't understand the problem at the ADT level, you will have little chance of implementing a correct program at the programming language level. A possible Java data structure for a hufftreenode is:

```java
public class hufftreenode implements Comparable<hufftreenode> {
    int symbol;
    int frequency;
    hufftreenode left, right;

    public hufftreenode(int i, int freq, hufftreenode l, hufftreenode r) {
        symbol = i;
        frequency = freq;
        left = l;
        right = r;
    }

    public int compareTo(hufftreenode other) {
        if (frequency > other.frequency) return 1;
        if (frequency == other.frequency) return 0;
        return -1;
    }
}
```

9. MAKING A CODE TABLE: Once the Huffman tree is built, you can print it out using the printtree utility. We also need to make a code table from the tree that contains the bit code for each character. We create this table by traversing the tree, recording a 0 for each left branch we take and a 1 for each right branch (the opposite bit sense), until we reach a leaf node. Once we reach a leaf, we have the full code for the character at that leaf node: it is simply the string of left and right moves we have strung together, along with the code's length. The output of this procedure is an array, indexed by character (a number between 0 and 127 for a 7-bit ASCII character), that contains an entry for the coded bits and the code's length.
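The tree-building loop and the code-table traversal just described can be sketched together, with `java.util.PriorityQueue` standing in for the heap. The names `Node`, `buildTree`, `fillTable`, and `codeTable` are illustrative, not from the assignment code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanBuildDemo {
    // A minimal tree node; leaves carry a symbol, internal nodes only a frequency.
    static class Node implements Comparable<Node> {
        int symbol, frequency;
        Node left, right;
        Node(int symbol, int frequency, Node left, Node right) {
            this.symbol = symbol; this.frequency = frequency;
            this.left = left; this.right = right;
        }
        public int compareTo(Node other) {
            return Integer.compare(frequency, other.frequency);
        }
        boolean isLeaf() { return left == null && right == null; }
    }

    static Node buildTree(int[] freq) {
        PriorityQueue<Node> heap = new PriorityQueue<>();  // min-heap: lowest frequency first
        for (int c = 0; c < freq.length; c++)
            if (freq[c] > 0) heap.add(new Node(c, freq[c], null, null));
        while (heap.size() > 1) {
            Node a = heap.poll(), b = heap.poll();          // the two least frequent trees
            heap.add(new Node(-1, a.frequency + b.frequency, a, b));
        }
        return heap.poll();
    }

    // Walk the tree: left appends '0', right appends '1'; record the path at each leaf.
    static void fillTable(Node n, String path, Map<Character, String> table) {
        if (n.isLeaf()) { table.put((char) n.symbol, path.isEmpty() ? "0" : path); return; }
        fillTable(n.left, path + "0", table);
        fillTable(n.right, path + "1", table);
    }

    static Map<Character, String> codeTable(String message) {
        int[] freq = new int[128];                          // indexed by 7-bit ASCII value
        for (char c : message.toCharArray()) freq[c]++;
        Map<Character, String> table = new HashMap<>();
        fillTable(buildTree(freq), "", table);
        return table;
    }

    public static void main(String[] args) {
        codeTable("ABACCDA").forEach((c, bits) -> System.out.println(c + " -> " + bits));
    }
}
```

For ABACCDA this yields a 1-bit code for A and codes totaling 13 bits for the whole message, matching the hand-built example in section 1.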

Figure 1: Huffman Coding Tree for string: big brown book

Here is a test run of a Huffman coding program. The original message read from input is: big brown book. The code table listed each character (b, g, i, k, n, o, r, w) with its ASCII value, frequency, bit pattern, and code length, but the numeric columns did not survive transcription. The coded message for `b i g _ b r o w n _ b o o k` comes to 14 characters in 42 bits, 3 bits per character.
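The 42-bit figure can be checked even without the lost table: every time the algorithm merges two trees, each character below the new combined node gets one bit longer, so the total encoded length equals the sum of the combined frequencies created at the merge steps. A sketch of this check, assuming only the character frequencies of "big brown book" (class name illustrative):

```java
import java.util.PriorityQueue;

public class TotalBitsDemo {
    // Total Huffman-encoded length = sum of the merged weights produced by the algorithm.
    static int totalBits(int[] freqs) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int f : freqs) heap.add(f);
        int total = 0;
        while (heap.size() > 1) {
            int merged = heap.poll() + heap.poll();  // the two least frequent trees
            total += merged;                          // every symbol below this node gains one bit
            heap.add(merged);
        }
        return total;
    }

    public static void main(String[] args) {
        // big brown book: b=3, o=3, space=2, and i, g, r, w, n, k once each
        System.out.println(totalBits(new int[] {3, 3, 2, 1, 1, 1, 1, 1, 1}));
    }
}
```

This prints 42 for the "big brown book" frequencies, and 13 for the ABACCDA frequencies of section 1.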

## 2 How to Write a Program to Make a Huffman Code Tree

To make a Huffman code tree, you will need to use a variety of data structures:

1. An array that encodes character frequencies: FREQ ARRAY
2. A set of binary tree nodes, each holding a character and its frequency: HUFF TREE NODES
3. A priority queue (a heap) that allows us to pick the two minimum-frequency nodes: HEAP
4. A binary code tree formed from the HUFF TREE NODES: this is the HUFFMAN TREE

Here is the algorithm:

1. Read in a test message, separate it into individual characters, and count the frequency of each character. You can store these frequencies in an integer array, FREQ ARRAY, indexed by each character's binary code. FREQ ARRAY is data structure #1. For debugging, you should print this array out to make sure the characters have been correctly read in and counted.

2. You are now ready to create a HEAP, data structure #3. This heap will contain HUFF TREE NODES, which are data structure #2. When we make the heap, using the insert method for heaps, we create a HUFF TREE NODE for each individual character and its frequency. The insert uses the character frequency as its key for determining position in the heap. The HUFF TREE NODE as defined in your class notes on Huffman trees implements the Comparable Java interface, so you can compare frequencies for insertion into the HEAP. Remember, you ALREADY HAVE THIS CODE FOR MAKING A HEAP FROM THE WEISS TEXT, GO USE IT! For debugging, you may want to print the heap out and check that it is ordered correctly by using repeated calls to deleteMin(). Be careful: this debugging also destroys the heap, so once you get the heap correct, remove this debugging code!

3. The HUFFMAN TREE, data structure #4, is formed by repeatedly taking the two minimum nodes from the HEAP and creating a new node whose frequency is the sum of the two nodes' frequencies and whose children are those two nodes. We then place this new node back into the HEAP using the insert function.

4. The step above takes 2 nodes from the HEAP and inserts 1 new node back into the HEAP. Eventually, the HEAP will have only one item left, and this is the HUFFMAN TREE, data structure #4.

5. The HUFFMAN TREE is just a binary tree, so all the binary tree operations will work on it. In particular, you can print it out to see if it is correct. You can also number the tree nodes with x and y coordinates for printing in a window using the algorithm in exercise 4.38 of Weiss.

## 3 Proving Optimality of a Huffman Code

1. We can think of codewords as belonging to a binary tree where left and right branches map to a 0 or 1 bit of the code. Every encoded character is a node in the tree, reachable by a series of left-right (meaning a 0 or 1 bit) moves from the root to the character's node.

2. Huffman codes are prefix-free codes: every character to be encoded is a leaf node. A character's code can never sit at an internal (i.e. non-leaf) node. Otherwise, we would have characters whose bit code is a prefix of another character's bit code, and we would not be able to tell where one character ended and the next began.

3. Since we can establish a one-to-one relation between codes and binary trees, we have a simple method of determining the optimal code. Since there can only be a finite number of binary code trees, we need to find the optimal prefix-free binary code tree.

4. The measure we use to determine optimality is the ability to encode a specified message, with a given set of characters and frequency of appearance for each character, in the minimum number of bits. If we are using a fixed-length code of, say, 8 bits per character, then a 20-character message takes up 160 bits. Our job is to find a new code that can encode the message in fewer than 160 bits, and in fact in the minimum number of bits. The key idea is to use a short code for very frequent characters and a longer code for infrequent characters.

5. We need to minimize the number of bits for the encoding. A related measure is the Weighted Path Length (WPL) of a code tree:

   $$WPL = \sum_{i=1}^{N} F_i L_i, \qquad \text{Num of Bits} = N \sum_{i=1}^{N} F_i L_i \qquad (1)$$

   Let $F_i$ be the frequency of character $i$ to be encoded (expressed as a fraction), and let $L_i$ be the length in bits of the codeword for character $i$. We can recognize that the length $L_i$ is simply the number of levels deep character $i$ sits in the binary code tree. Each 0 or 1 in the codeword is equivalent to going down one level of the tree, so a 3-bit code 010 has its leaf at the third level. Num of Bits just adds up each character's contribution to the total length of the encoded message based upon its frequency of occurrence.

   $N F_i$ is simply the number of occurrences of character $i$ in a message of $N$ characters, and $L_i$ is the length of the code for that character in bits. Since $N$ is a constant, we can drop it to get the more compact measure WPL.

6. To minimize WPL, we can only affect the term $L_i$, the length of the code for each character $i$. We need to show how to build a code tree that has minimum Weighted Path Length. We do this proof by induction, and we also introduce a new tree. Define $T_{opt}$ as the optimal code tree, the one with the minimum WPL. Since there can only be a finite number of code trees, we could build them all and find this one, so it must exist. The core of the proof is to show that a Huffman code tree $T_{huff}$ has the same cost as this tree, and is therefore also an optimal code tree.

7. We first prove the basis case: the minimal encodings of 2 characters are identical for $T_{opt}$ and $T_{huff}$. We will show that the two least frequent characters must be the deepest nodes in the tree. For $T_{opt}$, the optimal code tree for two characters must be 2 siblings off of a single root node. To see this, suppose there were an extra node in the tree. This must be an interior node with 1 child. But if we simply moved the single child up to the interior node's position, we would have a shorter code. Hence we cannot have a single-child interior node, and the 3-node tree (root and 2 siblings) is optimal. For $T_{huff}$, we can see that the construction process of the Huffman algorithm will generate exactly the same tree for a 2-character code. So we have now proved that $T_{opt}$ and $T_{huff}$ are identical for a 2-character code. NOTE: we can actually only show that they are identical up to a transposition of 0s and 1s, since we have a choice of
which node is a left child and which is a right child. However, the code length $L_i$ will be the same, and this is what we are trying to minimize.

8. Now perform the induction step. We assume that for a code of $N$ characters, $T_{opt}$ and $T_{huff}$ have the same cost. We now show that this is true for a code of $N+1$ characters.

9. In the trees of $N+1$ characters, we know that the two characters with the smallest frequencies $F_1$ and $F_2$ are siblings at the deepest node, by the argument in the basis case. Now take these two deepest sibling leaves from each of $T_{opt}$ and $T_{huff}$, and replace them with a single node that carries the sum of their frequencies, $F_1 + F_2$. We can then think of these two characters as a single meta-character encoding characters 1 and 2. Define the new trees $T'_{opt}$ and $T'_{huff}$ as the trees encoding $N$ characters formed by removing the two deepest sibling leaves and replacing them with this single node. We can now relate the WPL of the original trees to that of the new ones:

$$WPL(T_{opt}) = WPL(T'_{opt}) + (F_1 L_1) + (F_2 L_2) \qquad (2)$$

$$WPL(T_{huff}) = WPL(T'_{huff}) + (F_1 L_1) + (F_2 L_2) \qquad (3)$$

Suppose, for contradiction, that $T_{opt}$ were strictly cheaper than the Huffman tree on the $N+1$ characters:

$$WPL(T_{opt}) < WPL(T_{huff}) \qquad (4)$$

Substituting (2) and (3) into (4) we get:

$$WPL(T'_{opt}) + (F_1 L_1) + (F_2 L_2) < WPL(T'_{huff}) + (F_1 L_1) + (F_2 L_2) \qquad (5)$$

$$WPL(T'_{opt}) < WPL(T'_{huff}) \qquad (6)$$

However, we assumed in the induction step that the Huffman tree is of minimal cost on $N$ characters, and $T'_{huff}$ is a Huffman tree on $N$ characters, so no tree on $N$ characters can cost less than $WPL(T'_{huff})$. Therefore (6), and hence (4), cannot be true. We have shown that the Huffman code tree on $N+1$ characters costs no more than $T_{opt}$, so it must also be minimal.
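Equation (1) can be checked numerically against the 13-bit example from section 1: for ABACCDA the frequencies are $F = (3/7, 1/7, 2/7, 1/7)$ and the code lengths are $L = (1, 3, 2, 3)$ for A, B, C, D, so $WPL = 13/7$ and the $N = 7$ character message takes $N \cdot WPL = 13$ bits. A sketch of this check (class and method names illustrative):

```java
public class WplDemo {
    // WPL = sum of F_i * L_i over the characters, per equation (1);
    // total bits = N * WPL for a message of N characters.
    static double wpl(double[] f, int[] len) {
        double sum = 0;
        for (int i = 0; i < f.length; i++) sum += f[i] * len[i];
        return sum;
    }

    public static void main(String[] args) {
        double[] f = {3.0 / 7, 1.0 / 7, 2.0 / 7, 1.0 / 7};  // A, B, C, D in ABACCDA
        int[] len = {1, 3, 2, 3};                            // |0|, |110|, |10|, |111|
        double w = wpl(f, len);
        System.out.println("WPL = " + w + ", total bits = " + 7 * w);
    }
}
```

The 13 bits agree with the hand count in section 1, and no prefix-free code can do better for these frequencies.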


### Number Representation

Number Representation Number System :: The Basics We are accustomed to using the so-called decimal number system Ten digits ::,,,3,4,5,6,7,8,9 Every digit position has a weight which is a power of Base

### Review of Hashing: Integer Keys

CSE 326 Lecture 13: Much ado about Hashing Today s munchies to munch on: Review of Hashing Collision Resolution by: Separate Chaining Open Addressing \$ Linear/Quadratic Probing \$ Double Hashing Rehashing

### NUMBER SYSTEMS CHAPTER 19-1

NUMBER SYSTEMS 19.1 The Decimal System 19. The Binary System 19.3 Converting between Binary and Decimal Integers Fractions 19.4 Hexadecimal Notation 19.5 Key Terms and Problems CHAPTER 19-1 19- CHAPTER

### HUFFMAN CODING AND HUFFMAN TREE

oding: HUFFMN OING N HUFFMN TR Reducing strings over arbitrary alphabet Σ o to strings over a fixed alphabet Σ c to standardize machine operations ( Σ c < Σ o ). inary representation of both operands and

### Cyclic Codes Introduction Binary cyclic codes form a subclass of linear block codes. Easier to encode and decode

Cyclic Codes Introduction Binary cyclic codes form a subclass of linear block codes. Easier to encode and decode Definition A n, k linear block code C is called a cyclic code if. The sum of any two codewords

### Greatest Common Factor and Least Common Multiple

Greatest Common Factor and Least Common Multiple Intro In order to understand the concepts of Greatest Common Factor (GCF) and Least Common Multiple (LCM), we need to define two key terms: Multiple: Multiples

### Chapter 4: Computer Codes

Slide 1/30 Learning Objectives In this chapter you will learn about: Computer data Computer codes: representation of data in binary Most commonly used computer codes Collating sequence 36 Slide 2/30 Data

### CHAPTER 5 A MEMORY EFFICIENT SPEECH CODING SCHEME WITH FAST DECODING

86 CHAPTE A MEMOY EFFICIENT SPEECH CODING SCHEME WITH FAST DECODING. INTODUCTION Huffman coding proposed by Huffman (9) is the most widely used minimal prefix coding algorithm because of its robustness,

### For decimal numbers we have 10 digits available (0, 1, 2, 3, 9) For binary numbers we have 2 digits available, 0 and 1.

Math 167 Ch 17 Review 1 (c) Janice Epstein, 2014 CHAPTER 17 INFORMATION SCIENCE Binary and decimal numbers a short review: For decimal numbers we have 10 digits available (0, 1, 2, 3, 9) 4731 = For binary

### Data Structures. Jaehyun Park. CS 97SI Stanford University. June 29, 2015

Data Structures Jaehyun Park CS 97SI Stanford University June 29, 2015 Typical Quarter at Stanford void quarter() { while(true) { // no break :( task x = GetNextTask(tasks); process(x); // new tasks may

### 1.3 Data Representation

8628-28 r4 vs.fm Page 9 Thursday, January 2, 2 2:4 PM.3 Data Representation 9 appears at Level 3, uses short mnemonics such as ADD, SUB, and MOV, which are easily translated to the ISA level. Assembly

### CSC 321: Data Structures. Fall 2012

CSC : Data Structures Fall Hash tables HashSet & HashMap hash table, hash function collisions linear probing, lazy deletion, primary clustering quadratic probing, rehashing chaining HashSet & HashMap recall:

### 1) A very simple example of RSA encryption

Solved Examples 1) A very simple example of RSA encryption This is an extremely simple example using numbers you can work out on a pocket calculator (those of you over the age of 35 45 can probably even

### Automata and Languages

Automata and Languages Computational Models: An idealized mathematical model of a computer. A computational model may be accurate in some ways but not in others. We shall be defining increasingly powerful

### Nearly Complete Binary Trees and Heaps

Nearly Complete Binary Trees and Heaps DEFINITIONS: i) The depth of a node p in a binary tree is the length (number of edges) of the path from the root to p. ii) The height (or depth) of a binary tree

### Lempel-Ziv Coding Adaptive Dictionary Compression Algorithm

Lempel-Ziv Coding Adaptive Dictionary Compression Algorithm 1. LZ77:Sliding Window Lempel-Ziv Algorithm [gzip, pkzip] Encode a string by finding the longest match anywhere within a window of past symbols

### Binary Codes. Objectives. Binary Codes for Decimal Digits

Binary Codes Objectives In this lesson, you will study: 1. Several binary codes including Binary Coded Decimal (BCD), Error detection codes, Character codes 2. Coding versus binary conversion. Binary Codes

### Definition. E.g. : Attempting to represent a transport link data with a tree structure:

The ADT Graph Recall the ADT binary tree: a tree structure used mainly to represent 1 to 2 relations, i.e. each item has at most two immediate successors. Limitations of tree structures: an item in a tree

### A First Book of C++ Chapter 2 Data Types, Declarations, and Displays

A First Book of C++ Chapter 2 Data Types, Declarations, and Displays Objectives In this chapter, you will learn about: Data Types Arithmetic Operators Variables and Declarations Common Programming Errors

### Trees. Tree Definitions: Let T = (V, E, r) be a tree. The size of a tree denotes the number of nodes of the tree.

Trees After lists (including stacks and queues) and hash tables, trees represent one of the most widely used data structures. On one hand trees can be used to represent objects that are recursive in nature

### 3. Mathematical Induction

3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)

### CS271: Data Structures. Binary Search Trees

CS271: Data Structures Binary Search Trees This project has three parts: an implementation of a binary search tree class, a testing plan, and an experiment in tree height. 1. PART 1 Implement a binary

### Merge Sort. 2004 Goodrich, Tamassia. Merge Sort 1

Merge Sort 7 2 9 4 2 4 7 9 7 2 2 7 9 4 4 9 7 7 2 2 9 9 4 4 Merge Sort 1 Divide-and-Conquer Divide-and conquer is a general algorithm design paradigm: Divide: divide the input data S in two disjoint subsets

### CMSC 451: Graph Properties, DFS, BFS, etc.

CMSC 451: Graph Properties, DFS, BFS, etc. Slides By: Carl Kingsford Department of Computer Science University of Maryland, College Park Based on Chapter 3 of Algorithm Design by Kleinberg & Tardos. Graphs

### Introduction to Data Structures

Introduction to Data Structures Albert Gural October 28, 2011 1 Introduction When trying to convert from an algorithm to the actual code, one important aspect to consider is how to store and manipulate