
Volume 3, Issue 7, July 2013                                                  ISSN: 2277-128X
International Journal of Advanced Research in Computer Science and Software Engineering
Research Paper
Available online at: www.ijarcsse.com

Greedy Algorithm: Huffman Algorithm

Annu Malik, Neeraj Goyat, Prof. Vinod Saroha (Guide)
Computer Science and Engineering (Network Security)
B.P.S.M.V., Khanpur Kalan, Haryana, India

Abstract: This paper presents a survey of greedy algorithms. The discussion is centered on an overview of Huffman codes, the Huffman algorithm, and applications of greedy algorithms. A greedy algorithm is an algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage in the hope of finding a global optimum. For many problems a greedy strategy does not in general produce an optimal solution, but a greedy heuristic may nonetheless yield locally optimal solutions that approximate a globally optimal solution in reasonable time. Greedy algorithms determine the minimum number of coins to give while making change. These are the steps a human would take to emulate a greedy algorithm to represent 36 cents using only coins with values {1, 5, 10, 20}: the coin of the highest value not exceeding the remaining change owed is the local optimum. (Note that in general the change-making problem requires dynamic programming or integer programming to find an optimal solution; however, most currency systems, including the Euro and the US Dollar, are special cases where the greedy strategy does find an optimal solution.)

Keywords: Greedy, Huffman, activity, optimal, algorithm

I. INTRODUCTION
A greedy algorithm solves a problem by making the choice that seems best at the particular moment. Many optimization problems can be solved using a greedy algorithm. Some problems have no efficient solution, but a greedy algorithm may provide an efficient solution that is close to optimal. A greedy algorithm works if a problem exhibits the following two properties:
1) Greedy choice property: A globally optimal solution can be arrived at by making locally optimal choices. In other words, an optimal solution can be obtained by making greedy choices.
2) Optimal substructure: Optimal solutions contain optimal sub-solutions. In other words, solutions to subproblems of an optimal solution are themselves optimal.
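To make the change-making illustration from the abstract concrete, here is a minimal Python sketch of the greedy choice at work (the function name and structure are our own, not from the paper):

    def greedy_change(amount, denominations):
        # Repeatedly take the largest coin that does not exceed
        # the remaining amount -- the locally optimal choice.
        coins = []
        for coin in sorted(denominations, reverse=True):
            while amount >= coin:
                coins.append(coin)
                amount -= coin
        return coins

    # The paper's example: 36 cents with coin values {1, 5, 10, 20}
    print(greedy_change(36, {1, 5, 10, 20}))   # [20, 10, 5, 1], i.e., four coins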

II. HUFFMAN CODES
Data can be encoded efficiently using Huffman codes. Huffman coding is a widely used and very effective technique for compressing data; savings of 20% to 90% are typical, depending on the characteristics of the file being compressed. Huffman's greedy algorithm uses a table of the frequencies of occurrence of each character to build up an optimal way of representing each character as a binary string.
Suppose we have a data file of 10^5 characters. Normal storage is 8 bits per character (ASCII), i.e., 8 x 10^5 bits for the whole file. But we want to compress the file and store it compactly. Suppose only 6 characters appear in the file, with the following frequencies (in thousands):

    Character:        a    b    c    d    e    f
    Total frequency:  45   13   12   16   9    5

How can we represent the data in a compact way?
[1] Fixed-length code: each letter is represented by an equal number of bits. With a fixed-length code we need at least 3 bits per character, for example a = 000, b = 001, c = 010, d = 011, e = 100, f = 101. For a file of 10^5 characters we then need 3 x 10^5 bits.
[2] Variable-length code: it can do considerably better than a fixed-length code by giving frequent characters short codewords and infrequent characters long codewords, for example a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100. The number of bits is then

    (45*1 + 13*3 + 12*3 + 16*3 + 9*4 + 5*4) * 1000 = 2.24 x 10^5 bits.

Thus, 224,000 bits represent the file, a saving of approximately 25%. In fact, this is an optimal character code for this file.
Let us denote the characters by C_1, C_2, ..., C_n and denote their frequencies by f_1, f_2, ..., f_n. Suppose there is an encoding E in which a bit string S_i of length s_i represents C_i. Then the length of the file compressed using encoding E is

    L(E, F) = sum of s_i * f_i, for i = 1 to n.

III. PREFIX CODES
The prefix of the encoding of one character must not be equal to the complete encoding of another character; e.g., 1 and 10 could not both be codewords, because 1 is a prefix of 10. This constraint is called the prefix constraint. Codes in which no codeword is also a prefix of some other codeword are called prefix codes. Shortening the encoding of one character may lengthen the encodings of others. The problem is to find an encoding E that satisfies the prefix constraint and minimizes L(E, F).
Prefix codes are desirable because they simplify both encoding (compression) and decoding. Encoding is always simple for any binary character code: we just concatenate the codewords representing each character of the file. Decoding is also quite simple with a prefix code. Since no codeword is a prefix of any other, the codeword that begins an encoded file is unambiguous. We can simply identify the initial codeword, and repeat the decoding process on the remainder of the encoded file.
The decoding process needs a convenient representation of the prefix code so that the initial codeword can be easily picked off. A binary tree whose leaves are the given characters provides one such representation. We interpret the binary codeword for a character as the path from the root to that character, where 0 means "go to the left child" and 1 means "go to the right child". Note that these are not binary search trees, since the leaves need not appear in sorted order and internal nodes do not contain character keys.
An optimal code for a file is always represented by a full binary tree, in which every non-leaf node has two children. The fixed-length code in our example is not optimal, because its tree is not a full binary tree: there are codewords beginning 10..., but none beginning 11.... Since we can now restrict our attention to full binary trees, we can say that if C is the alphabet from which the characters are drawn, then the tree for an optimal prefix code has exactly |C| leaves, one for each letter of the alphabet, and exactly |C| - 1 internal nodes.

Fig. 1: Trees corresponding to the two coding schemes. Each leaf is labeled with a character and its frequency of occurrence; each internal node is labeled with the sum of the frequencies of the leaves in its subtree. (a) The tree corresponding to the fixed-length code a = 000, ..., f = 101; not optimal, since it is not a full binary tree. (b) The tree corresponding to the optimal prefix code a = 0, b = 101, c = 100, d = 111, e = 1101, f = 1100; a full binary tree with root 100, leaf a:45 directly under the root, and internal nodes 55, 25, 30, and 14 above the remaining leaves.
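The savings computed above can be checked directly. The following short Python sketch (our own illustration; the names are not from the paper) evaluates L(E, F) for both coding schemes of Fig. 1:

    # Frequencies in thousands of occurrences, as in the table above.
    freq = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}

    fixed    = {'a': '000', 'b': '001', 'c': '010', 'd': '011', 'e': '100', 'f': '101'}
    variable = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

    def encoded_length(code):
        # L(E, F): codeword length times frequency, summed over all characters.
        return sum(len(code[ch]) * f for ch, f in freq.items())

    print(encoded_length(fixed) * 1000)     # 300000 bits
    print(encoded_length(variable) * 1000)  # 224000 bits, about 25% smaller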

GREEDY ALGORITHM FOR CONSTRUCTING A HUFFMAN CODE
Huffman invented a greedy algorithm that constructs an optimal prefix code, called a Huffman code. At each step the construction replaces the two subtrees A and B of least frequency by a single subtree A+B whose children are A and B.
The algorithm builds the tree T corresponding to the optimal code in a bottom-up manner. It begins with a set of |C| leaves and performs a sequence of |C| - 1 merging operations to create the final tree. In the pseudocode HUFFMAN(C), we assume that C is a set of n characters and that each character c in C is an object with a defined frequency f[c]. A priority queue Q, keyed on f, is used to identify the two least-frequent objects to merge together. The result of the merger of two objects is a new object whose frequency is the sum of the frequencies of the two objects that were merged.

HUFFMAN(C)
1  n <- |C|
2  Q <- C
3  for i <- 1 to n - 1
4      do z <- ALLOCATE-NODE()
5         x <- left[z] <- EXTRACT-MIN(Q)
6         y <- right[z] <- EXTRACT-MIN(Q)
7         f[z] <- f[x] + f[y]
8         INSERT(Q, z)
9  return EXTRACT-MIN(Q)

The analysis of the running time of Huffman's algorithm assumes that Q is implemented as a binary heap. For a set of n characters, the initialization of Q in line 2 can be performed in O(n) time using the BUILD-HEAP operation. The for loop in lines 3-8 is executed exactly n - 1 times, and since each heap operation requires time O(lg n), the loop contributes O(n lg n) to the running time. Thus, the total running time of HUFFMAN on a set of n characters is O(n lg n).
The algorithm is based on a reduction of a problem with n characters to a problem with n - 1 characters: a new character replaces two existing ones.

[Figure: the steps of Huffman's algorithm on the frequencies of Fig. 1. The two least-frequent subtrees are merged at each step: f:5 and e:9 into 14; c:12 and b:13 into 25; 14 and d:16 into 30; 25 and 30 into 55; and finally a:45 and 55 into the root 100.]
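For readers who want to run the algorithm, here is a compact Python translation of HUFFMAN(C) that uses the standard-library heapq module as the priority queue (a sketch under our own naming conventions; the paper itself gives only the pseudocode above):

    import heapq
    import itertools

    def huffman(freq):
        # Build a Huffman tree from a dict mapping character -> frequency.
        # A leaf is (f, char); an internal node is (f, left, right).
        counter = itertools.count()  # tie-breaker so equal frequencies never compare nodes
        heap = [(f, next(counter), (f, ch)) for ch, f in freq.items()]
        heapq.heapify(heap)                 # line 2: Q <- C, built in O(n)
        for _ in range(len(freq) - 1):      # lines 3-8: n - 1 merging operations
            fx, _, x = heapq.heappop(heap)  # x <- EXTRACT-MIN(Q)
            fy, _, y = heapq.heappop(heap)  # y <- EXTRACT-MIN(Q)
            z = (fx + fy, x, y)             # f[z] <- f[x] + f[y]
            heapq.heappush(heap, (fx + fy, next(counter), z))
        return heap[0][2]                   # line 9: the root of the tree

Each heappop and heappush costs O(lg n), matching the O(n lg n) bound derived above.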

Example: Find an optimal Huffman code for the following set of frequencies: a: 50, b: 25, c: 15, d: 40, e: 75.
Solution: We are given C = {a, b, c, d, e} and f(C) = {50, 25, 15, 40, 75}, so n = 5. Q <- C; in order of increasing frequency, the queue is c:15, b:25, d:40, a:50, e:75.
For i <- 1 to 4:

i = 1: z <- ALLOCATE-NODE(); x <- EXTRACT-MIN(Q) = c:15; y <- EXTRACT-MIN(Q) = b:25. Set left[z] <- x, right[z] <- y, and f[z] <- f[x] + f[y] = 15 + 25 = 40. The queue is now z:40, d:40, a:50, e:75, where z is an internal node with children c:15 and b:25.
i = 2: x <- EXTRACT-MIN(Q) = z:40; y <- EXTRACT-MIN(Q) = d:40. The new node has frequency 40 + 40 = 80, with the subtree (c:15, b:25) as its left child and d:40 as its right child. The queue is now 80, a:50, e:75.
Applying the same process again: for i = 3, a:50 and e:75 are merged into a node of frequency 125, and for i = 4 the nodes 80 and 125 are merged into the root, of frequency 205. The resulting Huffman tree has root 205; its left subtree 80 contains the node 40 (children c:15 and b:25) and the leaf d:40, and its right subtree 125 contains the leaves a:50 and e:75.
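Running the heapq sketch from the previous section on these frequencies reproduces this result (since the two nodes of frequency 40 tie, the exact left/right layout may differ from the hand trace):

    root = huffman({'a': 50, 'b': 25, 'c': 15, 'd': 40, 'e': 75})
    print(root[0])   # 205, the total frequency, as at the root of the hand-built tree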

Using Huffman Codes
Each message has a different tree, and the tree must be saved with the message. Huffman codes are effective for long files, where the savings in the message can offset the cost of storing the tree. Files are decoded by starting at the root and proceeding down the tree according to the bits in the message (0 = left, 1 = right). When a leaf is encountered, the character at that leaf is output and decoding restarts at the root. Huffman codes are also effective when the tree can be pre-computed and used for a large number of messages, e.g., a tree based on the frequency of occurrence of characters in the English language. Huffman codes are not very good for random files, where each character occurs with about the same frequency.
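Decoding as just described can be sketched in a few lines of Python, using the tuple trees produced by the earlier huffman() sketch (again our own illustration, not code from the paper):

    def decode(root, bits):
        # Walk the tree: 0 = left child, 1 = right child.
        # Internal nodes are (f, left, right); leaves are (f, char).
        out, node = [], root
        for bit in bits:
            node = node[1] if bit == '0' else node[2]
            if len(node) == 2:       # reached a leaf
                out.append(node[1])  # output its character
                node = root          # restart at the root
        return ''.join(out)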

IV. APPLICATIONS
Greedy algorithms mostly (but not always) fail to find the globally optimal solution, because they usually do not operate exhaustively on all the data. They can commit to certain choices too early, which prevents them from finding the best overall solution later. For example, all known greedy coloring algorithms for the graph coloring problem, and for all other NP-complete problems, fail to consistently find optimum solutions. Nevertheless, greedy algorithms are useful because they are quick to devise and often give good approximations to the optimum. If a greedy algorithm can be proven to yield the global optimum for a given problem class, it typically becomes the method of choice, because it is faster than other optimization methods like dynamic programming. Examples of such greedy algorithms are Kruskal's algorithm and Prim's algorithm for finding minimum spanning trees, Dijkstra's algorithm for finding single-source shortest paths, and the algorithm for finding optimal Huffman trees.

V. CONCLUSION
Greedy algorithms are usually easy to think of, easy to implement, and fast. Proving their correctness, however, may require rigorous mathematical proofs and is sometimes insidiously hard. In addition, greedy algorithms are infamous for being tricky: missing even a very small detail can be fatal. But when you have nothing else at your disposal, they may be the only salvation. With backtracking or dynamic programming you are on relatively safe ground; with greedy algorithms, instead, it is more like walking through a minefield: everything looks fine on the surface, but the hidden part may backfire on you when you least expect it. While there are some standardized problems, most of the problems solvable by this method call for heuristics. There is no general template for how to apply the greedy method to a given problem, though the problem specification may give you good insight. In some cases there are many greedy assumptions one can make, but only a few of them are correct. They can provide excellent challenge opportunities.