Compression techniques



Similar documents
Image Compression through DCT and Huffman Coding Technique

Introduction to image coding

Comparison of different image compression formats. ECE 533 Project Report Paula Aguilera

Information, Entropy, and Coding

Storage Optimization in Cloud Environment using Compression Algorithm

Reading.. IMAGE COMPRESSION- I IMAGE COMPRESSION. Image compression. Data Redundancy. Lossy vs Lossless Compression. Chapter 8.

CHAPTER 2 LITERATURE REVIEW

On the Use of Compression Algorithms for Network Traffic Classification

Analysis of Compression Algorithms for Program Data

LZ77. Example 2.10: Let T = badadadabaab and assume d max and l max are large. phrase b a d adadab aa b


Arithmetic Coding: Introduction

encoding compression encryption

ANALYSIS OF THE EFFECTIVENESS IN IMAGE COMPRESSION FOR CLOUD STORAGE FOR VARIOUS IMAGE FORMATS

Lempel-Ziv Coding Adaptive Dictionary Compression Algorithm

Modified Golomb-Rice Codes for Lossless Compression of Medical Images

Structures for Data Compression Responsible persons: Claudia Dolci, Dante Salvini, Michael Schrattner, Robert Weibel

How to Send Video Images Through Internet

Fast Arithmetic Coding (FastAC) Implementations

Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework

Lossless Medical Image Compression using Predictive Coding and Integer Wavelet Transform based on Minimum Entropy Criteria

Study and Implementation of Video Compression Standards (H.264/AVC and Dirac)

Today s topics. Digital Computers. More on binary. Binary Digits (Bits)

Design of Efficient Algorithms for Image Compression with Application to Medical Images

Standard encoding protocols for image and video coding

Figure 1: Relation between codec, data containers and compression algorithms.

Video compression: Performance of available codec software

Hybrid Compression of Medical Images Based on Huffman and LPC For Telemedicine Application

Wan Accelerators: Optimizing Network Traffic with Compression. Bartosz Agas, Marvin Germar & Christopher Tran

Data analysis and visualization topics

Secured Lossless Medical Image Compression Based On Adaptive Binary Optimization

Sachin Dhawan Deptt. of ECE, UIET, Kurukshetra University, Kurukshetra, Haryana, India

ELEC3028 Digital Transmission Overview & Information Theory. Example 1

Entropy and Mutual Information

A NEW LOSSLESS METHOD OF IMAGE COMPRESSION AND DECOMPRESSION USING HUFFMAN CODING TECHNIQUES

Data Mining Un-Compressed Images from cloud with Clustering Compression technique using Lempel-Ziv-Welch

Data Storage. Chapter 3. Objectives. 3-1 Data Types. Data Inside the Computer. After studying this chapter, students should be able to:

Data Reduction: Deduplication and Compression. Danny Harnik IBM Haifa Research Labs

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Video Encryption Exploiting Non-Standard 3D Data Arrangements. Stefan A. Kramatsch, Herbert Stögner, and Andreas Uhl

Streaming Lossless Data Compression Algorithm (SLDC)

Video Coding Basics. Yao Wang Polytechnic University, Brooklyn, NY11201

FCE: A Fast Content Expression for Server-based Computing

How To Code With Cbcc (Cbcc) In Video Coding

JPEG A Short Tutorial

Lossless Grey-scale Image Compression using Source Symbols Reduction and Huffman Coding

Data analysis and visualization topics

The enhancement of the operating speed of the algorithm of adaptive compression of binary bitmap images

Data Storage 3.1. Foundations of Computer Science Cengage Learning

ANALYSIS AND EFFICIENCY OF ERROR FREE COMPRESSION ALGORITHM FOR MEDICAL IMAGE

CM0340 SOLNS. Do not turn this page over until instructed to do so by the Senior Invigilator.

MMGD0203 Multimedia Design MMGD0203 MULTIMEDIA DESIGN. Chapter 3 Graphics and Animations

Study and Implementation of Video Compression standards (H.264/AVC, Dirac)

Computers Are Your Future Eleventh Edition Chapter 5: Application Software: Tools for Productivity

Transform-domain Wyner-Ziv Codec for Video

Parametric Comparison of H.264 with Existing Video Standards

PC Free Operation Guide

Statistical Modeling of Huffman Tables Coding

Design and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen

ANALYSIS OF THE COMPRESSION RATIO AND QUALITY IN MEDICAL IMAGES

Video-Conferencing System

Quality Estimation for Scalable Video Codec. Presented by Ann Ukhanova (DTU Fotonik, Denmark) Kashaf Mazhar (KTH, Sweden)

Binary Differencing for Media Files

H 261. Video Compression 1: H 261 Multimedia Systems (Module 4 Lesson 2) H 261 Coding Basics. Sources: Summary:

Ex. 2.1 (Davide Basilio Bartolini)

We are presenting a wavelet based video conferencing system. Openphone. Dirac Wavelet based video codec

How To Trace

H.264/MPEG-4 AVC Video Compression Tutorial

For Articulation Purpose Only

Michael W. Marcellin and Ala Bilgin

FUNDAMENTALS of INFORMATION THEORY and CODING DESIGN

Chapter 3 Data Warehouse - technological growth

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction

HP Smart Document Scan Software compression schemes and file sizes

Coding and decoding with convolutional codes. The Viterbi Algor

Reviewer s Guide. Morpheus Photo Animation Suite. Screenshots. Tutorial. Included in the Reviewer s Guide:

The following themes form the major topics of this chapter: The terms and concepts related to trees (Section 5.2).

White paper. An explanation of video compression techniques.

Analysis of Algorithms I: Optimal Binary Search Trees

The Data Compression Book 2nd edition. By Mark Nelson and Jean-loup Gailly

Development and Evaluation of Point Cloud Compression for the Point Cloud Library

THE SECURITY AND PRIVACY ISSUES OF RFID SYSTEM

Performance Analysis and Comparison of JM 15.1 and Intel IPP H.264 Encoder and Decoder

Interconnection Networks. Interconnection Networks. Interconnection networks are used everywhere!

Why? A central concept in Computer Science. Algorithms are ubiquitous.

ENGINEERING COMMITTEE Digital Video Subcommittee SCTE METHODS FOR CARRIAGE OF CEA-608 CLOSED CAPTIONS AND NON-REAL TIME SAMPLED VIDEO

SMART TV AD FORMATS SPECIFICATIONS VERSION 1.9 SMART TV AD FORMATS SPECIFICATIONS. Version 1.9 Last update 11/2014 Page 1 / 19

Introduction to Medical Image Compression Using Wavelet Transform

Gambling and Data Compression

Comparative Study Between Various Algorithms of Data Compression Techniques

Transcription:

Compression techniques David Bařina February 22, 2013 David Bařina Compression techniques February 22, 2013 1 / 37

Contents 1 Terminology 2 Simple techniques 3 Entropy coding 4 Dictionary methods 5 Conclusion David Bařina Compression techniques February 22, 2013 2 / 37

Introduction multimedia data: text, documents, audio, image, animation, video, medical data, geographic data,... need for compression: single frame of HD video 1920 1080 3 6 MB (at 60 FPS 360 MB/s) formats: BMP, PNG, JPEG, MPEG-4,... David Bařina Compression techniques February 22, 2013 3 / 37

Terminology Compression reduce the amount of data, bitrate lossless vs. lossy reduce redundancy or irrelevance of data Redundancy reduce redundancy = lossless compression original signal can be reconstructed without distortion Irrelevance with regard to human perception elimination of irrelevant data = lossy compression original signal cannot be exactly reconstructed David Bařina Compression techniques February 22, 2013 4 / 37

Terminology Coding mutually unambiguous assignment of symbols of one alphabet to symbols of other alphabet block vs. stream codes e.g. symbol A code 0110 reduce redundancy = compression Computer program compressor/encoder decompressor/decoder codec type of data (e.g. videocodec) David Bařina Compression techniques February 22, 2013 5 / 37

Terminology Data model probabilistic (statistical) model context oriented (Markov) model Compression methods symmetric vs. asymmetric (complexity) block vs. stream (block of symbols), e.g. bzip2 900 kb single-pass, two-pass, multi-pass static = 1 pass, data model declared in advance semi-adaptive = 2 passes, model needs to be transferred adaptive (dynamic) = 1 pass, model modified on the fly David Bařina Compression techniques February 22, 2013 6 / 37

Terminology Variable-length coding short codes for more frequently occurring symbols e.g. A 1, B 01,..., Z 000000001 implemented by entropy coding e.g. Huffman coding Prefix code property that no code word is a prefix of another code word unambiguous decoding without delimiters e.g. 1, 01, 001, 0001,... David Bařina Compression techniques February 22, 2013 7 / 37

Terminology Dictionary data structure containing fragments of uncompressed file e.g. 0 "aa", 1 "ba", 2 "bc" 10021 "baaaaabcba" Compression methods statistical (statistical data model) context (context oriented model) dictionary (dictionary) David Bařina Compression techniques February 22, 2013 8 / 37

Terminology Compression ratio = output size input size ( < 1 compression, > 1 expansion ) Evaluation of compression methods efficiency (compression ratio) time and memory complexity influence of data type on compression ratio quality (lossy methods) David Bařina Compression techniques February 22, 2013 9 / 37

Terminology Compression methods predictive (predictor, error) transform (e.g. DCT) Entropy quantity indicating the amount of "information" information content, necessary number of bits number of yes/no questions that can reveal the content of message unit is "bit" H = p(a) log 2 p(a) a A entropy encoders compress data almost optimally regarding to their entropy David Bařina Compression techniques February 22, 2013 10 / 37

Multidimensional data Linearization raster order (line by line) zig-zag, e.g. DCT in JPEG Morton order (Z-curve), e.g. DWT tree David Bařina Compression techniques February 22, 2013 11 / 37

RLE (Run-Length Encoding) coding sequences of identical characters many modifications part of complex methods code contains the information about the encoded symbol and the number of repetitions e.g. "A A A A B C D" "4 A B C D" "escape" symbol, "escape" sequence used almost everywhere (BMP, PCX, TIFF, TGA, JPEG, bzip2) read symbol identical to previous? yes no write sequence reset counter increment counter David Bařina Compression techniques February 22, 2013 12 / 37

Coding of differences relative coding, delta coding part of complex methods replace input value with difference from the previous value trivial predictive method error of prediction is futher encoded e.g. 17 16 18 32 35 35 34 28 28-1 +2 +14 +3 0-1 -6 0 David Bařina Compression techniques February 22, 2013 13 / 37

Predictive coding the coded symbol is predicted from already encoded symbols error of prediction is futher encoded order of predictor (1st, 2nd,... ) domains (1-D, 2-D, tree) Predictors linear (delta coding, Lossless JPEG, PNG) non-linear (median, MED/LOCO-I, Paeth, GAP) min(a, b) : c max(a, b) ˆx = max(a, b) : c min(a, b) a + b c c b a x David Bařina Compression techniques February 22, 2013 14 / 37

Unary coding very simple entropy coding optimal for p(n) = 2 n maps non-negative integers N to code words, e.g. N N 1, 0 0 0 1 10 2 110 3 1110... 0 and 1 can be exchanged David Bařina Compression techniques February 22, 2013 15 / 37

Golomb-Rice coding special case of Golomb coding; fast implementation used e.g. in JPEG-LS, FLAC, MPEG-4 ALS maps non-negative integers N to code words the code is adjustable with parameter M = 2 C for M = 1 the coding is unary coding to obtain a code of number N, following variables have to be determined Q = N/M R = N Q M C = log 2 M Q is futher encoded with unary code, R with binary code with C bits David Bařina Compression techniques February 22, 2013 16 / 37

Golomb-Rice coding for M = 4 N Q R code 0 0 0 1 00 1 0 1 1 01 2 0 2 1 10 3 0 3 1 11 4 1 0 01 00 5 1 1 01 01 6 1 2 01 10 7 1 3 01 11 8 2 0 001 00 9 2 1 001 01 10 2 2 001 10 11 2 3 001 11 12 3 0 0001 00 David Bařina Compression techniques February 22, 2013 17 / 37

Shannon Fano coding method creates the code-words according to probabilities of coded symbols (adapts on the data) best results are achieved for the power of 2 probabilities in practice, Huffman coding has slightly better compresion ratio symbols are leaves in binary tree with edges (0 and 1) which represent code-words tree construction: 1. sort symbols according their probabilities 2. split this set into two sets in such a way that these sets will have equal or very similar sum of probabilities 3. recursively apply step 2 on both sets (nodes of tree) up to decomposition to single symbols used in ZIP/Implode David Bařina Compression techniques February 22, 2013 18 / 37

Shannon Fano coding example probability code 0.25 1 1 11 0.20 1 0 10 0.15 0 1 1 011 0.15 0 1 0 010 0.10 0 0 1 001 0.10 0 0 0 1 0001 0.05 0 0 0 0 0000 David Bařina Compression techniques February 22, 2013 19 / 37

Huffman coding popular entropy coding method can adapts to data best results for the power of 2 probabilities symbols are the leaves in a binary tree in which the edges indicate the code word used in bzip2, Deflate, JPEG, MP3 tree construction: 1. sort symbols according to their probability 2. take two symbols with lowest probabilities, link them to the new node 3. continue with step 2 until there are unlinked nodes in adaptive variant, the tree has to be corrected on the fly David Bařina Compression techniques February 22, 2013 20 / 37

Huffman coding example 1,0 p code 0.4 0 0 0.2 0 1 10 0.2 1 1 1 111 0.1 1 0 1 1 1101 0.1 0 0 1 1 1100 0,2 0,4 0,6 0,1 0,1 0,2 0,2 0,4 David Bařina Compression techniques February 22, 2013 21 / 37

Arithmetic coding optimal codes for any symbol probability method assign one code-word to the entire data, not only single symbol method starts with the interval which is narrowed according to the probability of coded symbols interval narrowing requires more bits, so the code-word length increases progressively the idea: symbol with higher probability narrow interval less in comparison with symbol with lower probability practical implementation have to work with integers used in: context coders, JPEG, Dirac David Bařina Compression techniques February 22, 2013 22 / 37

Arithmetic coding example David Bařina Compression techniques February 22, 2013 23 / 37

Context-oriented compression most commonly using arithmetic coding or its modification unlike the arithmetic coding, these methods do not encode the probability of single symbol instead, they encode the probability of symbol occurrence in a particular context context: several already encoded symbols, pixels, bits or coefficients order of context the context is used to predict a next symbol (assign probability to any next symbol) escape codes: switch of context (PPMx) used in: MPEG-4 (CABAC, CAVLC), JPEG 2000 (EBCOT), JPEG-LS, PPMx David Bařina Compression techniques February 22, 2013 24 / 37

Context-oriented compression example string aabbbc context of order of 1 this model predict e.g. after b symbol occurence of b symbols with probability of 66 % and c symbol with probability 33 % EBCOT coder used in JPEG 2000 image compresion standard David Bařina Compression techniques February 22, 2013 25 / 37

LZ77 published by A. Lempel and J. Ziv in 1977 many modifications used in Deflate sliding window two parts: dictionary and lookahead buffer in practice, thousands vs. tens of symbols encoder produces tags (offset, length, symbol)...east#easily#teases... "eas" found 2 on positions 8 and 13 (match) encoder produce tag (13, 3, e ) tag elements are encoded with corresponding number of bits log 2 S, log 2 (L 1), log 2 A, where A is a size of used alphabet David Bařina Compression techniques February 22, 2013 26 / 37

LZ77...east#easily#trashe... when fragment is not found in dictionary, encoder produces tag (0, 0, r )...east#easily#tttttt... match may exceed the search buffer, here (1, 5, t ) it is the reason for the number of bits log 2 (L 1) of length LZ77 assumed that fragments occur close together there are many enhancements: variable length tags, sophisticated data structures, omit symbol in tag David Bařina Compression techniques February 22, 2013 27 / 37

LZ78 published by A. Lempel and J. Ziv in 1978 many modifications, e.g. LZW no sliding window, only the dictionary is memory intensive, can be solved in different ways suitable data structure to maintain the dictionary is trie first item in the dictionary is an empty string encoder creates tags (index, symbol) during compression, longest string stored in dictionary is found in input stream now, tag with its index and next symbol is created every tag indicates new string, this string is stored into dictionary David Bařina Compression techniques February 22, 2013 28 / 37

LZ78 Example: ABRAKADAKABRA at the beginning, the dictionary is empty A, B, R are encoded like (0, A ), (0, B ), (0, R ) 1 0 A B 2 R 3 AK and AD are stored unde node with A and encoded like (1, K ), (1, D ) in the same way, AKA, BR and last A are encoded like (4, A ), (2, R ), (1, EOF) 4 6 A K D R 5 7 David Bařina Compression techniques February 22, 2013 29 / 37

LZW variant of LZ78 developed by T. Welch in 1984 used in GIF, TIFF, PDF use dictionary initialized with every symbols of alphabet tag has only one element: (index) encoder: 1. in input stream, looks for longest string I stored in dictionary 2. thus, next symbol x causes that Ix is not found in dictionary 3. dictionary index of I is sent to output, Ix is stored into dictionary, I is now x 4. go to step 1 decoder: 1. reads dictionary index of I and puts string I to output 2. string Ix should be stored into dictionary, however x is not known 3. reads next index of J and puts corresponding string J to output, first symbol of J is x 4. now Ix can be stored into dictionary, J is now I 5. go to step 2. David Bařina Compression techniques February 22, 2013 30 / 37

LZW: example Example: sir#sid# (encoder) input x= s Ix= s found I=Ix= s input x= i Ix= si not found output idx. I= s store Ix= si I=x= i input x= r Ix= ir not found output idx. I= i store Ix= ir I=x= r input x= # Ix= r# not found output idx. I= r store Ix= r# I=x= # input x= s Ix= #s not found output idx. I= # store Ix= #s I=x= s input x= i Ix= si found I=Ix= si input x= d Ix= sid not found output idx. I= si store Ix= sid I=x= d input x= # Ix= d# not found output idx. I= d store Ix= d# I=x= # input x=eof Ix= #EOF not found output idx. I= # end Dictionary: 0 255 characters 0 255 256 si 257 ir 258 r# 259 #s 260 sid 261 d# David Bařina Compression techniques February 22, 2013 31 / 37

LZW: example Example: sir#sid# (decoder) input index J= s output J= s x=j(1)= s store Ix= s I=J= s input index J= i output J= i x=j(1)= i store Ix= si I=J= i input index J= r output J= r x=j(1)= r store Ix= ir I=J= r input index J= # output J= # x=j(1)= # store Ix= r# I=J= # input index J= si output J= si x=j(1)= s store Ix= #s I=J= si input index J= d output J= d x=j(1)= d store Ix= sid I=J= d input index J= # output J= # x=j(1)= # store Ix= d# I=J= # Dictionary: 0 255 characters 0 255 256 si 257 ir 258 r# 259 #s 260 sid 261 d# David Bařina Compression techniques February 22, 2013 32 / 37

Deflate used in ZIP (originaly), zlib/gzip, 7-Zip, PNG, MNG, PDF combination of LZ77 and Huffman coding in contrast with LZ77, tags have only two elements (offset, length) missing item (symbol) is written into output stream separately compressed stream consists of three entities: literals/symbols, offsets/distances and lengths these entities are encoded using Huffman codes using two tables: literals/lengths and offsets lengths are limited up to 258, literals are bytes (0 255); offsets up to buffer size of 32 kb data are compressed in blocks of various sizes David Bařina Compression techniques February 22, 2013 33 / 37

Deflate defines 3 modes of compression: 1. uncompressed (max. 65 535 B) 2. compression with predefined Huffman tables 3. compression with tables stored in compressed stream match selection is delayed (by one symbol)...she#needs#then#there#the#new... do not selects match (11,3) = "the" "t" encodes like literal selects "delayed" match (20,5) = "he#ne" David Bařina Compression techniques February 22, 2013 34 / 37

Suitable combinations of methods data (uncompressed) data RLE (BMP, TGA) data prediction EC (JPEG-LS) data prediction RLE EC data prediction dictionary method (PNG) data dictionary method (GIF) data transform RLE+EC (JPEG) David Bařina Compression techniques February 22, 2013 35 / 37

Summarization terminology (redundancy, data model, types of compression methods, prefix code, dictionary, compression ratio, entropy) multidimensional data processing basic methods: RLE, delta coding, prediction entropy coders (unary, Golomb-Rice, Huffman and arithmetic coders) context-oriented compression dictionary methods (LZ77, LZ78, LZW and Deflate) combinations of these methods David Bařina Compression techniques February 22, 2013 36 / 37

Sources David Salomon. Data Compression: The Complete Reference. 4th ed., Springer, 2006. http://www.stringology.org/datacompression/ specifications of formats David Bařina Compression techniques February 22, 2013 37 / 37