Arithmetic Coding: Introduction

Data Compression: Arithmetic Coding

Arithmetic Coding: Introduction
Allows using fractional parts of bits! Used in PPM, JPEG/MPEG (as an option), and bzip. More time-costly than Huffman, but the integer implementation is not too bad.

Arithmetic Coding (message intervals)
Assign each symbol an interval in the range from 0 (inclusive) to 1 (exclusive), using the cumulative probabilities f(i) = Σ_{j=1..i−1} p(j), e.g. f(a) = .0, f(b) = .2, f(c) = .7 (so p(a) = .2, p(b) = .5, p(c) = .3). The interval for a particular symbol will be called the symbol interval (e.g. for b it is [.2, .7)).

Arithmetic Coding: Encoding Example
Coding the message sequence bac: start from [0, 1); after b the interval is [.2, .7), after a it is [.2, .3), and after c it is [.27, .3). The final sequence interval is [.27, .3).
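
As a quick illustration (not part of the original slides), here is a minimal Python sketch that builds the symbol intervals [f(c), f(c) + p(c)) from the probabilities of the example; the function name is mine:

```python
# Sketch: build symbol intervals [f(c), f(c) + p(c)) from the example probabilities.
# The probabilities follow from f(a)=.0, f(b)=.2, f(c)=.7 above.
def symbol_intervals(probs):
    intervals, low = {}, 0.0
    for sym, p in probs.items():
        intervals[sym] = (low, low + p)   # [f(c), f(c) + p(c))
        low += p
    return intervals

probs = {'a': 0.2, 'b': 0.5, 'c': 0.3}
print(symbol_intervals(probs))   # {'a': (0.0, 0.2), 'b': (0.2, 0.7), 'c': (0.7, 1.0)}
```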

Arithmetic Coding
To code a sequence of symbols c_1 … c_n with probabilities p[c], use the following:
l_0 = 0, s_0 = 1
l_i = l_{i−1} + s_{i−1} · f[c_i]
s_i = s_{i−1} · p[c_i]
where f[c] is the cumulative probability up to symbol c (not included). The final interval size is
s_n = ∏_{i=1..n} p[c_i]
The interval for a message sequence will be called the sequence interval.

Uniquely defining an interval
Important property: the intervals for distinct messages of length n never overlap. Therefore specifying any number in the final interval uniquely determines the message. Decoding is similar to encoding, but at each step one needs to determine what the message symbol is and then reduce the interval.
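
A minimal sketch of these recurrences (the function name is illustrative, not from the slides); running it on the bac example above reproduces the sequence interval [.27, .3):

```python
# Sketch of the sequence-interval recurrence:
#   l_i = l_{i-1} + s_{i-1} * f[c_i],   s_i = s_{i-1} * p[c_i]
def sequence_interval(msg, p, f):
    l, s = 0.0, 1.0                      # l_0 = 0, s_0 = 1
    for c in msg:
        l = l + s * f[c]
        s = s * p[c]
    return l, s                          # final interval is [l, l + s)

p = {'a': 0.2, 'b': 0.5, 'c': 0.3}
f = {'a': 0.0, 'b': 0.2, 'c': 0.7}       # cumulative prob. up to (not including) each symbol
l, s = sequence_interval('bac', p, f)
print(l, l + s)                          # ~0.27 ~0.3  (the sequence interval [.27, .3))
```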

Arithmetic Coding: Decoding Example
Decoding the number .49, knowing the message is of length 3: .49 lies in [.2, .7), so the first symbol is b; within that interval .49 lies in [.3, .55), so the second symbol is b; within that, .49 lies in [.475, .55), so the third symbol is c. The message is bbc.

Representing a real number
Binary fractional representation: .75 = .11, 1/3 = .010101…, 11/16 = .1011
Algorithm:
1. x = 2·x
2. If x < 1, output 0
3. else x = x − 1; output 1
So how about just using the shortest binary fractional representation in the sequence interval? E.g. [0, .33) → .01, [.33, .66) → .1, [.66, 1) → .11.
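
A sketch of both parts of this slide: decoding by locating the number in successive symbol intervals (here done by rescaling the number, which is equivalent to reducing the interval), and the doubling algorithm for the binary fractional expansion; helper names are mine:

```python
# Sketch: decode by finding which symbol interval contains x, then rescaling x.
def decode(x, n, p, f):
    syms = sorted(f, key=f.get, reverse=True)    # symbols by decreasing f
    out = []
    for _ in range(n):
        c = next(c for c in syms if f[c] <= x)   # symbol interval containing x
        out.append(c)
        x = (x - f[c]) / p[c]                    # rescale x into [0, 1) and repeat
    return ''.join(out)

# Sketch: the doubling algorithm for the binary fractional representation.
def binary_fraction(x, bits):
    out = []
    for _ in range(bits):
        x *= 2                                   # 1. x = 2*x
        out.append('0' if x < 1 else '1')        # 2./3. output a bit
        if x >= 1:
            x -= 1
    return '.' + ''.join(out)

p = {'a': 0.2, 'b': 0.5, 'c': 0.3}
f = {'a': 0.0, 'b': 0.2, 'c': 0.7}
print(decode(0.49, 3, p, f))                             # bbc
print(binary_fraction(0.75, 2), binary_fraction(11/16, 4))  # .11 .1011
```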

Representing a code interval
Can view binary fractional numbers as intervals by considering all completions:

number   min       max       interval
.11      .110…     .111…     [.75, 1.0)
.101     .1010…    .1011…    [.625, .75)

We will call this the code interval.

Selecting the code interval
To find a prefix code, find a binary fractional (dyadic) number whose code interval is contained in the sequence interval. E.g. for the sequence interval [.61, .79), the number .101 has code interval [.625, .75), which is contained in it. Can use l + s/2 truncated to ⌈log(1/s)⌉ + 1 bits.
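
A small sketch of the truncation rule at the end of the slide (function name and loop are mine); note that on the example interval [.61, .79) the rule emits .1011, a valid though not the shortest choice:

```python
import math

# Sketch of the rule "truncate l + s/2 to ceil(log2(1/s)) + 1 bits".
def code_bits(l, s):
    nbits = math.ceil(math.log2(1.0 / s)) + 1
    x, out = l + s / 2, []
    for _ in range(nbits):             # truncate the binary expansion of l + s/2
        x *= 2
        out.append('1' if x >= 1 else '0')
        if x >= 1:
            x -= 1
    return '.' + ''.join(out)

# For the sequence interval [.61, .79): s = .18, so 4 bits; the result is .1011,
# whose code interval [.6875, .75) is contained in [.61, .79).  (The shortest
# choice, .101 with code interval [.625, .75), also works, as the slide shows.)
print(code_bits(0.61, 0.18))           # .1011
```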

Bound on Arithmetic length
Note that ⌈log(1/s)⌉ + 1 = ⌈log(2/s)⌉.

Bound on Length
Theorem: For a text of length n, the Arithmetic encoder generates at most
1 + ⌈log(1/s)⌉ = 1 + ⌈log ∏_{i=1..n} (1/p_i)⌉
≤ 2 + Σ_{i=1..n} log(1/p_i)
= 2 + Σ_{k=1..|Σ|} n·p_k·log(1/p_k)
= 2 + n·H_0 bits.
In practice this is about n·H_0 + 0.02·n bits because of rounding.

Integer Arithmetic Coding
Problem: operations on arbitrary-precision real numbers are expensive. Key ideas of the integer version:
Keep integers in the range [0..R), where R = 2^k.
Use rounding to generate the integer interval.
Whenever the sequence interval falls into the top, bottom or middle half, expand the interval by a factor of 2.
Integer arithmetic coding is an approximation.

Integer Arithmetic (scaling)
If l ≥ R/2 (top half): output 1 followed by m 0s, set m = 0; the message interval is expanded by 2.
If u < R/2 (bottom half): output 0 followed by m 1s, set m = 0; the message interval is expanded by 2.
If l ≥ R/4 and u < 3R/4 (middle half): increment m; the message interval is expanded by 2.
In all other cases, just continue.
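
A sketch of just this scaling step, assuming an integer interval [l, u) inside [0, R) and a list collecting output bits (the function name, argument layout, and example values are illustrative, not from the slides):

```python
# Sketch of the renormalization (scaling) step for an integer interval [l, u)
# inside [0, R), R = 2**k.  m counts the pending "middle half" expansions.
def renormalize(l, u, m, R, bits):
    while True:
        if u < R // 2:                        # bottom half: output 0, then m ones
            bits.append('0'); bits.extend('1' * m); m = 0
            l, u = 2 * l, 2 * u
        elif l >= R // 2:                     # top half: output 1, then m zeros
            bits.append('1'); bits.extend('0' * m); m = 0
            l, u = 2 * (l - R // 2), 2 * (u - R // 2)
        elif l >= R // 4 and u < 3 * R // 4:  # middle half: remember it, expand
            m += 1
            l, u = 2 * (l - R // 4), 2 * (u - R // 4)
        else:
            return l, u, m                    # interval is wide enough; keep coding

bits = []
print(renormalize(l=300, u=320, m=0, R=1024, bits=bits), bits)
```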

Arithmetic ToolBox
The ATB can be viewed as a state machine: its state is the current interval (L, s); given the next symbol c and a probability distribution (p_1, …, p_Σ), one step maps ATB(L, s) to ATB(L', s'). Therefore, even the distribution can change over time.
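
A tiny sketch of one ATB step (the signature is my illustration); the point is that a different distribution may be supplied on every call, which is exactly what adaptive models exploit:

```python
# Sketch: one ATB step maps (L, s) and symbol c, under the distribution
# supplied for this step, to the new state (L', s').
def atb_step(L, s, c, p, f):
    return L + s * f[c], s * p[c]

L, s = 0.0, 1.0
L, s = atb_step(L, s, 'b', {'a': .2, 'b': .5, 'c': .3}, {'a': 0, 'b': .2, 'c': .7})
L, s = atb_step(L, s, 'b', {'a': .5, 'b': .4, 'c': .1}, {'a': 0, 'b': .5, 'c': .9})  # new distribution
print(L, s)
```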

k-th order models: PPM
Use the previous k characters as the context, and base the coding on conditional probabilities: this is the changing distribution. Base the probabilities on counts: e.g. if 'th' has been seen 12 times, followed by 'e' 7 of those times, then the conditional probability is p(e | th) = 7/12. Need to keep k small so that the dictionary does not get too large (typically less than 8).

PPM: Prediction by Partial Matching
Problem: what do we do if the current context has never been seen followed by this character before? We cannot code zero probabilities! The key idea of PPM is to reduce the context size if the current match has not been seen: if the character has not been seen before with the current context of size 3, send an escape message and try the context of size 2, then again an escape message and the context of size 1, and so on. Keep statistics for each context size < k. The escape is a special character with some probability; different variants of PPM use different heuristics for this probability.
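
A sketch of this escape-and-shorten idea (the count-table layout, the '<esc>' marker, and the order −1 fallback are my simplifying assumptions; real PPM variants differ in details such as the escape probability):

```python
# Sketch of PPM's escape mechanism.  counts[context] maps each character seen
# after that context to its count.  Starting from the longest context, emit an
# escape and shorten the context until the character has been seen.
def ppm_symbols(text_so_far, next_char, counts, k):
    emitted = []                                   # (context, symbol) pairs for the coder
    for order in range(k, -1, -1):
        ctx = text_so_far[len(text_so_far) - order:]
        seen = counts.get(ctx, {})
        if next_char in seen:
            emitted.append((ctx, next_char))
            return emitted
        emitted.append((ctx, '<esc>'))             # not seen here: escape to a shorter context
    emitted.append(('', next_char))                # order -1: fall back to a uniform model
    return emitted

counts = {'th': {'e': 7, 'a': 5}, 'h': {'e': 9}, '': {'t': 3, 'h': 3, 'e': 9}}
print(ppm_symbols('smooth', 'e', counts, k=2))     # [('th', 'e')]
print(ppm_symbols('smooth', 'o', counts, k=2))     # escapes all the way down
```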

PPM + Arithmetic ToolBox
The PPM model supplies the ATB with the conditional distribution p[σ | context], where σ is either the next character c or the escape symbol esc; the ATB then maps (L, s) to (L', s'). Encoder and decoder must know the protocol for selecting the same conditional probability distribution (the PPM variant).

PPM: Example
Context string = ACCBACCACBA (the next character to code is B), k = 2.

Context (order 0): empty   Counts: A = 4, B = 2, C = 5, $ = 3

Contexts (order 1):
A    Counts: C = 3, $ = 1
B    Counts: A = 2, $ = 1
C    Counts: A = 1, B = 2, C = 2, $ = 3

Contexts (order 2):
AC   Counts: B = 1, C = 2, $ = 2
BA   Counts: C = 1, $ = 1
CA   Counts: C = 1, $ = 1
CB   Counts: A = 2, $ = 1
CC   Counts: A = 1, B = 1, $ = 2
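
The count tables above can be reproduced mechanically; this sketch (helper name mine) counts, for every context of length 0..k in the processed string, the characters that followed it, and sets the '$' entry to the number of distinct successors, which matches the tables above:

```python
from collections import defaultdict

# Sketch: rebuild the PPM count tables of the example.
def ppm_counts(s, k):
    counts = defaultdict(lambda: defaultdict(int))
    for i, ch in enumerate(s):
        for order in range(0, k + 1):
            if i - order >= 0:
                counts[s[i - order:i]][ch] += 1    # ch follows the context s[i-order:i]
    for ctx in counts:
        counts[ctx]['$'] = len(counts[ctx])        # '$' = number of distinct successors
    return {ctx: dict(c) for ctx, c in counts.items()}

tables = ppm_counts('ACCBACCACBA', k=2)
print(tables[''])     # {'A': 4, 'C': 5, 'B': 2, '$': 3}
print(tables['AC'])   # {'C': 2, 'B': 1, '$': 2}
```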

You find this at: compression.ru/ds/