Fast Arithmetic Coding (FastAC) Implementations


Amir Said

1 Introduction

This document describes our fast implementations of arithmetic coding, which achieve optimal compression and higher throughput by better exploiting the numerical capabilities of current processors. References [1, 2] describe the particularities of the coding methods used, and reference [3] presents the experiments and speed measurements used for optimizing our code.

Objectives

During the development of our program we had the following objectives:

1. Its interface must be simple enough so that it can be used by programmers without compression expertise.
2. Its performance should represent the state of the art of arithmetic coding in terms of speed, compression efficiency, and features.
3. It must be versatile, portable, and reliable enough to be used in more advanced coding projects.
4. The program should be clear and simple enough so that students in a first course on coding can learn about arithmetic coding and truly understand how it works.

While we can achieve all the practical objectives (1-3) with a single version, we believe that for educational purposes it is better to provide more than one example, so we finally decided to have four versions. They are all interchangeable, and each, from the simplest to the most optimized, provides some more insight into the arithmetic coding process. A user interested only in using good arithmetic coding functions inside an application can employ (and possibly modify) only the first, which provides the best combination of efficiency and portability.

Document organization

This document is divided into the following parts. Section 2 presents the rationale for using object-oriented programming in compression applications, and describes our C++ classes implementing data-source models and the arithmetic encoder and decoder. Next, in Section 3, we provide more details on how to use our programs by applying them to a simple compression problem. The functions implementing an encoder and a decoder are presented, and we comment on the programming choices, how the coding functions work, and how the C++ classes are used. In Section 4 we explain the differences between the four arithmetic coding implementations, and at the end, in Section 5, we explain the purpose of three programs that we use to demonstrate, on real compression applications, how to use and test our implementations.

Program distribution

All the programs are in a single zip file called FastAC.zip, which contains 7 subdirectories. Under the subdirectory AC_Versions there are 4 subdirectories with each of the implementation files, which are all called arithmetic_codec.h and arithmetic_codec.cpp (see Section 4). The documentation files, FastAC_Readme.pdf (this file) and FastAC_QA.pdf (common questions), are in the root directory. The root directory also contains files that define a MS VC++ workspace (FastAC.dsw, FastAC.ncb, and FastAC.opt) with 3 projects, called acfile, acwav, and test. Each project subdirectory contains a different program as an example of how to use our code (see Section 5). The desired implementation files (arithmetic_codec.h and arithmetic_codec.cpp) should be copied to the root directory before compilation, and it is necessary to recompile the whole project ("rebuild all").

2 Interface of the Arithmetic Coding C++ Classes

Because there are innumerable types of data that need to be compressed, but only a few coding methods, there are many advantages in separating the process of source modeling from entropy coding [1, 2].
While in practice it is not possible to completely separate the two, with current object-oriented programming (OOP) tools it is possible to write programs that eliminate the need to understand the details of how the data-source models are actually implemented and used for coding. For example, for coding purposes the only information needed for modeling a data source is its number of data symbols and the probability of each symbol. Thus, for OOP purposes, that is the only type of data abstraction needed. It is true that during the actual coding process what is used is data computed from the probabilities, i.e., the optimal codewords or coding functions, but we can use OOP to encapsulate that data and those functions in a manner that is completely transparent to the user.

Model classes

We provide four C++ classes for modeling data sources, with definitions in the files called arithmetic_codec.h and implementations in the files called arithmetic_codec.cpp. Note that there are four versions of these files, each corresponding to a different implementation of arithmetic coding, and each in a separate subdirectory. The differences between them are explained in Section 4. We use the OOP data-hiding principle in this document too: the public class information, which is the same in all the files, is shown and explained here, while the private data and functions, which differ between files, are not shown because they do not affect their use. All the class definitions are shown in Figure 1.

The first class, called Static_Bit_Model, is meant for sources with only two symbols (0 and 1). The only information needed for modeling is the probability of the symbol s = 0, which is defined using the function set_probability_0. The second class, called Static_Data_Model, can be used for sources with between 2 and 2^11 symbols, or between 2 and 2^14 symbols, depending on the precision of the implementation. The function set_distribution completely defines the source by setting the number of data symbols and their probabilities. If this function receives a pointer to probabilities equal to zero, it assumes that all symbols are equally probable. If the data source has M symbols, they must be integers in the range [0, M-1].

Even though these models are called static, their symbol probabilities can be changed at any time while coding, as long as the decoder uses the same sequence of probabilities. Their main limitation is that they only use the probabilities provided by the user. In most situations we do not know the probabilities a priori and have to use estimates. Since the coding process can be quite complex, we want to avoid using two passes, one only for gathering statistics and another for coding.
The solution is provided by adaptive models, which simultaneously code and update the probability estimates. Figure 1 shows the definition of the class Adaptive_Bit_Model for binary sources. It starts by assuming that both symbols are equally probable, and updates the probability estimates as the symbols are coded. Its only public function is reset, which restarts the estimation process. The last class, Adaptive_Data_Model, is for sources with a larger number of symbols. Before it is used for coding it needs to know the number of data symbols, which is defined using the function set_alphabet. It also starts by assuming that all symbols have equal probability, and has a function called reset to restart the estimation process. We have two types of classes because there are special optimizations that work only for binary models. In consequence, coding with binary models is typically 50% faster than using the general model with the number of symbols set to two.

Encoder and decoder (codec) class

For simplicity we use a single class, called Arithmetic_Codec, to encapsulate both the encoder and the decoder functions. Its definition is shown in Figure 2. We use a framework in which all compressed data is saved to and fetched from a memory buffer, and only periodically written to or read from a file. Thus, before we can start encoding or decoding, it is necessary to define the memory location where the compressed data is to be stored.

  class Static_Bit_Model
  {
    public:
      Static_Bit_Model(void);
      void set_probability_0(double);          // set probability of symbol 0
  };

  class Static_Data_Model
  {
    public:
      Static_Data_Model(void);
     ~Static_Data_Model(void);
      void set_distribution(unsigned number_of_symbols,
                            const double probability[] = 0);  // 0 means uniform
      unsigned model_symbols(void) { return data_symbols; }
  };

  class Adaptive_Bit_Model
  {
    public:
      Adaptive_Bit_Model(void);
      void reset(void);                        // restart estimation process
  };

  class Adaptive_Data_Model
  {
    public:
      Adaptive_Data_Model(void);
     ~Adaptive_Data_Model(void);
      Adaptive_Data_Model(unsigned number_of_symbols);
      void set_alphabet(unsigned number_of_symbols);
      void reset(void);                        // restart estimation process
      unsigned model_symbols(void) { return data_symbols; }
  };

Figure 1: Definition of classes for supporting static and adaptive data-source models.

  class Arithmetic_Codec
  {
    public:
      Arithmetic_Codec(void);
     ~Arithmetic_Codec(void);
      Arithmetic_Codec(unsigned max_code_bytes,
                       unsigned char * user_buffer = 0);      // 0 = assign new

      unsigned char * buffer(void) { return code_buffer; }
      void set_buffer(unsigned max_code_bytes,
                      unsigned char * user_buffer = 0);       // 0 = assign new

      void start_encoder(void);
      void start_decoder(void);
      void read_from_file(FILE * code_file);    // read code data, start decoder

      unsigned stop_encoder(void);              // returns number of bytes used
      unsigned write_to_file(FILE * code_file); // stop encoder, write code data
      void stop_decoder(void);

      void     put_bit(unsigned bit);
      unsigned get_bit(void);
      void     put_bits(unsigned data, unsigned number_of_bits);
      unsigned get_bits(unsigned number_of_bits);

      void     encode(unsigned bit, Static_Bit_Model &);
      unsigned decode(Static_Bit_Model &);
      void     encode(unsigned data, Static_Data_Model &);
      unsigned decode(Static_Data_Model &);
      void     encode(unsigned bit, Adaptive_Bit_Model &);
      unsigned decode(Adaptive_Bit_Model &);
      void     encode(unsigned data, Adaptive_Data_Model &);
      unsigned decode(Adaptive_Data_Model &);
  };

Figure 2: Definition of the class for arithmetic encoding and decoding.

We must use a constructor or the function set_buffer to define an amount of memory equal to max_code_bytes in which the compressed data can be written or read. The function buffer returns a pointer to the first byte in this buffer. Note that we said define, not allocate: this depends on the parameter user_buffer. If it is zero, then the class Arithmetic_Codec will allocate the indicated amount of memory, and later its destructor will free it. Otherwise, it will use the user_buffer pointer as the first position of the compressed-data memory, assuming that the user is responsible for allocating and freeing it.

Before encoding data it is necessary to call the function start_encoder to initialize the codec and set it to encoder mode. Next, data is coded using the version of the function encode corresponding to its data model. It is also possible to write an integer number of bits to the compressed stream using the functions put_bit or put_bits. If the data source has M symbols, the data must be unsigned integers in the range [0, M-1]. Similarly, data with b bits should be in the range [0, 2^b - 1]. Before the compressed data can be copied from its memory buffer it is necessary to call the function stop_encoder, which saves the final bytes required for correct decoding, and returns the total number of bytes used. Alternatively, the function write_to_file can be used to simultaneously stop the encoder and save the compressed data to a file. This function also writes a small header so that the decoder knows how many bytes it needs to read.

The decoding process is quite similar. Before it starts, all the compressed data must be copied to the memory buffer, and the function start_decoder must be called. If the compressed data had been saved to a file using the function write_to_file, then the function read_from_file must be used to read the data and start the decoder. The compressed data is retrieved using the corresponding decode, get_bit, or get_bits functions.
When decoding is done, the function stop_decoder must be called to restore the codec to a default mode, so that it can be restarted to encode or decode.

3 Coding Example

Let us consider the problem of compressing the data used for line plots (surely obsolete, but it makes an interesting example). Each plot is defined by an array of structures of the type Plot_Segment, shown at the top of Figure 3. The first component, line_color, indicates the index of the color of the line, which is a number between 0 and 7. The meaning of the second and third components depends on the first. If line_color = 0 then they contain the absolute (x,y) coordinate to which the plotter pen should move (a "move to" instruction). If line_color > 0 then they contain the (x,y) displacement from the current location, drawn with a pen of the indicated color (a "line to" instruction). We assume that x and y are in the interval [-2^15, 2^15 - 1].

When designing a new compression method we should consider that, even though in theory we can use a large number of adaptive models to form complex high-order models for any type of data, in practice it is better to consider whether we can exploit the particularities of typical data. This is because adaptive methods need time to obtain good estimates, which is not a problem if the number of models is reasonably small, but becomes a problem when we have, for example, many thousands of models.

For our plot-segment compression problem we can exploit the following facts:

- Consecutive line segments normally have the same color.
- New colors are expected after a "move to".
- Curves are plotted with many short segments.

One way to make use of the fact that something does not change very frequently is to first use binary information to indicate when changes occur, and only then code the new information. Figure 3 shows an example of a coding function that uses this technique to code an array of structures of the type Plot_Segment.

Following the sequence of instructions in Figure 3, we see that we start by defining one variable of the type Arithmetic_Codec. We assume that it will compress to a memory buffer provided from outside the function Encode_Plot, and use the proper constructor to set this buffer. Next we define four adaptive models. The first, color_change_model, is binary, and is used for coding color-change information. The second, color_model, is for coding the actual color information, and thus we use the constructor to inform that its number of data symbols is equal to 8. The third model, short_line_model, is binary, and is used to indicate when displacements are small. The fourth model, step_model, is for coding short lines.

After the call to the function start_encoder we have the loop for coding all the plot segments. The information coded depends on the previous and current colors. If in the previous segment we had line_color = 0 then we code the color information immediately; otherwise we first code the change information, and only if there is a color change do we code the new color. Next, we need to code the position or displacement (x,y). Since their range is quite large, we do not try to entropy code them all the time. If (x,y) represents an absolute position (line_color = 0), or a large displacement, then we just save x and y as 16-bit nonnegative numbers.
If the displacement magnitude is at most 128 then we first code the information that the segment is short, followed by x and y, which are converted to nonnegative numbers and coded using step_model.

Figure 4 shows the function Decode_Plot, which decompresses the data created by Encode_Plot. Note that it is very similar to the encoder, since it must reproduce all the sequences of decisions taken by the encoder. Thus, even though we have four types of information that can be coded, we have correct decoding because the sequence of models used by the decoder is identical to the encoder's sequence.

4 Arithmetic Coding Versions

As explained in the introduction, for educational purposes we have written four different fully functioning implementations of arithmetic coding. In this section we present the main features of each implementation. For practical use we recommend Version 1, which is 100% portable and quite efficient. For those who want to learn how arithmetic coding works,

  struct Plot_Segment { int line_color, x, y; };

  int Encode_Plot(int plot_points, Plot_Segment seg[],
                  int buffer_size, unsigned char * compressed_data)
  {
    Arithmetic_Codec ace(buffer_size, compressed_data);
    Adaptive_Bit_Model  color_change_model;
    Adaptive_Data_Model color_model(8);
    Adaptive_Bit_Model  short_line_model;
    Adaptive_Data_Model step_model(257);

    ace.start_encoder();

    int short_line, last_color, current_color = 1;
    for (int p = 0; p < plot_points; p++) {
      last_color    = current_color;
      current_color = seg[p].line_color;
      if (last_color != 0)
        ace.encode(last_color != current_color, color_change_model);
      if ((last_color == 0) || (last_color != current_color))
        ace.encode(current_color, color_model);
      if (current_color == 0)
        short_line = 0;
      else
        short_line = (abs(seg[p].x) <= 128) && (abs(seg[p].y) <= 128);
      ace.encode(short_line, short_line_model);
      if (short_line) {
        ace.encode(seg[p].x + 128, step_model);
        ace.encode(seg[p].y + 128, step_model);
      }
      else {
        ace.put_bits(seg[p].x + 32768, 16);
        ace.put_bits(seg[p].y + 32768, 16);
      }
    }

    return ace.stop_encoder();  // return number of bytes used for compression
  }

Figure 3: Definition of structure Plot_Segment for storing plot graphic data, and implementation of function Encode_Plot.

  void Decode_Plot(int plot_points, int data_bytes,
                   unsigned char * compressed_data, Plot_Segment seg[])
  {
    Arithmetic_Codec acd(data_bytes, compressed_data);
    Adaptive_Bit_Model  color_change_model;
    Adaptive_Data_Model color_model(8);
    Adaptive_Bit_Model  short_line_model;
    Adaptive_Data_Model step_model(257);

    acd.start_decoder();

    int short_line, current_color = 1;
    for (int p = 0; p < plot_points; p++) {
      if (current_color == 0)
        current_color = acd.decode(color_model);
      else
        if (acd.decode(color_change_model))
          current_color = acd.decode(color_model);
      seg[p].line_color = current_color;
      if (current_color == 0)
        short_line = 0;
      else
        short_line = acd.decode(short_line_model);
      if (short_line) {
        seg[p].x = int(acd.decode(step_model)) - 128;
        seg[p].y = int(acd.decode(step_model)) - 128;
      }
      else {
        seg[p].x = int(acd.get_bits(16)) - 32768;
        seg[p].y = int(acd.get_bits(16)) - 32768;
      }
    }

    acd.stop_decoder();
  }

Figure 4: Implementation of function Decode_Plot.

inspect the code details, and possibly change the programs, we suggest following the inverse version order, i.e., starting from Version 4, followed by 3, 2, and 1.

Version 1: 32-bit variables, 32-bit products

This version, in the directory int_32_32, is the most portable and usually the fastest, integrating all the main acceleration techniques. It is a good example of how an implementation can be both efficient and simple. It has some significant differences compared to the straightforward floating-point version, but it should not be difficult to understand how it works. It uses 32-bit arithmetic all the time, which is exploited to achieve compression very near the optimum. Before multiplications it discards the least significant bits to avoid overflow; thus, the total number of bits of precision assigned to the interval length and the probability must not exceed 32. In the current version the precision depends on the model: for binary models we have 19 bits for the length and 13 bits for probabilities, while for the other models we have 17 bits for the length and 15 bits for probabilities. This version saves full bytes during renormalization, and has two forms of decoding. The first, used when the number of symbols is large, is based on fast table look-up and needs divisions (which can be slow on some processors). The second form replaces the division with several multiplications, and is used when the number of symbols is small.

Version 2: 32-bit variables, 32-bit products, sorted symbols

The version in directory int_sorted is very similar to the previous version, but it sorts the symbols according to probability to optimize some operations. If the source distribution is highly skewed, then this can be the fastest version, using only multiplications (no divisions). In other cases it is better to use the version with table look-up decoding.
The static binary model in this version is also different: we provide an implementation showing that for binary coders it is relatively easy to approximate multiplications with bit shifts (the speed improvement, however, is not very significant).

Version 3: 32-bit variables, 64-bit products

Many 32-bit processors (e.g., Pentium) compute the full 64 bits of a multiplication of two 32-bit integers. The problem is that the compilers do not give access to the register with the most significant bits. The version in directory int shows how to use a few lines of assembler code to multiply and extract those bits, and also how to perform the 64-bit division used by table look-up decoding. It can be a bit slower, but mostly because the compilers cannot optimize across the assembler code. On the other hand, it has a precision that guarantees virtually optimal compression even in extreme cases.

Version 4: floating-point arithmetic

This version, in the directory floating-point, was created because we believe floating-point numbers provide a more intuitive way of understanding arithmetic coding. Students can follow the program's execution, and can see all the values on a scale that shows the true value of the probabilities. The implementation is straightforward, without much effort to optimize speed (but it is not significantly worse than the others). It uses 48 bits of the precision of double-precision floating-point numbers, and renormalization occurs when at least 16 bits are ready. Two extra tricks had to be used: a small offset is added to the code value while decoding (to deal with the hard-to-predict behavior of the least significant bits beyond the first 48), and some extra leakage is added for each coded symbol (to compensate for possible rounding and other factors).

5 Arithmetic Coding Demo Programs

The three demo programs are in the files called acfile.cpp, acwav.cpp, and test.cpp, in the subdirectories with the same names. Here is their description.

acfile.cpp

This is an example of how to use arithmetic coding to compress any file (text, executables, etc.) using a relatively small number of adaptive models. Probably the most important lesson to be learned from this program is that our code is very easy to use, and enables writing a reasonably good compression program with little effort. The program's usage for compressing and decompressing a file is, respectively,

  acfile -c file_name compressed_data_file
  acfile -d compressed_data_file new_file

acwav.cpp

This is a slightly more complex example, for lossless compression of audio files. The audio file format supported is wav, which is supported by all CD-ripping programs.
It uses some signal processing to improve compression (the reversible S+P transform), but the coding process is very similar to the first case, except that here we decompose each transform sample into a bits+data representation (the same as the VLI representation in the JPEG and MPEG standards). The former is coded with contexts and adaptive models, while the bits of the latter are saved directly. The program's usage is similar; for compression and decompression use

  acwav -c wav_file compressed_wav_file
  acwav -d compressed_wav_file new_wav_file

test.cpp

This program is not really an application: it has functions to test and benchmark the speed of our arithmetic coding implementations (it is similar to the program used in [3]). Given a number of data symbols, it defines several values of source entropy, and for each value it generates millions of pseudo-random source samples. The times to encode and decode this data are measured, and it finally compares the decoded data with the original to make sure the code is correct. The usage is

  test number_of_symbols
  test number_of_symbols number_of_simulation_cycles

The first form uses a default number of cycles equal to 10. In this directory we also have the files test_support.h and test_support.cpp, which contain some functions required for the simulations, like the pseudo-random number generator.

References

[1] Amir Said, "Arithmetic Coding," in Lossless Compression Handbook (K. Sayood, Ed.), Academic Press, San Diego, CA.

[2] Amir Said, "Introduction to Arithmetic Coding - Theory and Practice," Hewlett-Packard Laboratories Report, Palo Alto, CA, April 2004 (http://www.hpl.hp.com/techreports/).

[3] Amir Said, "Comparative Analysis of Arithmetic Coding Computational Complexity," Hewlett-Packard Laboratories Report, Palo Alto, CA, April 2004 (http://www.hpl.hp.com/techreports/).


Record Storage and Primary File Organization

Record Storage and Primary File Organization 1 C H A P T E R 4 Contents Introduction Secondary Storage Devices Buffering of Blocks Placing File Records on Disk Operations on Files Files of Unordered Records

Writing Portable Programs COS 217

Writing Portable Programs COS 217 1 Goals of Today s Class Writing portable programs in C Sources of heterogeneity Data types, evaluation order, byte order, char set, Reading period and final exam Important

Restoring division. 2. Run the algorithm Let s do 0111/0010 (7/2) unsigned. 3. Find remainder here. 4. Find quotient here.

Binary division Dividend = divisor q quotient + remainder CS/COE447: Computer Organization and Assembly Language Given dividend and divisor, we want to obtain quotient (Q) and remainder (R) Chapter 3 We

Today. Binary addition Representing negative numbers. Andrew H. Fagg: Embedded Real- Time Systems: Binary Arithmetic

Today Binary addition Representing negative numbers 2 Binary Addition Consider the following binary numbers: 0 0 1 0 0 1 1 0 0 0 1 0 1 0 1 1 How do we add these numbers? 3 Binary Addition 0 0 1 0 0 1 1

Two s Complement Arithmetic

Two s Complement Arithmetic We now address the issue of representing integers as binary strings in a computer. There are four formats that have been used in the past; only one is of interest to us. The

Numerical Matrix Analysis

Numerical Matrix Analysis Lecture Notes #10 Conditioning and / Peter Blomgren, blomgren.peter@gmail.com Department of Mathematics and Statistics Dynamical Systems Group Computational Sciences Research

A Catalogue of the Steiner Triple Systems of Order 19

A Catalogue of the Steiner Triple Systems of Order 19 Petteri Kaski 1, Patric R. J. Östergård 2, Olli Pottonen 2, and Lasse Kiviluoto 3 1 Helsinki Institute for Information Technology HIIT University of

Java (12 Weeks) Introduction to Java Programming Language

Java (12 Weeks) Topic Lecture No. Introduction to Java Programming Language 1 An Introduction to Java o Java as a Programming Platform, The Java "White Paper" Buzzwords, Java and the Internet, A Short

2. Compressing data to reduce the amount of transmitted data (e.g., to save money).

Presentation Layer The presentation layer is concerned with preserving the meaning of information sent across a network. The presentation layer may represent (encode) the data in various ways (e.g., data

Performance Basics; Computer Architectures

8 Performance Basics; Computer Architectures 8.1 Speed and limiting factors of computations Basic floating-point operations, such as addition and multiplication, are carried out directly on the central

Introduction to Arithmetic Coding - Theory and Practice

Introduction to Arithmetic Coding - Theory and Practice Amir Said Imaging Systems Laboratory HP Laboratories Palo Alto HPL-2004-76 April 21, 2004* entropy coding, compression, complexity This introduction

Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer

Computers CMPT 125: Lecture 1: Understanding the Computer Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 3, 2009 A computer performs 2 basic functions: 1.

Chapter II Binary Data Representation

Chapter II Binary Data Representation The atomic unit of data in computer systems is the bit, which is actually an acronym that stands for BInary digit. It can hold only 2 values or states: 0 or 1, true

Keil C51 Cross Compiler

Keil C51 Cross Compiler ANSI C Compiler Generates fast compact code for the 8051 and it s derivatives Advantages of C over Assembler Do not need to know the microcontroller instruction set Register allocation

Motorola 8- and 16-bit Embedded Application Binary Interface (M8/16EABI)

Motorola 8- and 16-bit Embedded Application Binary Interface (M8/16EABI) SYSTEM V APPLICATION BINARY INTERFACE Motorola M68HC05, M68HC08, M68HC11, M68HC12, and M68HC16 Processors Supplement Version 2.0

Unit 5 Central Processing Unit (CPU)

Unit 5 Central Processing Unit (CPU) Introduction Part of the computer that performs the bulk of data-processing operations is called the central processing unit (CPU). It consists of 3 major parts: Register

Chapter 5. Processing Unit Design

Chapter 5 Processing Unit Design 5.1 CPU Basics A typical CPU has three major components: Register Set, Arithmetic Logic Unit, and Control Unit (CU). The register set is usually a combination of general-purpose

CS 16: Assembly Language Programming for the IBM PC and Compatibles

CS 16: Assembly Language Programming for the IBM PC and Compatibles First, a little about you Your name Have you ever worked with/used/played with assembly language? If so, talk about it Why are you taking

Sawmill Log Analyzer Best Practices!! Page 1 of 6. Sawmill Log Analyzer Best Practices

Sawmill Log Analyzer Best Practices!! Page 1 of 6 Sawmill Log Analyzer Best Practices! Sawmill Log Analyzer Best Practices!! Page 2 of 6 This document describes best practices for the Sawmill universal

1.3 Data Representation

8628-28 r4 vs.fm Page 9 Thursday, January 2, 2 2:4 PM.3 Data Representation 9 appears at Level 3, uses short mnemonics such as ADD, SUB, and MOV, which are easily translated to the ISA level. Assembly

Programming Languages & Tools

4 Programming Languages & Tools Almost any programming language one is familiar with can be used for computational work (despite the fact that some people believe strongly that their own favorite programming

El Dorado Union High School District Educational Services

El Dorado Union High School District Course of Study Information Page Course Title: ACE Computer Programming II (#495) Rationale: A continuum of courses, including advanced classes in technology is needed.

Chapter 3: Digital Audio Processing and Data Compression

Chapter 3: Digital Audio Processing and Review of number system 2 s complement sign and magnitude binary The MSB of a data word is reserved as a sign bit, 0 is positive, 1 is negative. The rest of the

Glossary of Object Oriented Terms

Appendix E Glossary of Object Oriented Terms abstract class: A class primarily intended to define an instance, but can not be instantiated without additional methods. abstract data type: An abstraction

CHAPTER TWO. 2.1 Unsigned Binary Counting. Numbering Systems

CHAPTER TWO Numbering Systems Chapter one discussed how computers remember numbers using transistors, tiny devices that act like switches with only two positions, on or off. A single transistor, therefore,

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

Example Solution to Exam in EDA150 C Programming

Example Solution to Exam in EDA150 C Programming Janurary 12, 2011, 14-19 Inga hjälpmedel! Examinator: Jonas Skeppstedt, tel 0767 888 124 30 out of 60p are needed to pass the exam. General Remarks A function

Using Java 5.0 BigDecimal

Using Java 5.0 BigDecimal Mike Cowlishaw IBM Fellow http://www2.hursley.ibm.com/decimal Mike Cowlishaw Using Java 5.0 BigDecimal Page 1 Overview Background Why decimal arithmetic is important New standards

198:211 Computer Architecture

198:211 Computer Architecture Topics: Lecture 8 (W5) Fall 2012 Data representation 2.1 and 2.2 of the book Floating point 2.4 of the book 1 Computer Architecture What do computers do? Manipulate stored

8051 Programming. The 8051 may be programmed using a low-level or a high-level programming language.

8051 Programming The 8051 may be programmed using a low-level or a high-level programming language. Low-Level Programming Assembly language programming writes statements that the microcontroller directly

Processing Unit Design

&CHAPTER 5 Processing Unit Design In previous chapters, we studied the history of computer systems and the fundamental issues related to memory locations, addressing modes, assembly language, and computer

Analysis of Compression Algorithms for Program Data

Analysis of Compression Algorithms for Program Data Matthew Simpson, Clemson University with Dr. Rajeev Barua and Surupa Biswas, University of Maryland 12 August 3 Abstract Insufficient available memory

Numbering Systems. InThisAppendix...

G InThisAppendix... Introduction Binary Numbering System Hexadecimal Numbering System Octal Numbering System Binary Coded Decimal (BCD) Numbering System Real (Floating Point) Numbering System BCD/Binary/Decimal/Hex/Octal

The programming language C. sws1 1

The programming language C sws1 1 The programming language C invented by Dennis Ritchie in early 1970s who used it to write the first Hello World program C was used to write UNIX Standardised as K&C (Kernighan

Feature of 8086 Microprocessor

8086 Microprocessor Introduction 8086 is the first 16 bit microprocessor which has 40 pin IC and operate on 5volt power supply. which has twenty address limes and works on two modes minimum mode and maximum.

A First Book of C++ Chapter 2 Data Types, Declarations, and Displays

A First Book of C++ Chapter 2 Data Types, Declarations, and Displays Objectives In this chapter, you will learn about: Data Types Arithmetic Operators Variables and Declarations Common Programming Errors

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.

Key Components of WAN Optimization Controller Functionality

Key Components of WAN Optimization Controller Functionality Introduction and Goals One of the key challenges facing IT organizations relative to application and service delivery is ensuring that the applications

Illustration 1: Diagram of program function and data flow

The contract called for creation of a random access database of plumbing shops within the near perimeter of FIU Engineering school. The database features a rating number from 1-10 to offer a guideline

Primitive Data Types Summer 2010 Margaret Reid-Miller

Primitive Data Types 15-110 Summer 2010 Margaret Reid-Miller Data Types Data stored in memory is a string of bits (0 or 1). What does 1000010 mean? 66? 'B'? 9.2E-44? How the computer interprets the string

The software beyond the climatic. Environment simulation

Spirale 2 The software beyond the climatic Environment simulation Spirale 2... Take it easy! Spirale 2 is a new software, based on a reliable system (Windows NT) and if its performances are surprising,

Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com

CSCI-UA.0201-003 Computer Systems Organization Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Some slides adapted (and slightly modified)

Binary Representation and Computer Arithmetic

Binary Representation and Computer Arithmetic The decimal system of counting and keeping track of items was first created by Hindu mathematicians in India in A.D. 4. Since it involved the use of fingers

A Proposal for OpenEXR Color Management

A Proposal for OpenEXR Color Management Florian Kainz, Industrial Light & Magic Revision 5, 08/05/2004 Abstract We propose a practical color management scheme for the OpenEXR image file format as used

Data Compression for Information Interchange - Adaptive Coding with Embedded Dictionary - DCLZ Algorithm

Standard ECMA-151 June 1991 Published in electronic form in September 1999 Standardizing Information and Communication Systems Data Compression for Information Interchange - Adaptive Coding with Embedded

Signal Compression Survey of the lectures Hints for exam

Signal Compression Survey of the lectures Hints for exam Chapter 1 Use one statement to define the three basic signal compression problems. Answer: (1) designing a good code for an independent source;

Quiz for Chapter 3 Arithmetic for Computers 3.10

Date: Quiz for Chapter 3 Arithmetic for Computers 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in RED

William Stallings Computer Organization and Architecture

William Stallings Computer Organization and Architecture Chapter 12 CPU Structure and Function Rev. 3.3 (2009-10) by Enrico Nardelli 12-1 CPU Functions CPU must: Fetch instructions Decode instructions

Part 1 Theory Fundamentals

Part 1 Theory Fundamentals 2 Chapter 1 Information Representation Learning objectives By the end of this chapter you should be able to: show understanding of the basis of different number systems show

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine

7 Objectives After completing this lab you will: know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine Introduction Branches and jumps provide ways to change

Digital Arithmetic. Digital Arithmetic: Operations and Circuits Dr. Farahmand

Digital Arithmetic Digital Arithmetic: Operations and Circuits Dr. Farahmand Binary Arithmetic Digital circuits are frequently used for arithmetic operations Fundamental arithmetic operations on binary

Decimal Numbers: Base 10 Integer Numbers & Arithmetic

Decimal Numbers: Base 10 Integer Numbers & Arithmetic Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Example: 3271 = (3x10 3 ) + (2x10 2 ) + (7x10 1 )+(1x10 0 ) Ward 1 Ward 2 Numbers: positional notation Number

Floating Point Numbers. Question. Learning Outcomes. Number Representation - recap. Do you have your laptop here?

Question Floating Point Numbers 6.626068 x 10-34 Do you have your laptop here? A yes B no C what s a laptop D where is here? E none of the above Eddie Edwards eedwards@doc.ic.ac.uk https://www.doc.ic.ac.uk/~eedwards/compsys

1 Basic Computing Concepts (4) Data Representations

1 Basic Computing Concepts (4) Data Representations The Binary System The Binary System is a way of writing numbers using only the digits 0 and 1. This is the method used by the (digital) computer. The

Outline. V Computer Systems Organization II (Honors) (Introductory Operating Systems) (Review) Memory Management

Outline V22.0202-001 Computer Systems Organization II (Honors) (Introductory Operating Systems) Lecture 14 Memory Management March 28, 2005 Announcements Lab 4 due next Monday (April 4 th ) demos on 4

Development and Evaluation of Point Cloud Compression for the Point Cloud Library

Development and Evaluation of Point Cloud Compression for the Institute for Media Technology, TUM, Germany May 12, 2011 Motivation Point Cloud Stream Compression Network Point Cloud Stream Decompression

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals Modified from the lecture slides of Lami Kaya (LKaya@ieee.org) for use CECS 474, Fall 2008. 2009 Pearson Education Inc., Upper

Storing Measurement Data

Storing Measurement Data File I/O records or reads data in a file. A typical file I/O operation involves the following process. 1. Create or open a file. Indicate where an existing file resides or where

An Introduction To Simple Scheduling (Primarily targeted at Arduino Platform)

An Introduction To Simple Scheduling (Primarily targeted at Arduino Platform) I'm late I'm late For a very important date. No time to say "Hello, Goodbye". I'm late, I'm late, I'm late. (White Rabbit in

Radix Number Systems. Number Systems. Number Systems 4/26/2010. basic idea of a radix number system how do we count:

Number Systems binary, octal, and hexadecimal numbers why used conversions, including to/from decimal negative binary numbers floating point numbers character codes basic idea of a radix number system

DNA Data and Program Representation. Alexandre David 1.2.05 adavid@cs.aau.dk

DNA Data and Program Representation Alexandre David 1.2.05 adavid@cs.aau.dk Introduction Very important to understand how data is represented. operations limits precision Digital logic built on 2-valued