Sessions 1, 2 and 3 Number Systems


COMP 1113 Sessions 1, 2 and 3 Number Systems

The goal of these three class sessions is to examine how numerical and text information can be stored in computer memory, and how numerical information can be manipulated to simulate common arithmetic operations. To learn how this works, it is necessary to introduce some formalism.

First, some definitions. A number is a symbol that represents or corresponds to a quantity of things. A number system (or, as some prefer, a numeral system) is a set of symbols and rules for representing a useful range of numbers.

A wide variety of number systems have been used by people throughout history. In Canada, we use the decimal system, which we will look at in much more detail below. There is a lot of information available on the internet on historically important number systems. If you're interested, you might start with the site at http://mathforum.org/alejandre/numerals.html.

Many of the number systems used by early civilizations had a distinct symbol for every quantity they might wish to express. As the size of their herds increased, such number systems became either increasingly awkward or less precise (for instance, by using a word meaning "many" for any quantity larger than some specific quantity).

Positional Number Systems

In a positional number system, the numerical significance of a symbol depends on where it appears in the representation of the number. Numbers are created by writing down a sequence of symbols (called digits) selected from a fixed set of symbols. The base or radix of the positional number system is the number of symbols in this fixed set. The radix point is a special symbol used to establish the actual position of each digit in the number. Sounds pretty complicated and weird? Read on.

Example 1: The Decimal Number System.

Our ordinary decimal number system is a positional number system.
The base is 10, because we build up all numerical values using 10 basic symbols: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. In North America, it is common to use the period as the radix point (we call it the decimal point), but in some European countries, it is more common to use the comma as the radix point.

Of course, you know how to work with decimal numbers, but you may not have thought about how their properties arise. Think about the counting process. Suppose we have a vast herd of goats to which we would like to attach a number. Before we start counting any of our goats, we would say that, as far as we can tell, we have 0 goats. Then, one by one, we start counting as our goats walk past in an orderly single-file line.

a goat goes by: we now have 1 goat
another goat goes by: we now have 2 goats
another goat goes by: we now have 3 goats
this continues for a bit until we have counted 8 goats
then another goat goes by, and so we have 9 goats
now another goat comes, but we've run out of symbols. Since we are working with a positional number system, we say we have 10 goats: we've gone once through our entire set of symbols (of which there are ten) and we are about to start to go
David W. Sabo (2014) Number Systems Page 1 of 50

through the set again. The number 10 is formed as a sequence of two of our basic symbols.
now another goat comes: we increment the rightmost digit of our count by 1, so we can say we have 11 goats. This means we have counted 1 ten of goats and 1 more. The 1 on the left is in the tens position and represents the number of tens of goats. The 1 on the right is in the ones position and represents just the number of individual goats.
when we've counted 19 goats (1 ten of goats and 9 individual goats), the next goat causes us to change our tally to 20: two tens of goats.

This process could go on for a long time, but you can see how the basic rules of counting lead to the properties of the positional number system. When we need to increment a position that already holds the highest-valued symbol, we increment the position to its left by 1 and write a 0 in the current position. Further counting then increments this digit one at a time until we again run out of symbols, at which point the digit to the left is again incremented by 1, and we write a 0.

You know that in the end, decimal numbers look something like:

position:   3   2   1   0   .   -1   -2   -3

digits in position 0 count as individuals or ones, or $10^0 = 1$'s.
digits in position 1 count as tens, or $10^1 = 10$'s.
digits in position 2 count as hundreds, or $10^2 = 100$'s.
digits in position 3 count as thousands, or $10^3 = 1000$'s.

Similarly, the pattern of positional value can extend to the right of the radix point (here, it is the decimal point, of course) so that

digits in position -1 count as tenths, or as $10^{-1}$'s.
digits in position -2 count as hundredths, or as $10^{-2}$'s.
digits in position -3 count as thousandths, or as $10^{-3}$'s.

Thus, a decimal number such as 4675.384 really represents the sum:

$$4 \times 10^3 + 6 \times 10^2 + 7 \times 10^1 + 5 \times 10^0 + 3 \times 10^{-1} + 8 \times 10^{-2} + 4 \times 10^{-3}$$

In this way, a very small set of symbols (only ten) can be used to construct representations of an essentially unlimited range of quantities.
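The positional sum above is mechanical enough to check with a few lines of code. The following Python sketch is only an illustration: the function name `positional_value` and its argument layout are assumptions made for this example and do not come from the notes. It uses exact fractions so that no rounding creeps into the check.

```python
from fractions import Fraction

def positional_value(whole_digits, frac_digits, base=10):
    """Sum digit * base**position over both sides of the radix point.

    whole_digits: digits left of the radix point, most significant first.
    frac_digits:  digits right of the radix point, in reading order.
    (Hypothetical helper, written only for this illustration.)
    """
    total = Fraction(0)
    # The rightmost whole digit is position 0, the next is position 1, ...
    for position, digit in enumerate(reversed(whole_digits)):
        total += digit * Fraction(base) ** position
    # The first fractional digit is position -1, the next is -2, ...
    for offset, digit in enumerate(frac_digits, start=1):
        total += digit * Fraction(base) ** (-offset)
    return total

value = positional_value([4, 6, 7, 5], [3, 8, 4])
print(value == Fraction(4675384, 1000))  # True
print(float(value))                      # 4675.384
```

The `base` parameter defaults to 10 because that is the only base discussed so far; nothing in the sum itself depends on the base being ten.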
The quantity represented by each symbol is not known until we know where in the number that symbol appears. So the leftmost 4 in the example represents a quantity of four thousand things. The rightmost 4 represents a quantity of four thousandths of a thing. Same symbol, but different quantities.

The properties of decimal numbers are very familiar to you. This means that you know the basic structure of a positional number system. It also means that you have a mental image of the rather abstract concepts introduced in our definition of a positional number system at the beginning of this document. Much of the remainder of this document deals with applying the same principles when a base different from 10 is being used, and with converting the representations of numbers from one base to another.

So, from now on, we will write numbers in the following form:

$$d_4 d_3 d_2 d_1 d_0 . d_{-1} d_{-2} d_{-3} d_{-4}$$

followed by a subscript, $b$, indicating the base. Here the $d_k$ are the digits of the number.

Then, the actual quantity represented by the number is

$$\cdots + d_4 b^4 + d_3 b^3 + d_2 b^2 + d_1 b^1 + d_0 + d_{-1} b^{-1} + d_{-2} b^{-2} + d_{-3} b^{-3} + d_{-4} b^{-4} + \cdots$$

Thus

$$423.56_{10} = 4 \times 10^2 + 2 \times 10^1 + 3 \times 10^0 + 5 \times 10^{-1} + 6 \times 10^{-2}$$

Similarly,

$$423.56_7 = 4 \times 7^2 + 2 \times 7^1 + 3 \times 7^0 + 5 \times 7^{-1} + 6 \times 7^{-2}$$

(You can use your calculator to determine that this last number is numerically equivalent to $213.836734694_{10}$, rounded to twelve significant figures.)

Converting to Base 10 (Decimal) Numbers

The last example just above illustrates the procedure for converting numbers expressed in some base, $b$, into the equivalent number in base 10 (the equivalent decimal form). You just apply the basic definition of a positional number system, and then use your calculator (which gives answers in the decimal number system) to get the decimal value you require. Here's one more example:

$$31203.132_5 = 3 \times 5^4 + 1 \times 5^3 + 2 \times 5^2 + 0 \times 5^1 + 3 \times 5^0 + 1 \times 5^{-1} + 3 \times 5^{-2} + 2 \times 5^{-3} = 2053.336_{10}$$

In this case, the conversion from the base 5 number to its base 10 equivalent gives an answer which can be stated exactly in seven significant digits. Often, conversions from some base to base 10 give results which do not have terminating decimal parts, as illustrated by the base 7 example in the previous section.

Converting Numbers from Base 10 to Other Base Number Systems

The process of converting decimal numbers into their equivalent in some other number base is more complicated than the reverse operation illustrated just above.
If we restrict consideration to number bases which are whole numbers greater than 1, then the following principles apply:

the whole number part (to the left of the decimal point) converts to the whole number part (to the left of the radix point) of the result
the fractional part (to the right of the decimal point) converts to the fractional part (to the right of the radix point) of the result

The conversion of these two parts must be done separately, because they involve different procedures.

i) conversion of the whole number part

Suppose we wish to convert the number $2233_{10}$ to its equivalent in base 5. The procedure is as follows. Carry out successive divisions by 5, recording both the result and the remainder, until a result of 0 is obtained. Here the sequence of operations is:

2233 ÷ 5 = 446 with remainder 3
446 ÷ 5 = 89 with remainder 1

89 ÷ 5 = 17 with remainder 4
17 ÷ 5 = 3 with remainder 2
3 ÷ 5 = 0 with remainder 3

Since we've reached the result of 0, we stop. Now, the digits of this quantity expressed in base 5 are just the remainders, listed in order from bottom to top; that is,

$$2233_{10} = 32413_5$$

(You can use the methods of the previous section to confirm that $32413_5$ is indeed equivalent to $2233_{10}$.)

People refer to this procedure of dividing and recording remainders as the modular arithmetic method. It's fairly easy to see how it works. The base 5 equivalent of $2233_{10}$ will have the general form $d_4 d_3 d_2 d_1 d_0$, where the digits have values to ensure that

$$d_0 + d_1 (5) + d_2 (5^2) + d_3 (5^3) + d_4 (5^4) + \cdots = 2233_{10}$$

Now, divide both sides by 5:

$$\frac{d_0 + d_1 (5) + d_2 (5^2) + d_3 (5^3) + d_4 (5^4) + \cdots}{5} = \frac{2233}{5}$$

Every term except the first in the numerator on the left-hand side of this equation is a multiple of 5 because of the way positional numbers are constructed. Thus, we can rewrite this equation as

$$\frac{d_0}{5} + d_1 + d_2 (5) + d_3 (5^2) + d_4 (5^3) + \cdots = 446 + \frac{3}{5}$$

It is easy to see here that when 2233 is divided by 5, the result will be 446 with a remainder of 3 (since the fraction on the extreme right is the remainder divided by 5). The fraction on the right must equal the fraction on the left, since the rest of the expression on the left is a whole number. Thus, dividing by 5 and determining the remainder has isolated the value of the rightmost digit, $d_0$, of the base 5 form of the number. If we repeat the process with the whole number parts of each side, we get

$$\frac{d_1 + d_2 (5) + d_3 (5^2) + d_4 (5^3) + \cdots}{5} = \frac{d_1}{5} + d_2 + d_3 (5) + d_4 (5^2) + \cdots = \frac{446}{5} = 89 + \frac{1}{5}$$

When 446 is divided by 5, the remainder is 1, and this must be the value of $d_1$, the second digit of the base 5 version of the number. Thus, each time we divide by 5, the remainder of the result is the next digit of the base 5 version of the number, working from right to left.

Example 2: Convert $435782_{10}$ to its equivalent in base 7.

Apply the modular arithmetic method, with the divisor being 7 in this case.
We get

435782 ÷ 7 = 62254 with remainder 4
62254 ÷ 7 = 8893 with remainder 3
8893 ÷ 7 = 1270 with remainder 3
1270 ÷ 7 = 181 with remainder 3
181 ÷ 7 = 25 with remainder 6
25 ÷ 7 = 3 with remainder 4

3 ÷ 7 = 0 with remainder 3

Now, listing the remainders in reverse order (bottom to top in the list above) gives the desired answer:

$$435782_{10} = 3463334_7$$

We can verify this answer by checking that its positional number system form does give the original decimal value:

$$4 + 3 \times 7 + 3 \times 7^2 + 3 \times 7^3 + 6 \times 7^4 + 4 \times 7^5 + 3 \times 7^6 = 435782$$

If you had a bit of trouble determining the remainders in these divisions by 7, note the following trick. If you use your calculator to do the first division in the list above, you get

435782 ÷ 7 = 62254.57143

The fractional part of this number (to the right of the decimal point) must be the remainder divided by 7 (see the patterns of the calculations on the previous pages). Thus

$$\frac{\text{remainder}}{7} \approx 0.57143 \quad \text{so} \quad \text{remainder} \approx 7 \times 0.57143 = 4.00001$$

That is, the remainder in this step of division is 4. Since remainders are always whole numbers, you need to use only a few digits of the fractional part to get a result from which the value of the remainder is obvious.

Example 3: Convert $4232_5$ to its equivalent in base 3.

If we had a calculator that could do arithmetic with base 5 numbers, we could simply apply the modular arithmetic method straight away to get our answer. However, most calculators only do arithmetic in base 10. So, the strategy here will be to first convert $4232_5$ into its decimal equivalent, and then use the modular arithmetic method to convert that number into its base 3 form. So

$$4232_5 = 2 + 3 \times 5 + 2 \times 5^2 + 4 \times 5^3 = 567_{10}$$

Then

567 ÷ 3 = 189 with remainder 0
189 ÷ 3 = 63 with remainder 0
63 ÷ 3 = 21 with remainder 0
21 ÷ 3 = 7 with remainder 0
7 ÷ 3 = 2 with remainder 1
2 ÷ 3 = 0 with remainder 2

Thus, listing these remainders in order from bottom to top, we get

$$4232_5 = 567_{10} = 210000_3$$
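The divide-and-record-remainders procedure can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function names `to_decimal` and `from_decimal` are made up for this example, and numbers are passed as lists of digit values rather than as strings. Note that `to_decimal` accumulates the positional sum by repeatedly multiplying by the base and adding the next digit (Horner's rule), which gives the same value as summing digit × base^position term by term.

```python
def to_decimal(digits, base):
    """Value of a whole number given as digit values, most significant first."""
    value = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"{d} is not a valid digit in base {base}")
        value = value * base + d   # Horner's rule form of the positional sum
    return value

def from_decimal(n, base):
    """Digits of the non-negative integer n in the given base (repeated division)."""
    if n == 0:
        return [0]
    remainders = []
    while n > 0:
        n, r = divmod(n, base)     # result and remainder of one division step
        remainders.append(r)
    return remainders[::-1]        # read the remainders from bottom to top

print(from_decimal(2233, 5))                         # [3, 2, 4, 1, 3]
print(from_decimal(435782, 7))                       # [3, 4, 6, 3, 3, 3, 4]
print(from_decimal(to_decimal([4, 2, 3, 2], 5), 3))  # [2, 1, 0, 0, 0, 0]
```

The last line mirrors the two-stage strategy of Example 3: convert to an ordinary integer first, then divide repeatedly by the target base.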

Example 4: Convert $7645_5$ to its equivalent in base 9.

This seems like more of the same. If you just plow ahead and do the arithmetic, you may overlook the fact that there is something very seriously wrong with the purported number $7645_5$. If this is a base 5 number, then its digits must be selected from the set of just five basic symbols, conventionally {0, 1, 2, 3, 4}. Hence a base 5 number cannot contain the symbols 7, 6, or even 5. The conversion cannot be done, because $7645_5$ is not a valid base 5 number, and so cannot be converted into a valid number in any other base.

ii) conversion of the fractional part

Conversion of the fractional part involves successive multiplication by the new base, retaining the whole number parts of the results. For instance, to convert $0.7776_{10}$ to its equivalent in base 5, we would do the following.

0.7776 × 5 = 3.888    keep the 3, use the 0.888 in the next stage
0.888 × 5 = 4.44      keep the 4, use the 0.44 in the next stage
0.44 × 5 = 2.2        keep the 2, use the 0.2 in the next stage
0.2 × 5 = 1.0         keep the 1, there is nothing left to use in the next stage

The process is complete when the successive multiplication by the new base results in a whole number, since there is no fractional part of the result to take to the next stage. Then, the digits of the number in the new base are just the retained digits (listed after the words "keep the" above) in the order in which they appeared. Thus,

$$0.7776_{10} = 0.3421_5$$

How can we tell that this is correct? The easiest confirmation is by applying the properties of a positional number system:

$$0.3421_5 = 3 \times 5^{-1} + 4 \times 5^{-2} + 2 \times 5^{-3} + 1 \times 5^{-4} = 0.6 + 0.16 + 0.016 + 0.0016 = 0.7776_{10}$$

Example 5: Convert $0.136_{10}$ to base 7.

We just follow the method explained above.
0.136 × 7 = 0.952    keep the 0, use the 0.952 in the next stage
0.952 × 7 = 6.664    keep the 6, use the 0.664 in the next stage
0.664 × 7 = 4.648    keep the 4, use the 0.648 in the next stage
0.648 × 7 = 4.536    keep the 4, use the 0.536 in the next stage
0.536 × 7 = 3.752    keep the 3, use the 0.752 in the next stage
0.752 × 7 = 5.264    keep the 5, use the 0.264 in the next stage
0.264 × 7 = 1.848    keep the 1, use the 0.848 in the next stage
0.848 × 7 = 5.936    keep the 5, use the 0.936 in the next stage

The fractional part at each step does not seem to be getting any closer to zero over the long run, and this process is showing no signs of terminating. In fact, it cannot terminate: 0.136 = 17/125, and a fraction has a terminating representation in base 7 only when its denominator, in lowest terms, is a power of 7. This means that the decimal fraction $0.136_{10}$ does not have a terminating representation as a fraction in base 7. The best we can do in this case is quote a certain number of digits as an approximation to the equivalent result:

$$0.136_{10} \approx 0.06443515_7$$
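The successive multiplication method can be sketched the same way. In the Python illustration below, the function name `fraction_digits` and the `max_digits` cut-off are assumptions introduced for this example. The input is held as an exact `Fraction` so that the "fractional part left over" is never disturbed by floating point rounding.

```python
from fractions import Fraction

def fraction_digits(x, base, max_digits):
    """Digits of 0 <= x < 1 in the given base by successive multiplication.

    Returns (digits, terminated). Stops when the fractional part reaches
    zero or once max_digits digits have been produced.
    """
    x = Fraction(x)            # exact value, e.g. Fraction("0.136") == 17/125
    digits = []
    while x != 0 and len(digits) < max_digits:
        x *= base
        whole = int(x)         # "keep the" whole number part
        digits.append(whole)
        x -= whole             # use the fractional part in the next stage
    return digits, x == 0

print(fraction_digits("0.7776", 5, 10))  # ([3, 4, 2, 1], True)
print(fraction_digits("0.136", 7, 8))    # ([0, 6, 4, 4, 3, 5, 1, 5], False)
```

The second result reports `False` for termination: after eight digits there is still a non-zero fractional part left over, so the digits quoted are only an approximation.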

This example demonstrates that fractions which have a finite length representation in one base may not have a finite length representation in another base. In fact, the same thing would happen if you tried to convert the number $0.1_7$ into its base 10 equivalent:

$$0.1_7 = 1 \times 7^{-1} = 0.142857142857142857142857\ldots$$

endlessly repeating the six digit sequence 142857.

So, when converting numbers from one base to another, a finite length whole number part always converts into a finite length whole number part, but the fractional parts may be finitely long in one system but not in another. This turns out to be the source of difficulties when floating point numbers are manipulated by computers (which work with numbers coded in a base 2 system, rather than a base 10 system).

iii) conversion of a decimal number which has both whole number and fractional parts

In this case, as mentioned earlier, we do the whole number part using the modular arithmetic method, and the fractional part using the successive multiplication method, and put the two results together.

Example 6: Convert $61473.524_{10}$ into its equivalent in base 8. Round the fractional part to six digits if necessary.

We need to do the whole number part and the fractional part separately. For the whole number part, use the modular arithmetic method:

61473 ÷ 8 = 7684 with remainder 1
7684 ÷ 8 = 960 with remainder 4
960 ÷ 8 = 120 with remainder 0
120 ÷ 8 = 15 with remainder 0
15 ÷ 8 = 1 with remainder 7
1 ÷ 8 = 0 with remainder 1

Therefore

$$61473_{10} = 170041_8$$

For the fractional part, we use the successive multiplication method, multiplying by 8 each time. To round the fractional part to six digits if necessary, we must calculate until the process stops or until we actually have seven digits to the right of the radix point.
0.524 × 8 = 4.192    keep the 4, use the 0.192 in the next stage
0.192 × 8 = 1.536    keep the 1, use the 0.536 in the next stage
0.536 × 8 = 4.288    keep the 4, use the 0.288 in the next stage
0.288 × 8 = 2.304    keep the 2, use the 0.304 in the next stage
0.304 × 8 = 2.432    keep the 2, use the 0.432 in the next stage
0.432 × 8 = 3.456    keep the 3, use the 0.456 in the next stage
0.456 × 8 = 3.648    keep the 3, use the 0.648 in the next stage

This is now seven steps with no sign of termination. So far, we have

$$0.524_{10} \approx 0.4142233_8$$

Since the seventh digit is a 3, which is less than half of 8, we just drop the seventh digit and leave the sixth digit unchanged in order to round off to six places to the right of the radix point. Thus, our final answer here is

$$61473.524_{10} \approx 170041.414223_8$$

As a check, we note that:

$$170041_8 = 1 + 4 \times 8 + 7 \times 8^4 + 1 \times 8^5 = 1 + 32 + 28672 + 32768 = 61473_{10}$$

verifying our result for the whole number part. For the fractional part, we get

$$0.414223_8 = 4 \times \frac{1}{8} + 1 \times \frac{1}{8^2} + 4 \times \frac{1}{8^3} + 2 \times \frac{1}{8^4} + 2 \times \frac{1}{8^5} + 3 \times \frac{1}{8^6} = 0.523998260498046875_{10}$$

You need to use a computer to get this many decimal places. Although this number is obviously not equal to $0.524_{10}$, you do get $0.524_{10}$ if you round it off to three decimal places, and so our result for the fractional part of the number is verified. The difference between this number and the desired value, $0.524_{10}$, is the result of rounding the fractional part of the octal representation to six digits.

Names of Number Systems

Positional number systems are often given names that reflect the value of their base. The most commonly used names are:

base = 2     binary number system
base = 3     ternary number system
base = 4     quaternary number system
base = 5     quinary or quintal number system
base = 8     octal number system
base = 10    decimal number system
base = 16    hexadecimal number system

The list above reflects common usage. (These names do not follow a precise grammatical pattern, and you can find internet sites promoting the use of names such as "octonary" instead of "octal" or "banal" instead of "binary".) The most important number systems involved in computer systems technology are the binary, octal, decimal and hexadecimal systems. Most of the remainder of this document will deal with just these four systems.

Positional Number Systems With Bases Larger than 10

To write numbers in a positional system with base n, we need n symbols to use as digits. When n is 10 or less, it is conventional to simply use the first n symbols from the set that we use for decimal numbers. Thus, base 3 numbers use the symbol set {0, 1, 2}, base 7 numbers use the symbol set {0, 1, 2, 3, 4, 5, 6}, and so on. For bases n which are bigger than 10, obviously we need additional symbols.
The convention is to adopt alphabetic characters, in alphabetic order. Thus, for example, to write numbers in hexadecimal form, we need 16 symbols. The set of symbols conventionally used is

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F}

Thus, the symbol following 9 is A, the one following A is B, and so on. The ten usual numerical symbols along with the first six letters of the alphabet give us the sixteen symbols we need for expressing numbers in a base 16 number system. This means that

$\mathrm{A}_{16} = 10_{10}$
$\mathrm{B}_{16} = 11_{10}$
$\mathrm{C}_{16} = 12_{10}$
$\mathrm{D}_{16} = 13_{10}$
$\mathrm{E}_{16} = 14_{10}$
$\mathrm{F}_{16} = 15_{10}$

If you needed to construct numbers with even larger bases, you would just continue to use further alphabetic characters: G, H, I, J, K, etc.

Binary and Related Number Systems

Having established some basic properties of positional number systems and methods for converting the representation of a number in one system into the equivalent representation in another number system, we can now turn our attention to the number systems which are of greatest importance in computer technologies.

Since digital electronic devices are capable of existing in one of two states, the most fundamental representations of data by computers make use of the two symbols {0, 1}: a binary or base 2 system. Ultimately, all data representations stored electronically will have to be binary. However, binary representations are not particularly convenient for human beings, nor are humans particularly effective in working with binary numbers. This has led to the use of hexadecimal numbers as a convenient intermediate between the binary form used for digital data storage and the decimal form we use in everyday communication. Historically, computer technologies have also used the octal or base 8 system, and so we will look at that one briefly as well.

Binary Numbers

In a positional number system with base 2, the two symbols used for digits are conventionally 0 and 1. The positional values of the digits are powers of 2.
The first few positions on either side of the radix point have the values:

128   64   32   16   8   4   2   1   .   0.5   0.25   0.125

Although this pattern is what we would call a base 2 positional number system, using the characteristics of a positional number system as they have been described in previous pages of this document, the application of binary numbers in digital computer design will require some modifications to this basic method. In what follows, we will refer to the above pattern as ordinary unsigned binary. The "unsigned" part just means we are not making any allowance for the possibility of negative numbers at this stage. The word "ordinary" distinguishes this type of binary number from other patterns that have been developed to allow for arithmetic operations involving negative numbers. These other ways of rewriting decimal values in binary form will have their own special names.

Conversions from binary to decimal are particularly easy in the sense that a digit can only be 0 (in which case there is no contribution to the value from that position in the number) or 1 (in which case, the contribution is just the position value). Thus

$$11011001.1011_2 = 128 + 64 + 16 + 8 + 1 + 0.5 + 0.125 + 0.0625 = 217.6875_{10}$$

Of course, you can also use the modular arithmetic and/or successive multiplication method for conversion of decimal numbers to their binary equivalent.

Example 7: Convert $165.8125_{10}$ to ordinary unsigned binary form.

We proceed here much like we did in the previous Example 6, except that now the divisor and multiplier we use is 2 instead of 8. So, using modular arithmetic to convert the whole number part, we get

165 ÷ 2 = 82 with remainder 1
82 ÷ 2 = 41 with remainder 0
41 ÷ 2 = 20 with remainder 1
20 ÷ 2 = 10 with remainder 0
10 ÷ 2 = 5 with remainder 0
5 ÷ 2 = 2 with remainder 1
2 ÷ 2 = 1 with remainder 0
1 ÷ 2 = 0 with remainder 1

So, we conclude that

$$165_{10} = 1010\,0101_2$$

Convert the fractional part by successive multiplications by 2:

0.8125 × 2 = 1.625    keep the 1, use the 0.625 in the next stage
0.625 × 2 = 1.25      keep the 1, use the 0.25 in the next stage
0.25 × 2 = 0.5        keep the 0, use the 0.5 in the next stage
0.5 × 2 = 1.0         keep the 1

The residual fractional part is now zero, and so we're done with the conversion. The computations mean that $0.8125_{10} = 0.1101_2$, listing the kept digits in the order in which they arose. Thus, the final answer in this case is:

$$165.8125_{10} = 1010\,0101.1101_2$$

For decimal numbers with more than two or three digits, there is a more efficient approach, with far less chance of error in hand calculations, that makes use of hexadecimal numbers. Details are given ahead.
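Hand conversions to and from binary are easy to cross-check in Python, whose built-in `format` and `int` functions handle the whole number part directly. The sketch below is only an illustration (the variable names are arbitrary); the fractional part is done with the successive multiplication loop, which is exact here because 0.8125 is a sum of powers of 1/2 and so is stored without rounding as a floating point value.

```python
# Whole number part: decimal -> binary and back again.
whole_bits = format(165, "b")
print(whole_bits)              # 10100101
print(int(whole_bits, 2))      # 165

# Fractional part: successive multiplication by 2.
frac_bits = []
x = 0.8125
while x != 0:
    x *= 2
    bit = int(x)               # keep the whole number part (0 or 1)
    frac_bits.append(bit)
    x -= bit                   # carry the fractional part forward
print(frac_bits)               # [1, 1, 0, 1]

# Binary -> decimal for 11011001.1011: read all twelve bits as an integer,
# then shift the radix point four places by dividing by 2**4.
print(int("110110011011", 2) / 2**4)   # 217.6875
```

For a decimal fraction that is not a sum of powers of 1/2, such as 0.1, the loop above would run on the stored binary approximation rather than on the decimal value itself, so exact fractions would be the safer tool in that case.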

Hexadecimal Numbers

Hexadecimal numbers use a positional number system with a base of 16. The sixteen basic symbols forming the digits of hexadecimal numbers are:

hexadecimal digit:  0  1  2  3  4  5  6  7  8  9   A   B   C   D   E   F
decimal value:      0  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15

Although people have been known to use the lower case characters {a, b, c, d, e, f} for the last six symbols in this set, it is strongly recommended that you always write these in upper case form as shown in the list above.

Conversion from hexadecimal to decimal form is done using the positional values of the digits of the number system. In this case, each position has a value that differs by a factor of 16 from that of the adjacent positions.

Example 8: Convert $\mathrm{E57AB.C9}_{16}$ to its equivalent decimal number.

The problem here is solved by simply applying the positional values of each hexadecimal digit. We'll do this in two steps: writing out the positional value expression as a first step, and then translating the hexadecimal digits as necessary in the second step:

$$\mathrm{E57AB.C9}_{16} = \mathrm{B} + \mathrm{A} \times 16 + 7 \times 16^2 + 5 \times 16^3 + \mathrm{E} \times 16^4 + \mathrm{C} \times \frac{1}{16} + 9 \times \frac{1}{16^2}$$

$$= 11 + 10 \times 16 + 7 \times 16^2 + 5 \times 16^3 + 14 \times 16^4 + 12 \times \frac{1}{16} + 9 \times \frac{1}{16^2} = 939947.78515625_{10}$$

Although you can't tell for sure using a calculator that displays just eight digits to the right of the decimal point, the conversion above is exact to eight decimal places. There has been no rounding. (In fact, you can demonstrate that a binary number with n digits to the right of the radix point will convert exactly to a decimal number with no more than n digits to the right of the decimal point. As you'll see below, this means that a hexadecimal number with n digits to the right of the radix point will convert exactly to a decimal number with no more than 4n digits to the right of the decimal point.
The unavoidable need for truncation in conversion of fractional parts between decimal, binary, octal, and hexadecimal occurs in conversion of decimal numbers to one of these other three forms, and not when the conversion is the other way around.) Conversion of decimal numbers to hexadecimal form is done using the modular arithmetic/successive multiplication method as usual. Example 9: Convert 32573.710937510 to its equivalent hexadecimal number. For the whole number part we have 32573 16 = 2035 with remainder 13 = D16 2035 16 = 127 with remainder 3 127 16 = 7 with remainder 15 = F16 David W. Sabo (2014) Number Systems Page 11 of 50

        7 ÷ 16 =    0 with remainder 7

So, 32573_10 = 7F3D_16. For the fractional part, we have

    0.7109375 x 16 = 11.375    keep the 11 = B_16, use the 0.375 in the next stage
    0.375     x 16 = 6.00      keep the 6

Thus 0.7109375_10 = 0.B6_16, and so, finally,

    32573.7109375_10 = 7F3D.B6_16

is the required answer.

Octal Numbers

Octal numbers use a positional number system with a base of 8. The eight basic symbols forming the digits of octal numbers are {0, 1, 2, 3, 4, 5, 6, 7}, each having the same numerical value as the identical decimal digit. Octal numbers are no longer used as extensively in computer applications as they once were, but you will encounter them occasionally in certain areas of application.

To convert from octal to decimal forms of a number, just use the positional values of each digit in the usual way. To convert from decimal to octal, use the modular arithmetic and successive multiplication methods in the usual way.

Example 10: Convert 23734.462_8 into its decimal equivalent.

    23734.462_8 = 4 + 3 x 8 + 7 x 8^2 + 3 x 8^3 + 2 x 8^4 + 4 x (1/8) + 6 x (1/8^2) + 2 x (1/8^3)
                = 10204.59765625_10

Example 11: Convert 27531.419921875_10 into its octal equivalent.

First, the whole number part:

    27531 ÷ 8 = 3441 with remainder 3
     3441 ÷ 8 =  430 with remainder 1
      430 ÷ 8 =   53 with remainder 6
       53 ÷ 8 =    6 with remainder 5
        6 ÷ 8 =    0 with remainder 6

So, 27531_10 = 65613_8.

For the fractional part,

    0.419921875 x 8 = 3.359375    keep the 3, use the 0.359375 in the next stage
    0.359375    x 8 = 2.875       keep the 2, use the 0.875 in the next stage
    0.875       x 8 = 7.00        keep the 7

So, 0.419921875_10 = 0.327_8, and the final answer is 27531.419921875_10 = 65613.327_8.

Useful Relationships Between Binary, Octal, and Hexadecimal Numbers

A binary digit can exist in one of two forms: 0 or 1. This means that two binary digits can exhibit four different patterns or states: either of two possible values in the second digit can be paired up with either of two possible values in the first digit, so that 2 x 2 = 4 different patterns of digits are possible. Extending this argument, three binary digits are capable of exhibiting a total of 8 different patterns: each of the two possible values of the third digit can be paired up with any of the four possible patterns that the first two digits can have. In fact, if you think about it, the number of possible patterns of three zeros and ones is exactly equal to the number of rows in a truth table for an expression involving three logical variables. If we tabulate these 8 patterns, and then interpret them as binary numbers, we get the following decimal equivalents:

    3-digit binary number    decimal equivalent
            000                      0
            001                      1
            010                      2
            011                      3
            100                      4
            101                      5
            110                      6
            111                      7

With a three-digit binary number, we can represent the range of values that a single digit of an octal number can display. This means that each digit in an octal number is equivalent to three digits in the corresponding binary form of that number, and each triplet of digits in a binary number (counting from the radix point, either left or right: this is important!) is equivalent to one digit in the corresponding octal form of that number. This gives us a quick way to convert from octal to binary forms of numbers (or the reverse).
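The counting argument and the digit-for-triplet substitution described above can be checked with a short Python sketch. The function names bit_patterns and octal_to_binary are hypothetical, chosen only for this illustration.

```python
from itertools import product

def bit_patterns(n):
    """All patterns of n binary digits, in counting order (there are 2**n of them)."""
    return ["".join(bits) for bits in product("01", repeat=n)]

def octal_to_binary(octal_text):
    """Replace each octal digit by its 3-digit binary pattern; the radix point stays put."""
    triplets = bit_patterns(3)        # triplets[k] is the pattern for octal digit k
    return "".join(ch if ch == "." else triplets[int(ch, 8)] for ch in octal_text)

for pattern in bit_patterns(3):
    print(pattern, int(pattern, 2))   # reproduces the table of 3-digit patterns

print(octal_to_binary("65613.327"))   # the octal result of Example 11, digit by digit
```

Because int(ch, 8) rejects the characters 8 and 9, the sketch refuses input that is not a valid octal number rather than silently producing nonsense.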
Before exploiting this property of octal and binary numbers, we take a look at an even more useful similar connection between binary and hexadecimal numbers. There are sixteen distinct symbols in the hexadecimal number system, which is exactly the number of distinct patterns of zeros and ones in a group of four binary digits. (Again, think of the number of rows in a truth table for an expression involving four logical variables.) It is easy to confirm the following equivalences:

    row    4-digit binary number    hexadecimal equivalent
      0            0000                      0
      1            0001                      1
      2            0010                      2
      3            0011                      3
      4            0100                      4
      5            0101                      5
      6            0110                      6
      7            0111                      7
      8            1000                      8
      9            1001                      9
     10            1010                      A
     11            1011                      B
     12            1100                      C
     13            1101                      D
     14            1110                      E
     15            1111                      F

The numbers along the left are just decimal numbers labeling the rows (as we did earlier in the course in labeling rows of 4-variable truth tables). This table demonstrates that each digit in the hexadecimal representation of a number corresponds to four digits in the binary representation of that number, and each group of four digits in the binary representation (counting fours starting at the radix point, either right or left) corresponds to one digit in the hexadecimal representation of that number. This provides a fast way to convert between binary and hexadecimal representations of a number.

Example 12: Determine the octal and hexadecimal representation of the binary number

    101100 01101010 10010100.11010011 010110_2

Before this section, the strategy we would have had to use to accomplish these tasks would be to first convert the binary number to decimal form (a fairly difficult task because of the number of digits in this binary number), and then use the modular arithmetic and successive multiplication methods to convert that decimal value to octal and to hexadecimal. In view of the special relationships just described between binary, octal and hexadecimal representations, this task can be done by grouping the binary digits appropriately and translating the groups directly into either octal or hexadecimal digits.

To convert to the octal representation, we need to recopy the binary number very carefully and insert separators between each group of 3 digits, counting left and right from the radix point.
Then below each group of three, we jot the octal digit equivalent:

    1 / 011 / 000 / 110 / 101 / 010 / 010 / 100 . 110 / 100 / 110 / 101 / 10
    1    3     0     6     5     2     2     4     6     4     6     5     4

You can get these octal digits from the table of 3-digit binary numbers given earlier, or just use the fact that in the three-digit groupings, the place values are 1, 2, and 4 reading from right to left (e.g. 110_2 = 2 + 4 = 6_10 = 6_8). Notice that the values of the partial groups at the extreme right and extreme left are worked out as if zeros are added to the outside of them to complete the triplet. Thus, the single 1 on the left is evaluated as if it were the triplet 001, and the pair 10 on the extreme right is evaluated as if it were the triplet 100. You know that adding zeros to the left of the whole number part or to the right of the fractional part of a decimal number does not change its value, and the same is true for numbers in any base.
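The grouping-and-padding procedure just described can be sketched in Python as follows. The function name regroup and its parameter bits_per_digit are our own illustrative choices; the sketch assumes the input is a valid unsigned binary string, possibly containing spaces and one radix point.

```python
def regroup(binary_text, bits_per_digit):
    """Convert a binary string to base 2**bits_per_digit by grouping from the radix point."""
    digits = "0123456789ABCDEF"
    k = bits_per_digit
    whole, _, frac = binary_text.replace(" ", "").partition(".")
    # Complete the partial groups by adding zeros on the OUTSIDE:
    # to the left of the whole number part, to the right of the fractional part.
    whole = whole.zfill(-(-len(whole) // k) * k)
    frac = frac.ljust(-(-len(frac) // k) * k, "0")

    def translate(bits):
        # Translate each group of k binary digits into one digit of the new base.
        return "".join(digits[int(bits[i:i + k], 2)] for i in range(0, len(bits), k))

    result = translate(whole) or "0"
    return result + ("." + translate(frac) if frac else "")

example_12 = "101100 01101010 10010100.11010011 010110"
print(regroup(example_12, 3))   # octal digits, one per group of three
```

Calling the same function with bits_per_digit = 4 performs the corresponding grouping into fours for the hexadecimal form.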

So, we can now write:

    101100 01101010 10010100.11010011 010110_2 = 13065224.64654_8

From the number of digits involved in both numbers, you can easily see that this method is superior to first converting the binary number to decimal, and then the decimal number to octal.

A similar procedure is used to obtain the hexadecimal equivalent, only now the original binary number is chopped up into groups of four digits counting from the radix point:

    10 / 1100 / 0110 / 1010 / 1001 / 0100 . 1101 / 0011 / 0101 / 10
     2    12      6     10      9      4      13      3      5     8
     2     C      6      A      9      4       D      3      5     8

In the first row of translations, we've written down the decimal equivalent of each group of four digits. Then in the second row, we've written down the corresponding hexadecimal digits. In this case, the partial groups of four digits on either end are completed by adding zeros to the outside as required. Thus the group 10 on the left end is expanded to 0010, equivalent to 2_16, and the group 10 on the right is expanded to 1000, equivalent to 8_16. We always add the extra zeros to the outside of the group. So, we can state

    101100 01101010 10010100.11010011 010110_2 = 2C6A94.D358_16

There are two important implications of this special relationship that exists between binary, octal and hexadecimal numbers.

(i) Digital computers work with binary numbers only, but human beings are very poorly equipped to work reliably with binary numbers because we have trouble parsing long lists of only two different symbols. On the other hand, humans can deal with shorter lists of a larger number of possible symbols very well. (After all, even the dullest of us can handle words of one syllable: strings of three or four symbols selected from a set of 26 different symbols.) So, in practice, because there is this simple equivalence between groups of binary digits and either octal or hexadecimal digits, it is common for low-level computer data to be displayed to humans in hexadecimal (and sometimes octal) form.
You will see this when we talk about the ASCII code for storing text in computer memory.

(ii) We can also use this relationship between binary and hexadecimal numbers to streamline conversions between decimal and binary numbers. Particularly when the decimal number involved has more than two or three digits, or the binary number has more than eight or ten digits, we strongly recommend that conversions between decimal and binary be done using a hexadecimal intermediate. If you do these conversions directly, the number of steps in the calculation will be so great that it is almost impossible to avoid making an error.

Example 13: Convert 85081.171875_10 to ordinary unsigned binary form.

As usual, the whole number and fractional parts have to be converted using separate methods. If we do the modular arithmetic for the whole number part and the successive multiplication for the fractional part using the base 2, the steps look like:

    85081 ÷ 2 = 42540 with remainder 1        0.171875 x 2 = 0.34375
    42540 ÷ 2 = 21270 with remainder 0        0.34375  x 2 = 0.6875

    21270 ÷ 2 = 10635 with remainder 0        0.6875   x 2 = 1.375
    10635 ÷ 2 =  5317 with remainder 1        0.375    x 2 = 0.75
     5317 ÷ 2 =  2658 with remainder 1        0.75     x 2 = 1.5
     2658 ÷ 2 =  1329 with remainder 0        0.5      x 2 = 1.0
     1329 ÷ 2 =   664 with remainder 1
      664 ÷ 2 =   332 with remainder 0
      332 ÷ 2 =   166 with remainder 0
      166 ÷ 2 =    83 with remainder 0
       83 ÷ 2 =    41 with remainder 1
       41 ÷ 2 =    20 with remainder 1
       20 ÷ 2 =    10 with remainder 0
       10 ÷ 2 =     5 with remainder 0
        5 ÷ 2 =     2 with remainder 1
        2 ÷ 2 =     1 with remainder 0
        1 ÷ 2 =     0 with remainder 1

So, writing the remainders from the modular arithmetic calculation in reverse order, and the whole number parts from the successive multiplication calculation in the order in which they appeared, we get that

    85081.171875_10 = 1 0100 1100 0101 1001.0010 11_2

If we had used base 16 as an intermediate in this conversion, the calculations would have gone as follows:

    85081 ÷ 16 = 5317 with remainder 9        0.171875 x 16 = 2.75
     5317 ÷ 16 =  332 with remainder 5        0.75     x 16 = 12.0
      332 ÷ 16 =   20 with remainder 12
       20 ÷ 16 =    1 with remainder 4
        1 ÷ 16 =    0 with remainder 1

This means that

    85081.171875_10 = 14C59.2C_16 = 0001 0100 1100 0101 1001.0010 1100_2

using the facts that 1_16 = 0001_2, 4_16 = 0100_2, C_16 = 1100_2, 5_16 = 0101_2, 9_16 = 1001_2, and 2_16 = 0010_2.

We could even have done this conversion through an octal intermediate:

    85081 ÷ 8 = 10635 with remainder 1        0.171875 x 8 = 1.375
    10635 ÷ 8 =  1329 with remainder 3        0.375    x 8 = 3.0
     1329 ÷ 8 =   166 with remainder 1
      166 ÷ 8 =    20 with remainder 6
       20 ÷ 8 =     2 with remainder 4
        2 ÷ 8 =     0 with remainder 2

Then, using the fact that 1_8 = 001_2, 2_8 = 010_2, 3_8 = 011_2, 4_8 = 100_2, and 6_8 = 110_2,

we get

    85081.171875_10 = 246131.13_8 = 010 100 110 001 011 001.001 011_2

which is identical (except for leading and trailing zeros) to the previous two results. The point of this example is that the large amount of easily messed up detail present in the eventual binary number is packaged into more humanly manageable chunks by using either a hexadecimal or octal intermediate in the conversion from decimal to binary forms.

Addition of Unsigned Binary Numbers

In order to understand some of the issues we will address shortly in looking at methods for representing numbers in digital computers, you will need to be able to carry out the addition of two unsigned binary numbers by hand. The process is a simple adaptation of the method we use routinely for adding decimal numbers. We would display the work involved in adding 65274 to 81782 something like:

      11       carries
     65274
   + 81782
    ------
    147056

We work from right to left here:

- first, add 4 to 2, and write the result 6 in the rightmost position.
- then, add 7 to 8. The result is 15, which is too big to write in one space. The 5 is written in the second column, and the 1 is carried over to the third column, written just above the 2 there.
- now, add the digits in the third column from the right: 1 + 2 + 7 = 10. The digit 0 is written down and the 1 is carried to the next column.
- now add the digits in the fourth column from the right: 1 + 5 + 1 = 7. This single digit is written down; there is nothing to carry to the next column.
- finally, add the digits in the last column on the left: 6 + 8 = 14. Both digits are written down since there are no more columns to carry into.

The end result is the sum 147056. Whenever the sum of digits in a column exceeds 9, the excess (which is just the left-most digit of that column sum) must be carried to the next column leftwards. The same pattern works with unsigned binary numbers.
In this case, there are really only five distinct situations we need to take into account:

                          1    (carry from the previous column)
      0     0     1     1     1
    + 0   + 1   + 0   + 1   + 1
    ---   ---   ---   ---   ---
      0     1     1    10    11

Whenever the result of the addition is two digits, the leftmost digit is carried to the next column leftwards (and should be written down explicitly as carries in this course).

Example 14: Add the following two unsigned binary numbers, showing all carries.

    1011 0110 0011 + 1100 1011 0111

Confirm your end result by converting all numbers to decimal form and performing the addition.

We'll display the addition in the traditional form as a whole, and explain each column in notes to follow:

      1111 11   111      carries
      1011 0110 0011
    + 1100 1011 0111
    ----------------
    1 1000 0001 1010     result
    3 2109 8765 4321     column number

The bottom row of digits numbers the columns from right to left (only the last digit of each column number is shown). The work done here is as follows:

- column 1 has 1 + 1 = 10_2. Write down the 0 and carry the 1 to the next column.
- column 2 now has 1 + 1 + 1 = 11_2. Write down the 1 on the right, and carry the 1 on the left to the next column.
- column 3 now has 1 + 0 + 1 = 10_2. Write down the 0 and carry the 1 to the next column.
- column 4 now has 1 + 0 + 0 = 1_2. Write down the 1. There is nothing to carry to the next column.
- column 5 has 0 + 1 = 1_2. Write down the 1. There is nothing to carry to the next column.
- column 6 has 1 + 1 = 10_2. Write down the 0 and carry the 1 to the next column.
- column 7 now has 1 + 1 + 0 = 10_2. Write down the 0 and carry the 1 to the next column.
- column 8 now has 1 + 0 + 1 = 10_2. Write down the 0 and carry the 1 to the next column.
- column 9 now has 1 + 1 + 0 = 10_2. Write down the 0 and carry the 1 to the next column.
- column 10 now has 1 + 1 + 0 = 10_2. Write down the 0 and carry the 1 to the next column.
- column 11 now has 1 + 0 + 1 = 10_2. Write down the 0 and carry the 1 to the next column.
- column 12 now has 1 + 1 + 1 = 11_2. Since this is the leftmost position of the numbers being added, write down both digits, completing the result.

We need to confirm the result using decimal arithmetic.
Converting the two original numbers to decimal via a hexadecimal intermediate gives:

    1011 0110 0011_2 = B63_16 = 11 x 16^2 + 6 x 16 + 3 = 2915_10
    1100 1011 0111_2 = CB7_16 = 12 x 16^2 + 11 x 16 + 7 = 3255_10

and for the result of the binary addition

    1 1000 0001 1010_2 = 181A_16 = 1 x 16^3 + 8 x 16^2 + 1 x 16 + 10 = 6170_10

But, using a calculator, 2915_10 + 3255_10 = 6170_10, and so the binary result is confirmed.
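The column-by-column procedure used in Example 14 (and in the decimal warm-up before it) can be sketched in Python. The function name add_with_carries and the way the carry row is returned are assumptions made for this illustration only; the sketch works in any base from 2 to 16.

```python
def add_with_carries(a, b, base=2):
    """Add two unsigned digit strings column by column, working right to left.

    Returns (result, carry_row); carry_row has one character per column of the
    operands and shows the carry written above that column (0 if none).
    """
    a, b = a.replace(" ", ""), b.replace(" ", "")
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)        # line the numbers up on the right
    digits, carries, carry = [], [], 0
    for x, y in zip(reversed(a), reversed(b)):   # rightmost column first
        carries.append(carry)                    # the carry sitting above this column
        total = int(x, base) + int(y, base) + carry
        carry, digit = divmod(total, base)       # left digit is carried, right digit written
        digits.append(format(digit, "X"))
    if carry:
        digits.append(format(carry, "X"))        # no more columns: write the carry down too
    result = "".join(reversed(digits))
    carry_row = "".join(str(c) for c in reversed(carries))
    return result, carry_row

total, carry_row = add_with_carries("1011 0110 0011", "1100 1011 0111")
print(carry_row)   # carries above columns 12 down to 1
print(total)       # the 13-digit binary sum
print(int(total, 2), 2915 + 3255)   # decimal cross-check of Example 14
```

The same call with base=10 reproduces the decimal addition of 65274 and 81782, which is a convenient way to convince yourself that nothing about the method is special to base 2.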

Methods of Representing Numbers in Digital Computing

With this theory of number systems behind us, it is now time to address some practical issues in devising binary number representations for use in digital computing. Although it is easy to generalize the conventions that we use for decimal numbers into written ordinary binary forms, these do not lend themselves to straightforward computer implementation. There are two issues we need to take account of:

(i) the distinction between whole numbers (integers) and floating point numbers

You have already seen one bothersome distinction between the results of converting whole numbers into binary form and of converting numbers with fractional parts into binary form: when a fractional part is present, it may not be possible (in fact, usually is not possible) to express the binary equivalent of the fractional part in a finite number of digits. This means that techniques for coding floating point numbers in a binary form will have to allow for truncation, and hence for the usual occurrence of inexact conversions.

A second problem is that while whole numbers that arise in calculations are often the result of enumeration of actual objects, and thus do not usually have extremely large values, it is not uncommon to have floating point values which range over many, many decimal digit positions. For instance, typical time intervals for a microprocessor operation are of the order of nanoseconds, or 0.000000001 seconds. On the other hand, the number of such operations a computer performs in one second is of the order of, say, 3000000000. Since there are approximately 31557600 seconds in one year, a typical PC performs about 94672800000000000 cycles per year. Any scheme for carrying out floating point arithmetic by computer will have to be able to cope with values having such widely different orders of magnitude.
(ii) distinguishing between positive and negative numbers

In ordinary written decimal arithmetic, we distinguish between positive and negative values by placing a minus sign in front of numbers measuring negative values. However, in computer applications, we only have two symbols available, 0 and 1, both of which are used for coding the actual numerical value. If we introduced a minus sign as a distinct symbol, we would suddenly be working with a three-symbol system, and conventional computing equipment cannot handle that. So, we must come up with other ways of distinguishing between positive and negative values.

The way we address both of these issues has implications for what will be required in developing algorithms for carrying out arithmetic operations on the binary numbers that result.

When we work with decimal numbers on paper, whether whole numbers or numbers with fractional parts, the number of digits in the number is not usually an issue we have to think much about. If we encounter a larger number, we just use up more space on the paper. However, when numbers are stored in a binary form in computer memory, the number of binary digits, or bits, available for each number has to be fixed in advance. The computer hardware is not designed to be able to store different numbers with different numbers of binary digits, because the binary digits are being electronically coded into an array of actual physical memory devices.

The principles we will describe below do not require a specific number of bits to be available for each number, but to illustrate them, we will have to assume a specific number of bits. For most of the examples to follow, we will work with 8-bit binary representations for whole numbers. In computer technology, 8 bits of memory is called 1 byte, which is the smallest directly accessible amount of memory on most computers.
At the end we'll look very briefly at how easily the patterns established show up in larger numbers, such as 16-bit or 32-bit numbers. (You could also illustrate these binary representations using, say, 5-bit binary numbers or 11-bit binary numbers, or whatever. We've chosen to use 8-bit binary representations as illustrations here because they are simple enough to prevent hand calculations from becoming really tedious; they are extensive enough to give some variety in the results you see; and there are quite a number of practical situations in which 8-bit binary numbers are used.)
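The first issue raised above, that a decimal fraction may have no finite binary expansion once the number of bits is fixed, can be seen with a small Python sketch. The function name binary_fraction_digits and the 12-digit limit are arbitrary choices made for illustration; the method is just the successive multiplication by 2 used earlier in these notes, carried out with exact fractions.

```python
from fractions import Fraction

def binary_fraction_digits(frac, max_digits):
    """Successive multiplication by 2 on a fraction between 0 and 1.

    Returns (digits, terminated): the binary digits produced, and whether the
    residual fractional part reached zero within max_digits digits.
    """
    digits = []
    while frac != 0 and len(digits) < max_digits:
        frac *= 2
        bit = int(frac)          # keep the whole number part (0 or 1)
        digits.append(str(bit))
        frac -= bit              # carry the fractional part to the next stage
    return "".join(digits), frac == 0

print(binary_fraction_digits(Fraction("0.8125"), 12))  # terminates after four digits
digits, terminated = binary_fraction_digits(Fraction(1, 10), 12)
print(digits, terminated)                              # 0.1 does not terminate
print(Fraction(int(digits, 2), 2 ** 12))               # the truncated value actually stored
```

With only 12 binary digits available, the decimal value 0.1 is stored as 409/4096, slightly less than 0.1; allowing more digits shrinks the error but never removes it.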

We start by describing five commonly-used approaches to coding whole numbers, all but one of which take sign into account.

Computer Representation of Integers

(i) (modulo 2^n) unsigned binary numbers

This format is what we've called ordinary unsigned binary form so far. Numbers are coded in binary form using n bits. When n = 8, the range of values that can be represented is

    0000 0000_2 = 0_10    up to    1111 1111_2 = 255_10 = 2^8 - 1

a total of 2^8 = 256 different values. (Obviously, in general, the range of decimal numbers represented in n-bit unsigned binary form is 0_10 to 2^n - 1.) Conversion from decimal to unsigned binary form is done using the modular arithmetic method, and conversion from unsigned binary to decimal using the positional values of the binary digits, as has been demonstrated earlier in this document.

Example 15: Write the following decimal numbers in 8-bit unsigned binary form: 46, 182, -92, and 465.

For 46_10, we'll use the modular arithmetic method directly, since 46 is quite a small number:

    46 ÷ 2 = 23 with remainder 0
    23 ÷ 2 = 11 with remainder 1
    11 ÷ 2 =  5 with remainder 1
     5 ÷ 2 =  2 with remainder 1
     2 ÷ 2 =  1 with remainder 0
     1 ÷ 2 =  0 with remainder 1

Listing the remainders in reverse order, we get

    46_10 = 101110_2

This result is just 6 digits long, so to get the 8-bit form of 46_10, we need to pad on the left with two zero digits. Thus, 46_10 = 0010 1110_2 in 8-bit unsigned binary form.

Since 182_10 is a somewhat larger value, we'll carry out the conversion to binary form through a hexadecimal intermediate:

    182 ÷ 16 = 11 with remainder 6
     11 ÷ 16 =  0 with remainder 11 = B_16

Thus,

    182_10 = B6_16 = 1011 0110_2

(where we have used the fact that B_16 = 11_10 = 1011_2 and 6_16 = 6_10 = 0110_2).

For the remaining two values, we are out of luck. -92_10 is a negative number, but the unsigned binary format is incapable of storing sign information. Hence, -92_10 cannot be coded in unsigned binary form. Since the largest value expressible in an 8-bit unsigned binary number is 2^8 - 1 = 255, we are unable to code 465_10 in 8-bit unsigned binary form.

(ii) signed magnitude form

A very obvious way to include sign information in the binary representation of a number is to use one of the available digits to distinguish between positive and negative values. In the signed magnitude form, the left-most digit is used for this purpose:

    leftmost digit 0: indicates a positive value
    leftmost digit 1: indicates a negative value

The remaining n - 1 digits are then used to store the numerical value without sign (the magnitude) in unsigned binary form. Thus, for 8-bit signed magnitude numbers, the largest magnitude we can store is the unsigned binary value 111 1111_2, a string of seven 1's, which is equivalent to 127_10. So, in 8-bit signed magnitude form, we can store any integer between -127_10 (= 1111 1111) and +127_10 (= 0111 1111). An oddity of the signed magnitude form is that there are two distinct representations of the number 0_10:

    0000 0000 = +0_10    and    1000 0000 = -0_10

The signed magnitude form is easy to set up, but it turns out to be quite awkward to formulate basic arithmetic operations using the signed magnitude form. The basic idea of using a single bit to distinguish between positive and negative numbers is used in the standard methods of coding floating point numbers.

Example 16: Express the decimal values 87, -63, 149, and -738 in 8-bit signed magnitude form.

The technique is to start by converting the numerical part to binary form, ignoring the sign. Then assemble the numerical part (padded with zeros on the left to give exactly seven digits) with the single bit at the left which indicates the sign.
So, using the methods already illustrated several times, we can determine that 87_10 = 101 0111_2, which is exactly 7 digits. So, since +87_10 is positive, the result we need is

    +87_10:    0 101 0111  =  0101 0111    (8-bit signed magnitude form)

For the second number, we have 63_10 = 11 1111_2, which is just six digits long. We need a seven-digit numerical part, so we must append a zero to the left of these six digits. Finally, since -63_10 is negative, the sign bit at the extreme left of the 8-bit representation is set to 1:

    -63_10:    1 011 1111  =  1011 1111    (8-bit signed magnitude form)

Next, 149_10 = 1001 0101_2. Since it takes eight binary digits to represent the numerical part of 149_10, it is not possible to represent this number in 8-bit signed magnitude form (which allows at most seven digits for the numerical part). It would require at least a 9-bit signed magnitude form to accommodate 149_10.

Similarly, 738_10 = 10 1110 0010_2, a total of ten binary digits. So, again, we conclude that -738_10 cannot be accommodated in an 8-bit binary signed magnitude format.

The conclusions in these last two examples are consistent with the general principle stated just above that decimal values either larger than 127_10 or less than -127_10 cannot be represented as 8-bit binary signed magnitude numbers.

(iii) and (iv) the ones-complement and twos-complement forms

We deal with these two forms together because they are closely related. First, we must define two operations, which take one binary number and turn it into another binary number.

- to do the ones complement operation on a binary number, just change all of the zero digits to ones, and all of the one digits to zeros.
- to do the twos complement operation on a binary number, start by doing the ones complement operation and then add 1.

Example 17: Do the ones and twos complement operation on the number 1010 0101.

To do the ones complement operation, we just change each digit to its opposite:

    1010 0101
    0101 1010    this is the ones complement of 1010 0101.

To get the twos complement of the original number, we just add 1 to its ones complement:

    1010 0101
    0101 1010    this is the ones complement of 1010 0101.
       +    1
    0101 1011    this is the twos complement of 1010 0101.

When we are working with an n-bit unsigned binary number equal to k_10, then the result of the ones complement operation is a binary number equivalent to (2^n - k - 1)_10, and the result of the twos complement operation is a binary number equivalent to (2^n - k)_10.
Thus, for an 8-bit binary number equivalent to k_10, the ones complement would be equivalent to (255 - k)_10 and the twos complement would be equivalent to (256 - k)_10. We will show shortly that this property of the twos complement operation can be exploited as a basis for implementing addition and subtraction of integers very efficiently.
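The two operations, and the arithmetic property just stated, can be checked with a brief Python sketch. The function names are our own, and one detail is an assumption on our part: the sketch keeps the result to the same number of digits as the input, so a carry out of the leftmost column (which only happens when the input is all zeros) is discarded.

```python
def ones_complement(bits):
    """Change every 0 to 1 and every 1 to 0."""
    bits = bits.replace(" ", "")
    return "".join("1" if b == "0" else "0" for b in bits)

def twos_complement(bits):
    """Do the ones complement operation, then add 1 (same number of digits)."""
    bits = bits.replace(" ", "")
    n = len(bits)
    # Assumption: a carry out of the leftmost column is dropped (modulo 2**n).
    value = (int(ones_complement(bits), 2) + 1) % (2 ** n)
    return format(value, f"0{n}b")

print(ones_complement("1010 0101"))   # Example 17, ones complement
print(twos_complement("1010 0101"))   # Example 17, twos complement

# The property above for every 8-bit pattern with k >= 1:
for k in range(1, 256):
    pattern = format(k, "08b")
    assert int(ones_complement(pattern), 2) == 255 - k
    assert int(twos_complement(pattern), 2) == 256 - k
```

The loop starts at k = 1 because for k = 0 the value 256 - k does not fit in eight bits; with the carry discarded, the twos complement of 0000 0000 is 0000 0000 again.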

A second property of these two operations is that they can be undone or reversed simply by repeating the operation. Thus, to undo a ones complement operation, just repeat the ones complement operation a second time. This will give back the original number. Similarly, to undo a twos complement operation, just repeat the twos complement operation and you will get the original binary number back again.

Example 18: Demonstrate this property with the number in Example 17.

    1010 0101  ->  0101 1010  ->  1010 0101

Here each arrow represents doing the ones complement operation (changing 0 to 1 and 1 to 0). You can see by inspection that the binary number on the right is identical to the binary number on the left. Also

    1010 0101  ->  0101 1010       ->  1010 0100
                      +    1              +    1
                   ---------           ---------
                   0101 1011           1010 0101

Here again, each right-pointing arrow, together with the +1 below it, indicates that the twos complement operation has been done. You can see again by direct inspection that the binary number at the lower right, obtained by performing two twos complement operations in a row, is identical to the original binary number with which we started.

Now, having defined the ones complement and twos complement operations, we are in a position to define two new ways for representing numbers in computer memory.

The ones complement form used to represent a decimal number, k, in binary form is defined as follows:

- if k is a positive number, then its ones complement form is just its signed magnitude form
- if k is a negative number, then write down the signed magnitude form of the absolute value of k, and perform the ones complement operation on that binary number.

The twos complement form used to represent a decimal number, k, in binary form is defined in exactly the same way as the ones complement form, except that the twos complement operation is used when k is a negative value.

(Note that an alternative definition of these two forms can be given in terms of decimal values.
To get the 8-bit ones complement form of a decimal number, k:

- if k is a positive number, then its ones complement form is just its signed magnitude form
- if k is a negative number, then its ones complement form is obtained by subtracting the absolute value of k from 255 (in general, from 2^n - 1), and converting the result to its unsigned binary form.

For twos complement forms, the same process can be employed, except that for negative values of k, the absolute value of k should be subtracted from 256 (in general, from 2^n) before converting the result to its unsigned binary form.

These recipes for forming ones complement and twos complement forms of decimal numbers give the same results as the recipes based on the complement operations. However, the decimal value -2^(n-1) does have a valid n-bit twos complement binary representation which can be obtained using this decimal-based definition of the form, but cannot be obtained using the recipe based on the twos complement operation given earlier, because 2^(n-1) does not have a valid n-bit signed magnitude representation.)
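The signed forms defined so far can be sketched in Python using the decimal-based recipes from the note above. The function names and the choice to raise ValueError for values that do not fit are assumptions made for this illustration, not part of these notes.

```python
def signed_magnitude(k, n=8):
    """Sign bit followed by the magnitude in (n-1)-bit unsigned binary."""
    if abs(k) > 2 ** (n - 1) - 1:
        raise ValueError(f"{k} needs more than {n} bits in signed magnitude form")
    return ("1" if k < 0 else "0") + format(abs(k), f"0{n - 1}b")

def ones_complement_form(k, n=8):
    """Negative k is coded as (2**n - 1) - |k| in unsigned binary."""
    if abs(k) > 2 ** (n - 1) - 1:
        raise ValueError(f"{k} needs more than {n} bits in ones complement form")
    value = k if k >= 0 else (2 ** n - 1) - abs(k)
    return format(value, f"0{n}b")

def twos_complement_form(k, n=8):
    """Negative k is coded as 2**n - |k| in unsigned binary."""
    if not -(2 ** (n - 1)) <= k <= 2 ** (n - 1) - 1:
        raise ValueError(f"{k} needs more than {n} bits in twos complement form")
    value = k if k >= 0 else 2 ** n - abs(k)
    return format(value, f"0{n}b")

for k in (87, -63, -92):
    print(k, signed_magnitude(k), ones_complement_form(k), twos_complement_form(k))

print(twos_complement_form(-128))   # valid in twos complement form only
```

Note how the range check for the twos complement form admits -2^(n-1), the one extra value discussed in the note above, while the other two forms reject it.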

From the recipes described above, it is easy to see that a very important property of the ones complement and twos complement forms is that the leftmost digit is always 0 when positive decimal numbers are being represented, and that leftmost digit is always 1 when a negative decimal value is being represented.

Example 19: Determine the 8-bit binary ones complement and twos complement forms of the numbers 87_10 and -92_10. Confirm your binary answers by converting back to decimal.

Since 87_10 is a positive number, its 8-bit binary ones complement form and 8-bit binary twos complement form are both just its 8-bit binary signed magnitude form. Since

    87_10 = 57_16 = 0101 0111_2

the 8-bit signed magnitude form of 87_10 is 0101 0111. Hence the answer for both ones complement and twos complement forms of 87_10 is 0101 0111.

In the case of -92_10, which is a negative value, we begin by writing down the signed magnitude form of +92_10, which you can easily verify is 0101 1100. To get the ones complement form of -92_10, we just do the ones complement operation on this binary number:

    -92_10  --(absolute value)-->  92_10  --(signed magnitude form)-->  0101 1100  --(ones complement operation)-->  1010 0011

To get the twos complement form of -92_10, essentially we just have to add 1 to the ones complement form, so

    -92_10  --(absolute value)-->  92_10  --(signed magnitude form)-->  0101 1100  --(ones complement operation)-->  1010 0011  --(+1)-->  1010 0100

Thus, the 8-bit binary ones complement form of -92_10 is 1010 0011 and the 8-bit binary twos complement form of -92_10 is 1010 0100.

Now, we confirm these four results by converting back to decimal from the binary.

- if we are told that 0101 0111 is the 8-bit binary ones complement form of a number, then we know that the number is positive (because the first digit is a 0), and so this must just be the 8-bit signed magnitude form of the number.
Its decimal value can be computed using the positional values of the rightmost 7 digits:

    0101 0111_2 = 1 + 2 + 4 + 16 + 64 = 87_10

confirming the original result that 0101 0111 is the 8-bit ones complement form of 87_10.

- if we are told that 0101 0111 is the 8-bit binary twos complement form of a number, then, again, we know that the number is positive (because the first digit is a 0), and so this must just be the 8-bit signed magnitude form of the number. Its decimal value can again be computed using the positional values of the rightmost 7 digits:

    0101 0111_2 = 1 + 2 + 4 + 16 + 64 = 87_10

confirming the original result that 0101 0111 is the 8-bit twos complement form of 87_10.

- if we are told that 1010 0011 is the 8-bit binary ones complement form of a number, then we know that the number is negative (since the first digit is a 1). Thus, 1010 0011 must

have been obtained by doing the ones complement operation on the signed magnitude form of the absolute value of the number. So, first, undo the ones complement operation by repeating it again: 1010 0011 0101 1100 Now, this must be the signed magnitude form of the absolute value of the original number, we we can easily compute to give 0101 1100 = 4 + 8 + 16 + 64 = 9210. But this is the absolute value of the original negative number, so the original number must have been -9210, confirming our previous result that the 8-bit binary ones complement form of -9210 is in fact 1010 0011. if we are told that 1010 0100 is the 8-bit binary twos complement form of a number, then we know that the number is negative (since the first digit is a 1). Thus, 1010 0100 must have been obtained by doing the twos complement operation on the signed magnitude form of the absolute value of the number. So, first, undo the twos complement operation by repeating it again: ones complement operation 1 1010 0100 0101 1011 0101 1100 Now, this must be the signed magnitude form of the absolute value of the original number, we we can easily compute to give 0101 1100 = 4 + 8 + 16 + 64 = 9210. But this is the absolute value of the original negative number, so the original number must have been -9210, confirming our previous result that the 8-bit binary twos complement form of -9210 is in fact 1010 0100. This example has illustrated not only how to determine the ones complement and twos complement forms of both positive and negative decimal numbers, but also, how to start with the ones and twos complement forms and work backwards to the original decimal numbers. It is important that you master both of these skills. (v) Excess-2 n and Related Binary Forms The Excess methods use a slightly different approach to adapting unsigned binary numbers to represent both positive and negative values. 
They all involve two steps:

- add a fixed offset to the decimal value
- convert the result to an unsigned binary form

If an n-bit unsigned binary form is to be used in the second step, then the offset used in the first step is typically 2^(n-1) or a nearby value. For example, if an 8-bit unsigned binary form is to be used, the most common offsets are 128 (= 2^(8-1) = 2^7) and the nearby value 127 (which is used in the IEEE-754 standard for coding floating point numbers, something we'll deal with later in this document).

Note that to convert a binary number obtained in this way back to the corresponding decimal number, you would just reverse the two steps given above:

- first convert the binary number to a decimal number using the positional values of each binary digit in the usual way
- then, subtract the offset.

Example 20: Convert the decimal numbers 87₁₀ and -92₁₀ to 8-bit binary excess-128 form. Then, convert the binary numbers back to decimal to confirm your results.

First add 128 to each number, and then convert the results to unsigned binary form:

    decimal value    add 128                unsigned binary form
     87₁₀             87 + 128 = 215₁₀      1101 0111₂
    -92₁₀            -92 + 128 =  36₁₀      0010 0100₂

The conversion of these two binary numbers back to decimal form just involves these two steps in reverse:

    binary        positional values                        subtract offset
    1101 0111₂    1 + 2 + 4 + 16 + 64 + 128 = 215₁₀        215 - 128 =  87₁₀
    0010 0100₂    4 + 32 = 36₁₀                             36 - 128 = -92₁₀

Example 21: Write out the 8-bit binary representation of 114₁₀, -79₁₀, 156₁₀, and -145₁₀ using each of the five binary forms described above.

You can use this example as a set of practice exercises. We will just list the answers.

    Form                114₁₀        -79₁₀        156₁₀        -145₁₀
    unsigned binary     0111 0010    n/a          1001 1100    n/a
    signed magnitude    0111 0010    1100 1111    n/a          n/a
    ones complement     0111 0010    1011 0000    n/a          n/a
    twos complement     0111 0010    1011 0001    n/a          n/a
    excess-128          1111 0010    0011 0001    n/a          n/a

Here n/a means "not available": either you cannot represent a negative value in an unsigned binary form, or the decimal value in question is outside the range of values that can be represented in the given binary format in 8 digits. For example, as we saw earlier, the largest positive number that can be represented in 8-bit signed magnitude form is 0111 1111₂ = +127₁₀, and so it is not possible to obtain a signed magnitude representation of 156₁₀ in just 8 bits.

You can also use the table of results above to practice converting from numbers expressed in each of these binary forms to the corresponding decimal values.
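The recipes for the five forms can be checked mechanically. The following is a minimal Python sketch, not part of the original notes: the function names `encode`, `decode`, and `to_bits` and the form labels are hypothetical choices made for illustration. It follows the recipes as described above (signed magnitude, then the ones complement operation, then add 1, or add the offset for the excess form) and returns `None` where the table shows n/a.

```python
def to_bits(value, n=8):
    """Format a non-negative integer as an n-bit string in groups of four digits."""
    s = format(value, "0{}b".format(n))
    return " ".join(s[i:i + 4] for i in range(0, n, 4))


def encode(x, form, n=8, k=128):
    """Return the n-bit pattern (as an int) that codes decimal x, or None for n/a."""
    top = 1 << (n - 1)      # value of the first (leftmost) digit, 128 when n = 8
    mask = (1 << n) - 1     # all n digits set, 255 when n = 8
    if form == "unsigned":
        return x if 0 <= x <= mask else None
    if form == "signed magnitude":
        if abs(x) >= top:
            return None
        return abs(x) | (top if x < 0 else 0)
    if form == "ones complement":
        if abs(x) >= top:
            return None
        return x if x >= 0 else (~abs(x)) & mask          # flip every digit
    if form == "twos complement":
        if not -top <= x < top:
            return None
        return x if x >= 0 else ((~abs(x)) + 1) & mask    # flip every digit, add 1
    if form == "excess":
        return x + k if 0 <= x + k <= mask else None      # add the offset k
    raise ValueError("unknown form: " + form)


def decode(bits, form, n=8, k=128):
    """Recover the decimal value from an n-bit pattern by undoing the recipe."""
    top = 1 << (n - 1)
    mask = (1 << n) - 1
    if form == "unsigned":
        return bits
    if form == "signed magnitude":
        magnitude = bits & (top - 1)
        return -magnitude if bits & top else magnitude
    if form == "ones complement":
        return -((~bits) & mask) if bits & top else bits
    if form == "twos complement":
        return -(((~bits) + 1) & mask) if bits & top else bits
    if form == "excess":
        return bits - k
    raise ValueError("unknown form: " + form)


if __name__ == "__main__":
    forms = ["unsigned", "signed magnitude", "ones complement",
             "twos complement", "excess"]
    for form in forms:
        row = []
        for x in (114, -79, 156, -145):
            bits = encode(x, form)
            row.append("n/a" if bits is None else to_bits(bits))
        print(form.ljust(18), *[cell.ljust(11) for cell in row])
```

Running the script prints one row per form for the four values of Example 21, and `decode` can be used to practice the reverse conversions.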

Example 22: Determine the decimal number corresponding to each of 0101 1000₂ and 1011 0110₂ when each of the five binary forms described above is used.

Because many find the logic of this sort of reverse conversion to be a little less intuitive, we will present a few more details of the actual calculations.

For 0101 1000₂, we get the following:

- If this is coded in unsigned binary form, then its decimal value is obtained using the values of the eight positions: 8 + 16 + 64 = 88₁₀.
- If this is coded in signed magnitude form, then we decode the numerical part by using the values of the rightmost 7 positions (which still gives 88₁₀, as for the unsigned binary form) and then note that the 0 in the first digit indicates a positive sign, for a final result of +88₁₀.
- The first digit is 0, indicating that if either the ones complement or twos complement form is used, then a positive value is being coded, and hence the pattern is identical (by the definitions of these two forms) to the signed magnitude form. Thus, if either the ones complement form or the twos complement form is being used, the original decimal value must have been +88₁₀.
- Coding in excess-128 form involves first adding 128 to the original value and then coding the result in unsigned binary form. To get the original decimal value back, we undo these two steps (obviously in reverse order). Decoding as an unsigned binary value gives 88₁₀. This must be what we get after adding 128 to the original decimal number. Thus, the original decimal number is 88 - 128 = -40₁₀.

For 1011 0110₂, we get the following:

- Using the positional values of the digits to decode this as an 8-bit unsigned binary form, we get 2 + 4 + 16 + 32 + 128 = 182₁₀.
- If this number is coded in signed magnitude form, the first digit being a 1 means that the original decimal value was negative. The remaining seven digits represent the numerical value in positional notation: 2 + 4 + 16 + 32 = 54. Thus, the original decimal number must have been -54₁₀.
- If this number is the ones complement form of the original decimal value, then the 1 in the first position means the original decimal number was negative, and further, that this binary number was obtained by doing the ones complement operation on the signed magnitude form of its absolute value. So we first undo the ones complement operation by repeating it: 1011 0110 -> 0100 1001. This must be the signed magnitude form of the absolute value of the original number. Since 0100 1001 as a signed magnitude form is equal to 1 + 8 + 64 = 73₁₀, and this is the absolute value of the original negative number, we know that the original number must have been -73₁₀.
- If this number is the twos complement form of the original decimal value, then the 1 in the first position means the original decimal number was negative, and further, that this binary number was obtained by doing the twos complement operation on the signed magnitude form of its absolute value. So we first undo the twos complement operation by repeating it: 1011 0110 -> (ones complement operation) 0100 1001 -> (+1) 0100 1010. This must be the signed magnitude form of the absolute value of the original number. Since 0100 1010 as a signed magnitude form is equal to 2 + 8 + 64 = 74₁₀, and this is the absolute value of the original negative number, we know that the original number must have been -74₁₀.
- If this binary number is an excess-128 form, the procedure for calculating the original decimal value is exactly the same as illustrated in the first part of this example. Decode this as if it were an unsigned binary form, 1011 0110₂ -> 182₁₀, and then subtract 128, to get 54₁₀.

In summary, our results are:

                              0101 1000₂    1011 0110₂
    unsigned binary form:      88₁₀          182₁₀
    signed magnitude form:     88₁₀          -54₁₀
    ones complement form:      88₁₀          -73₁₀
    twos complement form:      88₁₀          -74₁₀
    excess-128 form:          -40₁₀           54₁₀

Ranges of Decimal Values That Can Be Represented in the Binary Forms

From the properties of each of the five binary forms described above, it is straightforward to determine the following ranges of decimal values that can be represented in each case.

                            8-bit               n-bit
                            min     max         min                max
    unsigned binary         0       255         0                  2^n - 1
    signed magnitude        -127    127         -(2^(n-1) - 1)     2^(n-1) - 1
    ones complement form    -127    127         -(2^(n-1) - 1)     2^(n-1) - 1
    twos complement form    -128    127         -2^(n-1)           2^(n-1) - 1
    excess-k form           -k      255 - k     -k                 2^n - 1 - k

A Visualization of the Twos Complement Form

Shortly, when we describe a method for implementing addition and subtraction digitally, the following visualization will help you understand why the method works and also how we get some of the rules used to interpret the results. We'll assume we're working with 8-bit binary numbers here, but the ideas are easy to adapt to n-bit binary numbers in general.

You now know that if no account is taken of sign, we can represent the decimal numbers 0₁₀ through 255₁₀ with 8 binary digits, or 8 bits. Picture those 256 values as positions on a wheel, with 0 at the top:

[Figure: number wheel with positions 0, 1, 2, 3, ... running clockwise from the top, 126 through 130 around the bottom, and 253, 254, 255 just before the top.]

So, a number k now corresponds to a position k steps clockwise from the zero position at the top of the wheel. Recall that the twos complement of k corresponds to the value 2^n - k = 256 - k here. Thus, the arc from k clockwise around the wheel and back up to position 0 corresponds to the twos complement of k:

[Figure: wheel showing the arc from position k clockwise back up to position 0.]

We can easily visualize the operation of adding one binary number to another (or one number in general to another) with reference to this wheel. To form the sum of, say, 17 and 15, we just start at position number 17, and then move clockwise 15 steps:

[Figure: wheel showing 17 steps clockwise from 0, then 15 more steps, for 32 steps in total.]

Now, the corresponding visualization of subtraction is reasonably obvious: you could view the subtraction of the value k as moving k steps counterclockwise on this wheel. Thus, 17 - 15 = 2 amounts to starting at position 17 and moving 15 steps counterclockwise to end up at position 2:

[Figure: wheel showing a counterclockwise move of 15 steps from position 17 to position 2.]

However, there is another approach to subtracting 15 from 17 that makes use of the cyclic property of the wheel: namely, move clockwise by an amount equal to the twos complement of 15, which is 256 - 15 = 241 steps. Then, the endpoint would be 17 + 241 = 258 steps clockwise from zero. But this is one complete circuit of the wheel back to zero (256 steps) plus two more steps, so you would end up at position 2, which is the correct result of 17 - 15.

On this number wheel, moving counterclockwise by k steps gets you to the same position as moving clockwise by a number of steps equal to the twos complement of k. It is this property we now exploit to understand how 8-bit addition and subtraction can be handled very easily as addition only, as long as:

- We always regard subtraction as equivalent to addition of the negative value: to subtract k, add -k.
- We always code numbers in twos complement binary form and interpret the results as the twos complement binary form of the answer.

Not only does using the twos complement form allow us to take advantage of the property just illustrated with the number wheel diagram, but it also ensures that our arithmetic can involve both positive and negative numbers.

Deciding to restrict ourselves to the twos complement form for representing numbers in the computer means that positions on the right-hand half of the number wheel sketched above correspond to positive values (for 8-bit binary, these positions correspond to 0₁₀ to 127₁₀). Positions on the left-hand half of the wheel correspond to negative values (-128₁₀ at the bottom of the wheel up to -1₁₀ just adjacent to the top of the wheel), since the corresponding binary strings have the first digit equal to 1. In the following discussion, we will label the positions around the wheel with the decimal values corresponding to their unsigned binary form as well as the decimal values corresponding to their twos complement form:

[Figure: number wheel with two sets of labels. Unsigned binary form: 0, 1, 2, 3, ... clockwise from the top, 126, 127, 128, 129, 130 around the bottom, and 253, 254, 255 before the top. Twos complement form: 0, 1, 2, 3, ..., 126, 127 down the right-hand side, then -128, -127, -126, ..., -3, -2, -1 up the left-hand side.]

Now, consider three cases:

(i) addition of a positive value m to a positive value n

In this case, the result, n + m, is a positive value.
With reference to the 8-bit binary wheel being used here, this operation amounts to starting at position n and moving m steps clockwise, since the twos complement forms of both n and m are identical to their unsigned binary forms.

If the result, n + m, is not greater than +127, the largest allowable value in 8-bit twos complement binary form, then both binary numbers we start with will have their first digit equal to zero, and the resulting sum will still have its first digit equal to zero, indicating a positive result (in reference to the twos complement form), and everything is fine. The result is the twos complement form of the positive answer, n + m.

However, suppose n + m is greater than 127. Then moving m steps from position n on the wheel will get us onto the left-hand half of the wheel. Now the result has a 1 in the first digit, which would indicate a negative value in twos complement form, but we know that the result has to be a positive value. In doing the binary addition, this 1 in the first digit has had to result from a carry from the second digit. The result is invalid, of course, because you cannot add two positive numbers and get a negative number. What has happened is that the result has exceeded the capacity of the 8 bits available. This event is called overflow.

On the wheel diagram, overflow in this case amounts to starting at a position n on the right-hand side and moving clockwise m steps so that we actually end up across on the left-hand side of the wheel:

[Figure: wheel showing n steps clockwise from 0, then m further steps clockwise that pass -128 at the bottom of the wheel.]

Crossing the bottom of the wheel (moving into negative territory in the twos complement form world) amounts to moving into a region where the 8-bit binary numbers have a 1 in the first digit; that is, a carry into the first digit has occurred. Notice that starting at position n on the positive side of the wheel and moving m steps clockwise, where m is a valid 8-bit twos complement form for a positive value, can never get us back into positive territory on the wheel, because m can never be more than half the distance around the wheel. So overflow here always shows up as a carry into the first digit of the result, which then sticks (there is no carry out of that first digit to reduce it back to a zero). This observation will become useful in formulating a simple rule for detecting overflow.

Thus, when adding two positive numbers using twos complement arithmetic, overflow is always characterized by a carry into the first digit, but no carry out of the first digit.

(ii) addition of a negative value to a positive value: n + (-m) = n - m

This amounts to starting out at position n and moving clockwise a distance 256 - m, to end up at position n + 256 - m. Here m stands for the absolute value of -m, that is, the value obtained by ignoring the sign of -m. Consider two subcases.

First, n > m. The result should be positive. n is positive and so the first digit of its twos complement binary representation is a 0.
We move 256 - m steps clockwise, and end up in positive territory on the wheel again, so the first digit of the result must also be a 0. However, the first digit of the twos complement representation of -m must be a 1, because -m is negative. So, in 8-bit arithmetic, for instance, what must happen here is

          n   =  0 _ _ _  _ _ _ _
      + (-m)  =  1 _ _ _  _ _ _ _
      result  =  0 _ _ _  _ _ _ _

The only way for this to happen is if a 1 carries into the position of this first digit, so that 1 + 1 can leave a digit 0 in that position. But then there will be a carry out of that position. So, a valid result here requires that there be both a carry into the first digit and a carry out of this digit.

The other possibility here is that n < m. In this case, the result n - m will be a negative number, and so the first digit in its twos complement representation will be a 1. Thus what must occur is

          n   =  0 _ _ _  _ _ _ _
      + (-m)  =  1 _ _ _  _ _ _ _
      result  =  1 _ _ _  _ _ _ _

This can only happen if there is no carry into the first digit (since 0 + 1 = 1). It cannot happen if there is a carry into the first digit (since then we'd have 1 + 1 = 10₂, which would leave a 0 as the first digit of the result, which is invalid). Of course, if there is no carry into this digit, there will be no carry out. Thus a valid result occurs here if there is no carry into the first digit and, of necessity, no carry out of that digit either.

Finally, it is easy to see that n - m can never give an overflow in twos complement arithmetic. The most negative result we can get occurs when n is the smallest possible positive number, and -m is the most negative possible value that can be coded in twos complement form. In 8 bits, these values are n = 0 and -m = -128. The sum, n + (-m), is -128, which can be represented in 8-bit twos complement form. The largest positive result we could get here occurs when n is as large as possible, and -m is the least negative number possible. In 8-bit twos complement arithmetic, this situation has n = 127 and -m = -1. Then the result would be n + (-m) = 126, which also has a valid 8-bit twos complement representation. (We've ignored the situations of adding or subtracting zero, since that obviously cannot result in overflow.)

(iii) addition of a negative number, -m, to a negative number, -n: (-n) + (-m) = -n - m

Now, both numbers have first digits equal to 1, and the result must also be negative, so its first digit must be 1. So, to get a valid result, we must have the following pattern occur:

        (-n)  =  1 _ _ _  _ _ _ _
      + (-m)  =  1 _ _ _  _ _ _ _
      result  =  1 _ _ _  _ _ _ _

Now in binary 1 + 1 = 10₂, so if there is no carry into this first digit, our result will end up with a 0 in the first digit, apparently giving a positive result when two negative numbers are added. This type of error would occur when the two negative values added together give a result which is more negative than -128₁₀.
(On the wheel, this amounts to, in one sense, moving counterclockwise from the negative half of the wheel and ending up in the positive half.) Since we've run off the end of the range of negative values that can be represented in 8-bit twos complement binary form, this again amounts to overflow. There is a carry out of the first digit, but no carry in.

However, if we had had a carry into the first digit, then the binary sum in that position would be 1 + 1 + 1 = 11₂. This would leave a 1 in that first digit (and there'd be a carry out of that position as well). The result would be the twos complement representation of a negative number, and so would be valid. Thus, a valid result in this case occurs when there is a carry into the first digit, which means that there will also be a carry out of that digit.

To save space here, we omit the interpretations of cases (ii) and (iii) using the wheel diagram, but they are possible just as for case (i). Try it yourself to see if you have mastered the concepts here. Just as moving clockwise around the bottom of the wheel amounts to a carry into the first digit, so moving clockwise around the top of the wheel amounts to a carry out of the first digit. Combine that idea with the fact that adding -n (equivalent to subtracting n) is visualized as moving clockwise 256 - n steps, and the observations made above from the binary arithmetic patterns will follow from the interpretation of motions around the wheel.
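The wheel picture is ordinary arithmetic modulo 256, so its central property can be tried out directly. The short Python sketch below is only an illustration under that assumption; the names `WHEEL`, `clockwise`, `counterclockwise`, `twos_complement_steps`, and `signed_label` are hypothetical and do not come from the notes. It models positions 0 to 255 and shows that moving counterclockwise by k steps ends at the same position as moving clockwise by 256 - k steps.

```python
WHEEL = 256  # number of positions on the 8-bit wheel (assumed: 2**8)


def clockwise(position, steps):
    """Position reached after moving `steps` clockwise from `position`."""
    return (position + steps) % WHEEL


def counterclockwise(position, steps):
    """Position reached after moving `steps` counterclockwise from `position`."""
    return (position - steps) % WHEEL


def twos_complement_steps(k):
    """Number of clockwise steps equivalent to k counterclockwise steps."""
    return (WHEEL - k) % WHEEL


def signed_label(position):
    """Twos complement label of a position: the left half of the wheel is negative."""
    return position - WHEEL if position >= WHEEL // 2 else position


if __name__ == "__main__":
    print(clockwise(17, 15))                         # 17 + 15
    print(counterclockwise(17, 15))                  # 17 - 15, moving backwards
    print(clockwise(17, twos_complement_steps(15)))  # 17 + 241, one full circuit plus 2
    print(signed_label(255), signed_label(128), signed_label(127))
```

The third line of output repeats the 17 - 15 example from the text: 241 steps clockwise from position 17 lands on position 2.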

[Figure: wheel with 0 at the top and -128 at the bottom. Crossing the top of the wheel is marked "carry out of position 1"; crossing the bottom is marked "carry into position 1".]

So, with regard to overflow, we can summarize the possible events as follows:

(i) positive + positive
    no overflow: no carry into first digit and no carry out of first digit
    overflow: carry into first digit but no carry out of first digit

(ii) positive + negative
    no overflow: no carry into first digit and no carry out of first digit, or carry into first digit and carry out of first digit
    overflow: impossible

(iii) negative + negative
    no overflow: carry into first digit and carry out of first digit
    overflow: no carry into first digit, but a carry out of first digit

In summary, it would appear that the easiest way to determine whether or not overflow has occurred when adding the twos complement binary representations of decimal numbers is the following:

- If the number of carries into the first digit is equal to the number of carries out of the first digit, then no overflow has occurred, and the result is the twos complement representation of the correct answer.
- If the number of carries into the first digit does not match the number of carries out of the first digit, then overflow has occurred. The result has no meaningful connection with the correct answer.

We will now illustrate addition and subtraction using 8-bit twos complement binary representations by working through several examples.
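The carry-in/carry-out rule is easy to test exhaustively for 8 bits. The Python sketch below is an illustration only, with hypothetical helper names (`add8`, `overflowed`, `as_signed`); it computes the carry into the first digit from the rightmost 7 digits and the carry out of the first digit from the full sum, rather than simulating the addition digit by digit.

```python
def add8(a_bits, b_bits):
    """Add two 8-bit patterns; return (sum_bits, carry_in, carry_out).

    carry_in  is the carry INTO the first (leftmost) digit,
    carry_out is the carry OUT of the first digit.
    """
    carry_in = ((a_bits & 0x7F) + (b_bits & 0x7F)) >> 7
    total = a_bits + b_bits
    carry_out = total >> 8
    return total & 0xFF, carry_in, carry_out


def overflowed(a_bits, b_bits):
    """Overflow rule from the text: carry in and carry out do not match."""
    _, carry_in, carry_out = add8(a_bits, b_bits)
    return carry_in != carry_out


def as_signed(bits):
    """Interpret an 8-bit pattern as a twos complement value."""
    return bits - 256 if bits & 0x80 else bits


if __name__ == "__main__":
    mismatches = 0
    for x in range(-128, 128):
        for y in range(-128, 128):
            a, b = x & 0xFF, y & 0xFF           # 8-bit twos complement patterns
            out_of_range = not -128 <= x + y <= 127
            if overflowed(a, b) != out_of_range:
                mismatches += 1
            elif not out_of_range and as_signed(add8(a, b)[0]) != x + y:
                mismatches += 1
    print("pairs where the rule disagrees with decimal arithmetic:", mismatches)
```

The loop compares the rule against plain decimal arithmetic for every pair of 8-bit twos complement values, so the printed count is a direct check of the summary above.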

Addition/Subtraction Using the Twos Complement Binary Form

As mentioned before, we start off with two rules:

- all decimal numbers are coded in twos complement binary form
- subtraction is always written as the addition of a negative number

This approach to signed whole number (or integer) arithmetic in binary solves at least two problems:

(i) We don't need to develop a separate digital logic for subtraction as such; there is just one operation involved, addition.

(ii) The logic of subtraction is, in fact, much more complex than the logic of addition. Simplistically, addition can be done one digit at a time, with, at worst, a single carry into the next position to the left. In subtraction, it may be necessary to look ahead many positions in order to set up a borrow. We can't take the space here to examine this problem in detail, if for no other reason than that it only serves to demonstrate why the approach we will be avoiding is very tricky. However, you'll get an inkling of the difficulties we have in mind when you compare the amount of work it takes to do the following decimal subtraction by hand:

      500001 - 34322

rather than the addition

      500001 + 34322

(Of course, it is easy for you to do both the addition and the subtraction with a calculator, but the calculator has to be designed to do this "by hand" calculation in its own way. Calculators don't have calculators of their own to do the hard stuff.)

Now, we'll illustrate the process with several 8-bit binary arithmetic examples, and then with a couple of examples to show how easily this approach is adapted to binary numbers with more than eight digits.

Example 23: Do k = 21₁₀ + 82₁₀ in 8-bit twos complement binary form. Verify your result using decimal arithmetic.

The values 21₁₀ and 82₁₀ are both positive, so their twos complement binary forms are the same as their signed magnitude binary forms.
Writing these down and carrying out the addition, we get:

                 1         carries
       21₁₀    0001 0101
       82₁₀  + 0101 0010
      103₁₀    0110 0111   sum

There was no carry into the first digit and no carry out of the first digit, so no overflow has occurred. The answer, 0110 0111₂, is in twos complement binary form. Since the first digit is zero, the answer is a positive value, and so this binary number is also the signed magnitude form of the answer, k. Thus

      k = +110 0111₂ = +(1 + 2 + 4 + 32 + 64)₁₀ = 103₁₀

which agrees with the sum obtained above using decimal arithmetic.

Example 24: Do 69₁₀ + 79₁₀ in 8-bit twos complement binary form. Verify your result using decimal arithmetic.

Again, the values 69₁₀ and 79₁₀ are both positive, so their twos complement binary forms are the same as their signed magnitude binary forms. Writing these down and carrying out the addition, we get:

               1  1 111    carries
       69₁₀    0100 0101
       79₁₀  + 0100 1111
      148₁₀    1001 0100   sum

Now we see that there has been a carry into the first digit, but no carry out. Therefore, overflow has occurred and the apparent result is invalid. This sum cannot be done in 8-bit twos complement arithmetic.

Since the answer is to be interpreted assuming it is in twos complement form, the result above would indicate a negative answer for the addition of two positive numbers, because the first digit is a 1. Two positive numbers cannot add up to a negative value, and so we have a contradiction.

If we do decimal addition, we get 69₁₀ + 79₁₀ = 148₁₀, which we know is out of range for 8-bit twos complement form. Of course, the computer doesn't know this, but it is designed to detect overflow using the carry-in/carry-out rule.

If you interpreted the answer as if it were in unsigned binary form, you would get 148₁₀. But things would get very confusing if you expected the computer to sometimes interpret a binary number one way, and sometimes another way. Notice that Example 27 below will give exactly the same binary answer, but there it would be a serious error to interpret the binary answer as if it were an unsigned binary form. So, we stick with the rule that the numbers to be added are represented in twos complement form, and the result is interpreted as if it were in twos complement form.
Then, if an overflow occurs (detected using the carry-in/carry-out rule), we must discard the answer as invalid.

The only way to add numbers that cause overflow in 8-bit binary representations is to do the arithmetic using binary forms with more digits (or bits). This is why most programming languages make a variety of integer data types available. (For instance, in C/C++, Pascal, Java, etc., there are 8-bit, 16-bit, 32-bit, and 64-bit integer types, among others.)

Example 25: Do k = 94₁₀ - 70₁₀ in 8-bit twos complement binary form. Verify your result using decimal arithmetic.

The first number, 94₁₀, is positive, so its twos complement form is just the same as its signed magnitude form: 0101 1110₂. Since -70₁₀ is negative, we start with the signed magnitude form of 70 = |-70|:

      70₁₀   signed magnitude form        0100 0110
             ones complement operation    1011 1001
             add 1                               +1
             twos complement form         1011 1010    8-bit twos complement form of -70₁₀

So

               1111 11     carries
       94₁₀    0101 1110
      -70₁₀  + 1011 1010
      +24₁₀  1 0001 1000   sum

We have had a carry into the first digit and a carry out, so there has been no overflow. The 8-bit binary number, 0001 1000₂, represents a positive value in twos complement form, and so can be decoded to decimal as if it were in signed magnitude form:

      0001 1000₂ -> +(001 1000₂) = +(8₁₀ + 16₁₀) = +24₁₀

which agrees with the result obtained above using decimal arithmetic.

Example 26: Do k = 18₁₀ - 83₁₀ in 8-bit twos complement binary form. Verify your result using decimal arithmetic.

The 8-bit binary twos complement form of 18₁₀ is its 8-bit binary signed magnitude form, 0001 0010, since it is a positive number. To get the 8-bit binary twos complement form of -83₁₀, we need to start with the signed magnitude form of 83₁₀, the absolute value of -83₁₀, which is 0101 0011. Then, do the twos complement operation on this binary number, to get 1010 1101. So, our calculation becomes

                           carries
       18₁₀    0001 0010
      -83₁₀  + 1010 1101
      -65₁₀    1011 1111   sum

There is no carry into the first digit, and no carry out, so no overflow has occurred. (In fact, there were no carries at all in the addition step.) The result in 8-bit twos complement form has its first digit equal to 1, so it must represent a negative value. Converting to decimal,

      1011 1111  -> (twos complement operation)  0100 0001  -> (decode as signed magnitude)  1₁₀ + 64₁₀ = 65₁₀

so 1011 1111, decoded as a twos complement form, is -65₁₀. With our calculator, we can easily confirm that 18₁₀ - 83₁₀ = -65₁₀, and so our binary calculation has been verified.
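The hand calculations in Examples 23 to 26 can be mirrored by adding one digit at a time and recording every carry, which makes the carries row visible. The following Python sketch is offered only as an illustration; `twos_complement_bits`, `ripple_add`, and `decode` are hypothetical names, and the layout of the hand calculation is not reproduced.

```python
def twos_complement_bits(x, n=8):
    """n-bit twos complement pattern of decimal x (assumes x fits in n bits)."""
    return x & ((1 << n) - 1)


def ripple_add(a_bits, b_bits, n=8):
    """Add two n-bit patterns one digit at a time, starting from the right.

    Returns (sum_bits, carries). carries[i] is the carry INTO digit i, where
    digit 0 is the rightmost digit; carries[n] is the carry OUT of the first
    (leftmost) digit.
    """
    carries = [0] * (n + 1)
    result = 0
    for i in range(n):
        a = (a_bits >> i) & 1
        b = (b_bits >> i) & 1
        s = a + b + carries[i]
        result |= (s & 1) << i      # digit written in this position
        carries[i + 1] = s >> 1     # carry passed to the next position left
    return result, carries


def decode(bits, n=8):
    """Decimal value of an n-bit twos complement pattern."""
    if bits >> (n - 1):             # first digit is 1: the value is negative
        return -(((~bits) + 1) & ((1 << n) - 1))   # repeat the twos complement operation
    return bits


if __name__ == "__main__":
    for x, y in [(21, 82), (69, 79), (94, -70), (18, -83)]:
        total, carries = ripple_add(twos_complement_bits(x), twos_complement_bits(y))
        carry_in, carry_out = carries[7], carries[8]
        status = "overflow" if carry_in != carry_out else "valid"
        print(x, "+", y, "->", format(total, "08b"), decode(total), status)
```

For each pair, the script prints the 8-bit sum, its decoded twos complement value, and whether the carry into and out of the first digit agree.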

Example 27: Do k = -27₁₀ - 81₁₀ in 8-bit twos complement binary form. Verify your result using decimal arithmetic.

We can be fairly brief here in setting up this solution, because by now you've seen quite a few examples of converting decimal numbers to 8-bit binary twos complement form. So

      27₁₀  -> (signed magnitude form)  0001 1011    therefore  -27₁₀  -> (twos complement form)  1110 0101

and

      81₁₀  -> (signed magnitude form)  0101 0001    therefore  -81₁₀  -> (twos complement form)  1010 1111

Thus,

               11 1 111    carries
      -27₁₀    1110 0101
      -81₁₀  + 1010 1111
     -108₁₀  1 1001 0100   sum

There was a carry into the first digit and a carry out, so no overflow has occurred. The answer in 8-bit twos complement form is 1001 0100, indicating a negative value. Converting to decimal to get the value of k gives

      1001 0100  -> (twos complement operation)  0110 1100  -> (decode as signed magnitude)  +(4₁₀ + 8₁₀ + 32₁₀ + 64₁₀) = 108₁₀

Thus, since |k| = 108₁₀ and k is negative, the result is k = -108₁₀, which agrees with the result we obtained using our calculator to do the arithmetic in decimal.

Example 28: Do k = -58₁₀ - 97₁₀ in 8-bit twos complement binary form. Verify your result using decimal arithmetic.

In outline,

      58₁₀  -> (signed magnitude form)  0011 1010    therefore  -58₁₀  -> (twos complement form)  1100 0110

and

      97₁₀  -> (signed magnitude form)  0110 0001    therefore  -97₁₀  -> (twos complement form)  1001 1111

So

                 11 11     carries
      -58₁₀    1100 0110
      -97₁₀  + 1001 1111
     -155₁₀  1 0110 0101   sum

There is a carry out of the first digit, but no carry into the first digit. Therefore overflow has occurred. The result cannot be expressed in 8-bit binary twos complement form.

For what it's worth, we'll take a brief look at how the operations in Examples 23 through 28 correspond to movements on the wheel diagrams used earlier to illustrate the twos complement form. Recall that on these wheels, addition is viewed as a clockwise movement by a number of positions or steps equal to the number being added. In the following diagrams, the numbers on the inside of the wheels are absolute positions (in the range 0 to 255; you could also consider these to be unsigned binary equivalents) and the numbers on the outside of the wheels are the decimal values that result when you interpret the position labels as being in twos complement form.

In Example 23, we did 21 + 82. The result would be to take 82 steps clockwise from the position representing +21₁₀, which is itself located 21 steps clockwise from 0 at the top of the wheel. The end point of the 82-step movement is reached before crossing either the bottom of the wheel (a carry into the first digit of the answer) or the top of the wheel (a carry out of the first digit of the answer). Thus, in the arithmetic, there is neither a carry into nor a carry out of the first digit of the answer, and so no overflow occurred. In the picture, this corresponds to not moving from one side of the wheel to the other. The result is a valid 8-bit twos complement form.

[Figure: wheel for Example 23. 21 steps clockwise from 0 to position 21, then 82 steps clockwise to position 103, stopping before 128 at the bottom.]

The situation in Example 24, where we did 69 + 79, has some similarities. The value 69 shows up on the wheel 69 steps clockwise from 0 at the top. Then, adding 79 to this value amounts to moving 79 steps further around the wheel clockwise.
You can see that this takes us around the bottom of the wheel (a carry into the first digit occurs) but not far enough to come back around over the top of the wheel (not far enough for a carry out of the first digit). This gives an invalid result, because we have added two positive values and ended up on the negative side of the wheel. In fact, since positive values correspond to a number of steps which is less than half of the wheel, it is impossible to add two positive values and have the resulting movement be far enough to make it over the top and back into positive territory once the negative (left) side of the wheel has been entered.

[Figure: wheel for Example 24. Start at position 69, move 79 steps clockwise past 128 at the bottom to position 148, whose twos complement label is -108.]

In Example 25, we did 94 - 70. On the wheel, this corresponds to starting at a position 94 steps clockwise from the top, and then moving clockwise around the wheel by a distance of 256 - 70 = 186 steps, the number of steps equal to the 8-bit twos complement of -70. In this case, this second movement clockwise around the wheel not only crosses the bottom of the wheel (a carry into the first digit of the answer occurs) but continues on to cross over the top of the wheel (a carry out of the first digit of the answer occurs) and back into positive territory. You can see from this picture that if the negative number has a smaller absolute value than the positive number, the second movement will always be enough to cross both the bottom of the wheel and the top of the wheel (which means that in such a situation we will always get a carry out to match the carry into the first digit of the answer).

[Figure: wheel for Example 25. Start at position 94, move 186 steps clockwise past 128 at the bottom and past 0 at the top to position 24.]

The easiest way to see how the operation in Example 26 corresponds to movements on the wheel is to view the sum done there as adding +18 to -83. So we start by noting the location of -83 on the wheel (move clockwise from 0 by 256 - 83 = 173 steps, the number of steps equal to the twos complement of -83). Then, to add 18, just move another 18 steps clockwise. This second movement does not cross either the top of the wheel or the bottom of the wheel (so there is neither a carry into the first digit nor a carry out of the first digit of the binary answer) and so there is no overflow. The binary answer is a valid result.

[Figure: wheel for Example 26. Start at position 173 (twos complement label -83), move 18 steps clockwise to position 191 (twos complement label -65).]

Things get a bit more complicated with the last two examples because both numbers involved are negative, and so the movements and positions on the wheel are twos complement values. For Example 27, the twos complement of -27 is 256 - 27 = 229, and the twos complement of -81 is 256 - 81 = 175. Thus, we start at a position 229 steps clockwise from 0 (representing the decimal value -27₁₀), and then move another 175 steps clockwise around the wheel to get our result. You can see that this second movement crosses both the top of the wheel and the bottom of the wheel (corresponding to both a carry out of and a carry into the first digit of the result). No overflow results, and the binary answer is valid.

[Figure: wheel for Example 27. 229 steps clockwise from 0 to position 229 (twos complement label -27), then 175 steps clockwise past 0 and past 128 to position 148 (twos complement label -108).]

For Example 28, we have that the twos complement of -58 is 198 and the twos complement of -97 is 159.
So, we can view the operation -58 + (-97) as starting out at a position 198 steps clockwise from 0, and then moving an additional 159 steps clockwise. In this case, the second movement crosses the top of the wheel, but is not long enough to get back around the bottom of the wheel. Thus, we've had a carry out of the first digit, but no carry in, and so overflow has occurred and the binary answer is invalid.

[Wheel diagram for Example 28: 0 at the top, 128 at the bottom; start at position 198 (which reads as -58), then move 159 steps clockwise past 0 to position 101.]

In summary, we see the following from these diagrams:
- Overflow can occur only when adding a positive value to a positive value or a negative value to a negative value.
- In adding two positive values, overflow occurs because the second movement crosses the bottom of the wheel (carry into the first digit of the answer) into the left half of the wheel where the twos complement forms represent negative values, but is not long enough to cross back over the top of the wheel to where the twos complement forms represent positive values.
- In adding two negative values, overflow occurs because the second movement crosses the top of the wheel (carry out of the first digit of the answer) but doesn't make it far enough around to cross the bottom of the wheel (so that there is a matching carry into the first digit of the answer).
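The carry in/carry out rule summarized above can also be checked mechanically. What follows is a minimal Python sketch, not part of the original notes: the function names `add_twos_complement` and `decode` are illustrative choices, and the operand pairs are the ones quoted for Examples 23 through 28 in the wheel discussion.

```python
def add_twos_complement(a, b, bits=8):
    """Add signed integers a and b as `bits`-bit twos complement patterns.

    Returns (pattern, carry_in, carry_out, overflow), where carry_in is the
    carry into the leftmost (sign) digit and carry_out is the carry out of it.
    """
    mask = (1 << bits) - 1
    low = mask >> 1                    # every digit except the leftmost one
    pa, pb = a & mask, b & mask        # positions on the wheel (0 .. 2**bits - 1)
    carry_in = ((pa & low) + (pb & low)) >> (bits - 1)
    total = pa + pb
    carry_out = total >> bits
    # Overflow exactly when the two carries disagree.
    return total & mask, carry_in, carry_out, carry_in != carry_out


def decode(pattern, bits=8):
    """Read a wheel position as a signed value (the labels outside the wheel)."""
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


# Operand pairs as quoted for Examples 23 to 28 in the text.
for a, b in [(21, 82), (69, 79), (94, -70), (18, -83), (-27, -81), (-58, -97)]:
    pattern, cin, cout, overflow = add_twos_complement(a, b)
    status = "overflow" if overflow else f"valid, value {decode(pattern)}"
    print(f"{a:>4} + {b:>4}: {pattern:08b}  carry in {cin}, carry out {cout}  {status}")
```

Passing `bits=16` applies the same test to 16-bit patterns, since nothing in the rule depends on the word length.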

We finish this section with two examples demonstrating 16-bit binary twos complement arithmetic.

Example 29: Form the sum $7767_{10} + 13982_{10}$ using 16-bit twos complement binary form. Verify your result using decimal arithmetic.

Both numbers here are positive, and so their twos complement form is identical to their signed magnitude form. We will first convert each of the decimal numbers to a 16-bit unsigned binary form, and then expropriate the first digit on the left as the sign bit, to get the signed magnitude form. We'll use a hexadecimal intermediate form to save arithmetic here. So,

  7767 / 16 = 485 with remainder 7
  485 / 16 = 30 with remainder 5
  30 / 16 = 1 with remainder 14
  1 / 16 = 0 with remainder 1

and

  13982 / 16 = 873 with remainder 14
  873 / 16 = 54 with remainder 9
  54 / 16 = 3 with remainder 6
  3 / 16 = 0 with remainder 3

Thus,

  $7767_{10} = 1E57_{16} = 0001\,1110\,0101\,0111_2$

and

  $13982_{10} = 369E_{16} = 0011\,0110\,1001\,1110_2$.

The leftmost binary digit in both cases is needed as the sign digit. Since it is zero in both binary numbers, we are in range for 16-bit signed magnitude forms. Thus, the 16-bit signed magnitude form (and hence, the 16-bit twos complement form) of each number will be precisely the binary number shown above. So, our sum can be done immediately:

                   111 11     11 11      carries
  $7767_{10}$     0001 1110 0101 0111
  $13982_{10}$    0011 0110 1001 1110
                  0101 0100 1111 0101    sum

There is no carry into the first digit and no carry out, so there is no overflow here. The binary result corresponds to a positive value because the first digit is a 0. Decoding it (twos complement form is identical to signed magnitude form, which is identical to unsigned binary form for positive values in range), we get

  $0101\,0100\,1111\,0101_2 = 54F5_{16} = 5 + 15 \times 16^1 + 4 \times 16^2 + 5 \times 16^3 = 21749_{10}$.

But, by calculator, $7767_{10} + 13982_{10} = 21749_{10}$, so our binary result is confirmed.
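The repeated division by 16 used in this example is easy to mechanize. The following is a small Python sketch of that procedure, offered only as an illustration; the helper names `to_base` and `grouped_bits` are hypothetical and do not come from the notes.

```python
def to_base(n, base=16):
    """Convert a non-negative integer to a digit string by repeated division."""
    symbols = "0123456789ABCDEF"
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(symbols[remainder])   # remainders appear least significant first
    return "".join(reversed(out))


def grouped_bits(n, bits=16):
    """Write n as a `bits`-digit binary string in groups of four digits."""
    text = format(n, f"0{bits}b")
    return " ".join(text[i:i + 4] for i in range(0, bits, 4))


for value in (7767, 13982, 7767 + 13982):
    print(value, to_base(value), grouped_bits(value))
```

Because each hexadecimal digit corresponds to exactly one group of four binary digits, the two printed forms can be compared group by group.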

Example 30: Form the difference $7767_{10} - 13982_{10}$ using 16-bit twos complement binary form. Verify your result using decimal arithmetic.

Following the previously declared general principle, we will do this subtraction as if it were the addition of a negative number:

  $7767_{10} - 13982_{10} = 7767_{10} + (-13982_{10})$.

We have the 16-bit twos complement form of $7767_{10}$ from the previous example. However, we need to calculate the 16-bit twos complement form of $-13982_{10}$. To do this, we just need to do the twos complement operation on the 16-bit signed magnitude form of $+13982_{10}$, available from the previous example:

  0011 0110 1001 1110    16-bit signed magnitude form of $+13982_{10}$
  1100 1001 0110 0001    do the ones complement operation
                   +1    add 1
  1100 1001 0110 0010    the 16-bit twos complement form of $-13982_{10}$

So, now, set up the addition:

                    11      1    11      carries
  $7767_{10}$     0001 1110 0101 0111
  $-13982_{10}$   1100 1001 0110 0010
                  1110 0111 1011 1001    sum

There is no carry into the first digit and no carry out of it, so there is no overflow. The first digit of the result is 1, indicating that the result is a negative value. To convert this binary result back to decimal form, we thus need to do the twos complement operation, and then interpret the result as a signed magnitude number:

  $1110\,0111\,1011\,1001 \rightarrow 0001\,1000\,0100\,0111_2 = 1847_{16} = 7 + 4 \times 16^1 + 8 \times 16^2 + 1 \times 16^3 = 6215_{10}$.

This is the absolute value of the negative answer, so the binary answer is equivalent to $-6215_{10}$, which is exactly the result we get if we do the decimal arithmetic with our calculator.

We won't take the space to do any more 16-bit examples here. However, you should set up an example for yourself that you know will result in overflow (say $18500_{10} + 22500_{10}$, whose true sum of 41000 is larger than the 16-bit twos complement maximum of 32767) to see that the carry in/carry out rule still holds for 16-bit twos complement arithmetic just like it did for 8-bit twos complement arithmetic.
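The steps of this example (ones complement, add 1, add the patterns, decode) can be sketched in a few lines of Python. This is an illustrative sketch only: the names `twos_complement_operation`, `encode`, `decode` and `grouped` are assumptions made here, and `encode` does no range checking.

```python
BITS = 16
MASK = (1 << BITS) - 1            # sixteen 1 digits


def twos_complement_operation(pattern):
    """Flip every digit (ones complement), add 1, and keep 16 digits."""
    return ((pattern ^ MASK) + 1) & MASK


def encode(n):
    """16-bit twos complement form of n (assumes -32768 <= n <= 32767)."""
    return n if n >= 0 else twos_complement_operation(-n)


def decode(pattern):
    """Signed value of a 16-bit twos complement pattern."""
    if pattern >> (BITS - 1):                     # first digit is 1: negative
        return -twos_complement_operation(pattern)
    return pattern


def grouped(pattern):
    text = format(pattern, f"0{BITS}b")
    return " ".join(text[i:i + 4] for i in range(0, BITS, 4))


total = (encode(7767) + encode(-13982)) & MASK    # discard any carry out
print(grouped(encode(-13982)))
print(grouped(total), decode(total))
```

Adding two positive operands whose true sum exceeds 32767, for instance the hypothetical pair 20000 and 20000, yields a pattern that decodes to a negative value, which is the overflow situation described earlier.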
Floating Point Numbers and IEEE Floating Point Formats

So far, we've covered methods which allow the representation of positive and negative whole numbers in a variety of binary formats, some more useful or convenient than others in any specific application. One could think of at least a few approaches to accommodating numbers with fractional or decimal parts, but to be useful, such representations must be able to cope with the fact that the floating point numbers commonly encountered cover a vast range of values. Indeed, the appearance of a physical measurement can change drastically by simply changing the units of measurement employed, even though the underlying numerical content is unchanged.

For instance, if you print this document as it is presently formatted, then each line of text will occupy approximately 0.353 cm of the page vertically. We could equally well express this line height as 0.00000353 km, or as 35300000 angstroms. All three numbers represent the same actual physical quantity, yet the three digits of the original form span numerical positions from 8 places to the left of the decimal point to 8 places to the right of the decimal point between the three representations. Obviously, we need a representation of floating point numbers which accommodates such scaling in an efficient manner.

A second problem that has to be taken into account (or at least recognized) is that there are decimal fractions (in fact, the vast majority of decimal fractions) that do not have binary equivalents expressible in a finite number of digits. For instance, when we try to convert $0.1_{10}$ to its binary equivalent (using the method illustrated in Example 13 earlier, with hexadecimal as an intermediate form), we get

  0.1 x 16 = 1.6 (retain the 1, move the 0.6 to the next step)
  0.6 x 16 = 9.6 (retain the 9, and move the 0.6 to the next step)
  0.6 x 16 = 9.6 (retain the 9, and move the 0.6 to the next step)
  0.6 x 16 = 9.6 (retain the 9, and move the 0.6 to the next step)
  0.6 x 16 = 9.6 (retain the 9, and move the 0.6 to the next step)

We don't really have to go much further here. You can see that no matter how many times we do the multiplication, the result will be that a 0.6 gets carried to the next line, and so this process will never end. Thus

  $0.1_{10} = 0.19999999\ldots_{16} = 0.0001\,1001\,1001\,1001\,1001\,1001\ldots_2$

with the digit 9 repeating forever in the hexadecimal version, and the string of digits 1001 repeating forever in the binary version. Thus, if we do not have an infinite amount of memory (and even with the drastic drop in computer memory prices in recent years, none of us do), we will not be able to store an exact equivalent of $0.1_{10}$ in computer memory based on the binary number system. In reality, we are forced to simply truncate these representations at some stage, with the result that the binary value actually stored will not quite be numerically equivalent to $0.1_{10}$.
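The multiply-and-retain procedure shown above can be reproduced exactly with rational arithmetic. Below is a minimal Python sketch using the standard `fractions` module; the function name `fraction_digits` is an illustrative choice, not something taken from the notes.

```python
from fractions import Fraction


def fraction_digits(value, base=16, count=8):
    """First `count` digits after the radix point of 0 <= value < 1 in `base`."""
    x = Fraction(value)           # exact: Fraction("0.1") is precisely 1/10
    digits = []
    for _ in range(count):
        x *= base
        digit = int(x)            # retain the whole-number part as the next digit
        digits.append(digit)
        x -= digit                # move the fractional part to the next step
    return digits


print(fraction_digits("0.1", base=16, count=6))
print(fraction_digits("0.1", base=2, count=12))

# The double precision value Python stores for 0.1 is close to, but not
# exactly, one tenth.
print(Fraction(0.1) == Fraction(1, 10))
```

The decimal value is passed as a string so that the starting point is exactly one tenth rather than the already-truncated binary value.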
In this course, we present two ways of representing floating point values: (i) the IEEE floating point standards, which deal well with the scaling issue raised earlier but give only limited flexibility in handling the precision issue (though quite adequate for most applications), and (ii) Binary Coded Decimal (BCD) formats, which allow exact representation of any floating point decimal number, but with limited capability in coping with numbers of vastly different magnitude. In this section, we present the IEEE floating point representation briefly, and in the next section, some details of BCD representations.

The IEEE representations all begin with converting the magnitude of the decimal floating point number to a scientific notation-like binary form. To write a decimal number in scientific notation, we rewrite the original number as the product of a number greater than or equal to 1, but less than 10, and a power of 10. Thus,

  $586.73 = 5.8673 \times 10^{2}$    (= significand x power of 10)

and

  $0.0003124 = 3.124 \times 10^{-4}$.    (= significand x power of 10)

The significand always has one significant digit to the left of the decimal point. If the power of 10 is positive, the original value is larger than the significand by a factor of 10 raised to that power. In effect, a positive exponent on the 10 indicates how many places right the decimal point in the significand must be moved to obtain the original number. A negative power of 10 indicates how many places left the decimal point in the significand must be moved to obtain the original number.

The same sort of notation can be used for floating point binary numbers, except now, instead of powers of 10 indicating where the radix point should be located, it is powers of 2. To illustrate, consider the number conversions already done in Example 13 earlier. There we found that

  $85081.171875_{10} = 1\,0100\,1100\,0101\,1001.0010\,11_2$

So, just as we can write

  $85081.171875_{10} = 8.5081171875 \times 10^{4}$,

so, also, we can write

  $1\,0100\,1100\,0101\,1001.0010\,11_2 = 1.0100\,1100\,0101\,1001\,0010\,11_2 \times 2^{16}$

since, to get from the significand on the right-hand side to the binary number on the left-hand side, we have to shift the radix point sixteen positions rightwards. Of course, in other instances where converting the significand to the original binary number requires that the radix point be moved leftwards, the exponent in the power of 2 would be a negative number.

The so-called IEEE-754 floating point standard defines several representations, of which the two most common are the single precision format and the double precision format. The single precision format occupies 32 bits (4 bytes) of memory, used as follows (starting on the left, or with the so-called most significant bit):
- 1 bit to represent the sign of the number (0 for positive, 1 for negative)
- 8 bits to code the exponent of the 2, using an excess-127 representation
- 23 bits to code the significand (omitting the first digit; see below)

The double precision format occupies 64 bits (8 bytes) of memory, used as follows:
- 1 bit to represent the sign of the number (0 for positive, 1 for negative)
- 11 bits to code the exponent of the 2, using an excess-1023 representation
- 52 bits to code the significand (omitting the first digit; see below)

To see how this works in detail, consider the decimal number $85081.171875_{10}$ again.
We've already seen above that

  $85081.171875_{10} = 1.0100\,1100\,0101\,1001\,0010\,11_2 \times 2^{16}$

This number is positive (so the initial sign bit will be 0), and the power of 2 in the binary scientific form is 16, which when coded in excess-127 form gives

  $16 \rightarrow 16 + 127 = 143 \rightarrow 1000\,1111_2$

Finally, the binary digits in the significand are shown above. However, in one final twist of efficiency, note that the first digit of the significand will always be a 1. Recognizing this, that first digit is never stored in the floating point representation. By omitting this so-called hidden bit from the actual binary representation of the floating point number, we are actually able to store the equivalent of 24 bits of information in 23 physical bits (in the single precision format, for instance). Thus, the IEEE-754 Single Precision representation of $85081.171875_{10}$ would be:

  01000111 10100110 00101100 10010110

Reading from the left, the first bit (0) is the sign, the next 8 bits (1000 1111) are the exponent, and the remaining 23 bits (0100 1100 0101 1001 0010 110) are the significand.

For $-85081.171875_{10}$, everything would stay the same except for the first bit on the left, which would change to a 1 from the 0. Notice that the significand of $85081.171875_{10}$ required just 22 bits after the hidden bit, and so the rightmost 23-bit section of the representation was padded out on the right with a single zero. Had the significand required more than 23 bits, those beyond the 23rd would have simply been dropped. This example is one of those rare cases where the IEEE-754 Single Precision representation is an exact representation.

Some additional parts or properties of the IEEE-754 standard that you should be aware of are:
- By definition, the number zero is coded with all bits equal to zero. (The standard also admits a negative zero, in which only the sign bit is 1; the two compare as equal.)
- The effectively 24 binary digits of the significand in the single precision standard correspond to approximately 7 decimal digits, and the effectively 53 binary digits of the double precision standard (52 stored bits plus the hidden bit) correspond to approximately 16 decimal digits. Thus, the single and double precision representations amount to rounding all floating point operations off to roughly 7 and 16, respectively, significant decimal digits. Most applications that make use of floating point calculations use the double precision standard, but the effects of this rounding are readily observed. They show up in cells in MS Excel containing values like 1.63E-17, for example, where you may have expected a zero to show up. Generally, this degree of imprecision is not a problem, but there are applications and situations in which this floating point round-off error can be of very serious concern.
- The minimum values allowed for the binary exponent are -126 and -1022, respectively, for the single and double precision representations. The maximum values allowed for these exponents are, respectively, 127 and 1023.
This means that the range of decimal magnitudes that can be accommodated by the single precision representation is about $1.2 \times 10^{-38}$ to $3.4 \times 10^{+38}$ (and $2.2 \times 10^{-308}$ to $1.8 \times 10^{+308}$ for the double precision representation); the upper limits are almost twice $2^{127}$ and $2^{1023}$ because the significand can be nearly 2. When computations result in values smaller than these minimums, the result loses precision and, if small enough, is replaced by an exact zero. When computations result in values larger than these maximums, a floating point overflow exception is raised.
- The use of the excess-127 representation to code the exponent in 8 bits with the stated limits does not make use of the most extreme values that can be coded in this way (the all-zeros and all-ones exponent codes). These are available for coding special situations (NaNs, for example).

Arithmetic with floating point numbers coded in the IEEE-754 forms is very complicated.
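The single precision encoding worked out by hand above can be cross-checked with Python's standard `struct` module, which packs a value into the 4-byte IEEE-754 single precision layout. This is a sketch for illustration; the helper names `float32_fields` and `float32_value` are hypothetical, and `float32_value` handles only ordinary (normal) numbers, not zero, NaNs or the other special codes.

```python
import struct


def float32_fields(x):
    """Return (raw, sign, exponent, fraction) for the single precision form of x."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))   # the same 32 bits as an integer
    sign = raw >> 31                    # leftmost bit
    exponent = (raw >> 23) & 0xFF       # next 8 bits, stored in excess-127 form
    fraction = raw & 0x7FFFFF           # last 23 bits; the hidden leading 1 is absent
    return raw, sign, exponent, fraction


def float32_value(sign, exponent, fraction):
    """Rebuild a normal number from its fields, restoring the hidden bit."""
    significand = 1 + fraction / 2**23
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


raw, sign, exponent, fraction = float32_fields(85081.171875)
print(format(raw, "032b"))
print(sign, exponent - 127, format(fraction, "023b"))
print(float32_value(sign, exponent, fraction))
```

Because 85081.171875 needs fewer than 24 significant binary digits, rebuilding the value from the three fields returns it exactly; for a value such as 0.1 the rebuilt number would differ slightly from the decimal original.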

The following material on the Tens Complement and BCD codes is currently under construction and quite incomplete. I'm putting this up to be whatever value it can be at this time in the course. Check back every day or two for more material. When I'm done with the topic, I'll remove this textbox.

The Tens Complement

Just before describing BCD codes below, it is useful to introduce the notion of the tens complement of a decimal number. You will recall that when we discussed the twos complement earlier, it was necessary to distinguish between the twos complement form (which was a recipe for representing a decimal value in binary form) and the twos complement operation (which was an operation we could do on a binary number to get another binary number). In the case of the tens complement, all numbers involved are essentially decimal numbers (base 10), and so it isn't necessary to distinguish between the operation and the form when thinking about a tens complement.

There are two ways to get the tens complement of a decimal number, x:
- If x has k digits to the left of the decimal point, then the tens complement of x is obtained by subtracting x from $10^r$ for an exponent r which is at least equal to k.
- Subtract every digit of x from 9, and then add 1 to the rightmost digit of the result. You may pad this result by 9's on the left as well.

Example 31: Write down the tens complement of x = 586.932.

Using the first method described above, we note first that x has three digits to the left of the decimal point. Thus, an acceptable tens complement of x can be obtained by

  $10^3 - 586.932 = 1000 - 586.932 = 413.068$

The smallest power of 10 that could be used here had to be 3 because 586.932 has three digits to the left of the decimal point. We could have used a higher power of 10, which would have resulted in additional digits of 9 on the left of the result.
Using the second method, we do the following:

    999.999
  - 586.932
    413.067
  +       1    (added in the rightmost place)
    413.068

giving the same answer as with the first method. Again, we can append digits of 9 on the left if the application requires it.

Like the twos complement operation discussed earlier in this document, if you repeat the tens complement operation on a decimal value twice, you get back to the original number. Thus, the tens complement of 586.932 was found to be 413.068. If you now compute the tens complement of 413.068 you get 586.932. So, as with the twos complement, undoing the tens complement just means repeating the operation again.

The tens complement form will be applied shortly in the so-called BCD coding of negative values. However, as a quick indication of the sort of use to which we can put the tens complement, consider the following example. Suppose we wish to subtract 387 from 612. This is quite a difficult simple arithmetic problem because, although the result will be a positive number, all digits except the leftmost digit of the number being subtracted are larger than the corresponding digit of the number being subtracted from, and so a procedure of borrowing must be implemented. (We could make this simple problem even trickier by subtracting 387 from 602; try this second one by hand to see why.)

The way you've been taught to do 612 - 387 way back in elementary school days is as follows. Start from the right. Since 7 is bigger than 2, we cannot subtract 7 from 2, and so we must borrow 10 from the next digit left, giving something like:

    6   0  12
  - 3   8   7
            5

for the first digit of the difference. Now we move to the next digit left. Since 8 is greater than 0, this subtraction requires a borrowing of a relative 10 from the next digit left, sort of like

    5  10  12
  - 3   8   7
        2   5

In this work so far, we've shown some single decimal digits as a double symbol because of the restrictions of text display in a document like this. Of course, the 10 and 12 doublets in the top line just above are not really valid decimal digits. Finally, we move one more digit left, and now, since the 3 is less than the 5, the subtraction can be done, giving the final result:

    5  10  12
  - 3   8   7
    2   2   5

That is, 612 - 387 = 225.

Now, consider the following approach. The tens complement form of 387 is 613 (since when we subtract each digit of 387 from 9, we get 612, and adding 1 on the right then gives 613).
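Both recipes for the tens complement described in this section can be written down in a few lines. The following Python sketch uses the standard `decimal` module so that the decimal digits are handled exactly; the function names are illustrative assumptions, and the second function takes its argument as a digit string such as "586.932".

```python
from decimal import Decimal


def tens_complement_by_power(x, r):
    """First method: subtract x from 10**r (r at least the number of whole digits)."""
    return Decimal(10) ** r - Decimal(x)


def tens_complement_by_digits(x):
    """Second method: subtract every digit from 9, then add 1 in the rightmost place."""
    nines = "".join("." if ch == "." else str(9 - int(ch)) for ch in x)
    places = len(x.split(".")[1]) if "." in x else 0      # digits after the point
    return Decimal(nines) + Decimal(10) ** -places


print(tens_complement_by_power("586.932", 3))
print(tens_complement_by_digits("586.932"))
print(tens_complement_by_digits("387"))
```

Applying either function to its own result (with the same r) returns the starting value, which is the "repeat the operation to undo it" property noted above.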
Now look what happens when we add 613 to 612:

     612
   + 613    (the tens complement of 387)
    1225

Ignoring the 1 digit on the extreme left (which we can consider as a carry-out from adding the leftmost digits of the two numbers), what we have is 225, previously found to be the result of subtracting 387 from 612. In other words, in decimal arithmetic, we can implement subtraction as adding the tens complement of a number. Subtracting 387 from 612 amounts to adding the tens complement of 387 to 612. This approach eliminates the need to do any of the complicated borrowing so often necessary in subtraction. There are some details that must be sorted out before this tens complement approach to decimal subtraction can be used routinely. We'll do that below when we explain how to do decimal arithmetic using BCD codes.

BCD Codes

We look briefly at one other approach to coding decimal values in a binary form, sometimes called simply decimal codes, or BCD codes (for "binary coded decimal"). Quite a variety of BCD codes have been proposed and various types are useful in various sorts of applications. The main motivation behind these forms is to provide a way of handling decimal values with fractional parts with no rounding error in the conversion from decimal to binary.

We need ten distinct bit patterns to be able to code strings of decimal digits in a binary form. Using three bits at a time won't work, since that gives us just 8 distinct bit patterns. However, with four bits, we have sixteen distinct bit patterns, which is more than enough to do the job. Thus, the various BCD codes consist of ways of assigning four-bit patterns to the ten decimal digits. (There are nearly 30 billion ways in which this can be done; fortunately only a half-dozen or so of these possibilities have been implemented in any serious way.)

The original BCD code simply used the unsigned binary representations of the ten decimal digits, padded with four zero bits on the left to give an 8-bit representation of each digit in the decimal number.
The padding to 8 bits meant that each digit occupied an entire byte, which facilitated accessing the numbers one digit at a time, though being rather inefficient in memory usage. Thus, for instance, in the early days of computers, the BCD representation of the decimal number $36.7_{10}$ would have occupied (at least) three bytes of memory with

  00000011 00000110 . 00000111

(I've cheated a bit here by writing in the decimal point. You will be able to see how BCD coding works without dealing with the issue of how to represent the decimal point.)

As processors became more adept at manipulating parts of bytes, the four-bit padding on the left of each digit was dropped, so that each digit now occupied just four bits or a half-byte of memory. Thus

  $36.7_{10} \rightarrow$ 0000 0011 0110 . 0111

The padding by four zeros on the left here to form a two-byte string is only partly for that purpose. In order to formulate rules for addition and subtraction using BCD forms, it is generally necessary to carry at least one extra digit on the left. Initially, this more efficient method was called packed BCD, to distinguish it from the older, more expansive form, but now the term BCD is often considered to mean packed BCD.

Note that every decimal digit is represented exactly by a four-bit string in the BCD form. There is no fractional conversion that could result in inexact arithmetic, and hence, in round-off error. The price to be paid for this feature is that the amount of memory used by a BCD form of a decimal number depends on the number of decimal digits in that number, and the memory that is used for the digits is not used very efficiently.
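The digit-by-digit nature of BCD coding makes it simple to sketch in code. The Python below is an illustration only, with hypothetical function names `to_bcd` and `from_bcd`; like the text, it simply writes the decimal point in as a marker rather than addressing how the point would really be stored.

```python
def to_bcd(decimal_text, bits_per_digit=4):
    """Code each decimal digit separately; 4 bits gives packed BCD, 8 the older form."""
    groups = []
    for ch in decimal_text:
        if ch == ".":
            groups.append(".")                  # point kept only as a visual marker
        else:
            groups.append(format(int(ch), f"0{bits_per_digit}b"))
    return " ".join(groups)


def from_bcd(bcd_text):
    """Decode space-separated bit groups back into a decimal digit string."""
    digits = []
    for group in bcd_text.split():
        if group == ".":
            digits.append(".")
            continue
        value = int(group, 2)
        if value > 9:                           # patterns 1010 to 1111 are unused
            raise ValueError(f"{group} is not a valid BCD digit")
        digits.append(str(value))
    return "".join(digits)


print(to_bcd("36.7", bits_per_digit=8))
print(to_bcd("036.7"))
print(from_bcd("0000 0011 0110 . 0111"))
```

Six of the sixteen four-bit patterns never occur, which is one way to see the memory inefficiency mentioned above.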

This ordinary BCD coding is an example of a BCD code in which the four binary digits used to code a decimal digit actually represent a positional value. In fact, within each group of four binary digits, the digits have the usual binary number system position values: counting from the right, the first digit is ones, the second digit twos, the third digit fours, and the fourth digit eights. A BCD code based on such a position-value system of digits is called a weighted code, and of all possible BCD code systems, the most useful ones tend to be such weighted codes. So, finally, to distinguish this traditional BCD code from all other possible weighted BCD codes, people often call the form illustrated above the 8421-BCD code.

While we are ignoring the issue of how the position of the decimal point is specified, we cannot dodge the issue of the sign of the number if we wish to use BCD to do actual arithmetic. Most commonly, the rule for using the 8421-BCD code to handle both positive and negative numbers is the following:
- For positive numbers, just write out the 8421-BCD code, using four binary digits to represent each decimal digit. Pad the final result with a zero decimal digit on the left if necessary, so that the leftmost decimal digit is less than 5.
- For a negative number k, the 8421-BCD code is the tens complement of the absolute value of k, written in 8421-BCD code. When the tens complement of the absolute value of k is formed, it must be padded with the digit 9 on the left if necessary to ensure that the leftmost digit is bigger than 4.

With this recipe, it will turn out that any 8421-BCD coded value that has a leftmost digit smaller than 5 will represent a positive value. If the leftmost digit is 5 or greater, the 8421-BCD number represents a negative number in tens complement form.

Example 32: Represent (a) 286.543, (b) 58352.89, (c) -354.6, and (d) -8750 in 8421-BCD form.

(a) 286.543
This number is positive, so the first rule above applies.
Each decimal digit is represented by a 4-bit string:

  0010 1000 0110 . 0101 0100 0011
  (  2    8    6 .    5    4    3 )

Since the leftmost digit is less than 5, we do not need to pad on the left with zero decimal digits, though we could add zeros to the left without changing this value.

(b) 58352.89
This number is positive, so again the first rule above applies. Initially we get

  0101 1000 0011 0101 0010 . 1000 1001
  (  5    8    3    5    2 .    8    9 )

However, the BCD number we end up with has a leftmost digit greater than 4, but positive numbers cannot be represented by a BCD code with leftmost digit greater than 4. So, we need to append at least one zero digit on the left to get at least

  0000 0101 1000 0011 0101 0010 . 1000 1001
  (  0    5    8    3    5    2 .    8    9 )

as the 8421-BCD form for 58352.89.

(c) -354.6
This is a negative value, and so the second rule above applies. We start by computing the tens complement of the absolute value of -354.6, which is 354.6. You can use either of the two methods described in the previous section of this document to get 645.4. This value is now coded as an 8421-BCD value:

  0110 0100 0101 . 0100
  (  6    4    5 .    4 )

Since the leftmost digit is greater than 4, this is an acceptable BCD representation of the negative number -354.6. You can pad with 9's (or 1001's in binary) on the left if that would be useful.

(d) -8750
This is also a negative value, and so the same sort of procedure as illustrated in part (c) just above must be used. The tens complement of 8750 is 1250 (make sure you can get this result using both methods for finding the tens complement of a decimal value). But if we write out just these four digits in 8421-BCD form, the leftmost digit will be less than 5. According to the rules given above, this means we must pad with at least one 9 digit on the left, so that the leftmost digit is greater than 4. This gives us

  1001 0001 0010 0101 0000
  (  9    1    2    5    0 )

as the minimal necessary representation of $-8750_{10}$ in 8421-BCD form. Again, additional 9's could be appended on the left if that turned out to be useful or necessary.

Example 33: Each of the following is an 8421-BCD tens complement form of a signed number. Determine what ordinary decimal value each represents.

(a)

Addition and Subtraction with BCD-Coded Binary Numbers

In this course, we consider only addition and subtraction of BCD coded numbers. Then

Rule 1 is this: There is no subtraction. Subtraction is always handled as the addition of a negative number.
Rule 2 is this: Signed decimal values are represented in the tens complement form described and illustrated in the previous section of this document.

Rule 3 is this: The result of the addition in binary is the tens complement form of the answer to the problem.

So, basically everything is addition. There is just one more detail to take care of, and that is the matter of handling carries. (We will assume for the moment that overflow can never occur, because we aren't limiting our numbers to a fixed number of digits as we did when considering twos complement binary arithmetic before.) There are apparently three techniques used to facilitate the add-with-carries problem in actual implementations of BCD arithmetic.
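As a rough illustration of Rules 1 to 3, the Python sketch below works on whole numbers at the level of decimal digits: it forms the signed tens complement codes, adds them, discards the carry out of the leftmost digit, and only then writes the digits in 8421-BCD. It does not implement any of the binary carry-handling techniques mentioned above, which the text does not describe, and it does not check the sum for overflow. All function names are hypothetical.

```python
def tens_complement_form(n, width):
    """Whole number n as `width` decimal digits: leftmost digit 0-4 means positive,
    5-9 means negative (tens complement)."""
    limit = 10**width // 2
    if not -limit <= n < limit:
        raise ValueError(f"{n} needs more than {width} digits")
    return str(n % 10**width).zfill(width)   # negative n wraps to 10**width - |n|


def decode_form(text):
    """Signed value represented by a tens complement digit string."""
    value = int(text)
    return value - 10**len(text) if text[0] >= "5" else value


def add_forms(a_text, b_text):
    """Rules 1 and 3: add the two codes and drop the carry out of the leftmost digit."""
    if len(a_text) != len(b_text):
        raise ValueError("operands must have the same number of digits")
    width = len(a_text)
    return str((int(a_text) + int(b_text)) % 10**width).zfill(width)


def to_8421(text):
    """Write each decimal digit of the code as its four-bit 8421 pattern."""
    return " ".join(format(int(d), "04b") for d in text)


# 612 - 387 handled as 612 + (-387), using four decimal digits.
result = add_forms(tens_complement_form(612, 4), tens_complement_form(-387, 4))
print(result, decode_form(result), to_8421(result))
print(to_8421(tens_complement_form(-8750, 5)))
```

Asking for too few digits, for example coding -8750 in four digits, raises an error here, which mirrors the need to pad with a 9 on the left in part (d) of Example 32.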