Computer Architecture Basics




Computer Architecture Basics
CIS 450 Computer Organization and Architecture
Copyright (c) 2002 Tim Bower

The interface between a computer's hardware and its software is its architecture. The architecture is described by what the computer's instructions do and how they are specified. Understanding how it all works requires knowledge of the structure of a computer and its assembly language.

What is a computer?

There are lots of machines in our world, but only some of those machines qualify as being a computer. What features make a machine a computer?

The very first machines which bore the label of a computer were designed using electro-mechanical switches. These switches were large. The computers designed from them were more like automated adding machines than today's computers. A program written for these early machines was entered into the computer by setting an array of relays to be either an electrical short or an open circuit. This was often accomplished with the aid of a panel of plug-in contact points and cables. After setting the relays, the program could be executed. To execute a new program, the cables needed to be moved to form a new network of relays.

With the invention of the vacuum tube in the 1940s, faster computers could be designed which could also run more complicated programs. The real genesis of modern computers, however, came with the practice of storing a program in memory. The possibility of storing much larger programs in memory became reality with the invention of ferrite core memory in the 1950s.

According to mathematician John von Neumann, for a machine to be a computer it must have the following:

1. Addressable memory that holds both instructions and data
2. An arithmetic logic unit
3. A program counter

Put another way, it must be programmable. A computer executes the following simple loop for each program:

    pc = 0;
    do {
        instruction = memory[pc++];
        decode( instruction );
        fetch( operands );
        execute;
        store( results );
    } while( instruction != halt );

Note:
Instructions are the verbs and operands are the objects of this process. In some architectures, such as the SPARC, the program counter is advanced by a set amount after each instruction is read. In the Intel x86, however, the size of the instruction varies, so as the instruction is read and decoded, the amount by which the program counter should be advanced is also determined.

The important computer architecture components from von Neumann's stored program control computer are:

CPU - Central processing unit. The engine of the computer that executes programs.
ALU - Arithmetic logic unit. This is the part of the CPU that executes individual instructions involving data (operands).

[Figure: the computer architecture proposed by John von Neumann: a CPU containing the ALU, PC, and IR, connected to a memory holding both data and instructions, with registers inside the CPU.]

Register - A memory location in the CPU which holds a fixed amount of data. Registers of most current systems hold 32 bits (4 bytes) of data.
PC - Program counter, also called the instruction pointer. A register which holds the memory address of the next instruction to be executed.
IR - Instruction register. A register which holds the current instruction being executed.
Acc - Accumulator. A register designated to hold the result of an operation performed by the ALU.
Register File - A collection of several registers.

Fundamental Computer Architectures

Here we describe the most common computer architectures, all of which use stored program control.

The Stack Machine

A stack machine implements a stack with registers. The operands of the ALU are always the top two registers of the stack, and the result from the ALU is stored in the top register of the stack. Examples of the stack machine include Hewlett Packard RPN calculators and the Java Virtual Machine (JVM). The advantage of a stack machine is that it can shorten the length of instructions, since operands are implicit. This was important when memory was expensive (20-30 years ago). Now, in Java, it is important since we want to ship executables (class files) over the network.

The Accumulator Machine

An accumulator machine has a special register, called an accumulator, whose contents are combined with another operand as input to the ALU, with the result of the operation replacing the contents of the accumulator.

Who is John von Neumann?

John Louis von Neumann was born 28 December 1903 in Budapest, Hungary, and died 8 February 1957 in Washington, DC. He was a brilliant mathematician, synthesizer, and promoter of the stored program concept, whose logical design of the Institute for Advanced Studies (IAS) computer became the prototype of most of its successors - the von Neumann architecture.

Von Neumann was a child prodigy, born into a banking family in Budapest, Hungary. When only six years old he could divide eight-digit numbers in his head. At a time of political unrest in central Europe, he was invited to visit Princeton University in 1930, and when the Institute for Advanced Studies was founded there in 1933, he was appointed to be one of the original six Professors of Mathematics, a position which he retained for the remainder of his life.

By the latter years of World War II von Neumann was playing the part of an executive management consultant, serving on several national committees, applying his amazing ability to rapidly see through problems to their solutions. Through this means he was also a conduit between groups of scientists who were otherwise shielded from each other by the requirements of secrecy. He brought together the needs of the Los Alamos National Laboratory (and the Manhattan Project) with the capabilities of the engineers at the Moore School of Electrical Engineering who were building the ENIAC, and later built his own computer called the IAS machine. Several supercomputers were built by National Laboratories as copies of his machine.

Following the war, von Neumann concentrated on the development of the IAS computer and its copies around the world. His work with the Los Alamos group continued, and he continued to develop the synergism between computer capabilities and the need for computational solutions to nuclear problems related to the hydrogen bomb. His insights into the organization of machines led to the infrastructure which is now known as the von Neumann architecture. However, von Neumann's ideas were not along those lines originally; he recognized the need for parallelism in computers but equally well recognized the problems of construction, and hence settled for a sequential system of implementation. Through the report entitled First Draft of a Report on the EDVAC [1945], authored solely by von Neumann, the basic elements of the stored program concept were introduced to the industry.

In the 1950s von Neumann was employed as a consultant to IBM to review proposed and ongoing advanced technology projects. One day a week, von Neumann held court with IBM. On one of these occasions in 1954 he was confronted with the FORTRAN concept; John Backus remembered von Neumann being unimpressed with the concept of high level languages and compilers. Donald Gillies, one of von Neumann's students at Princeton, and later a faculty member at the University of Illinois, recalled in the mid-1970s that the graduate students were being used to hand assemble programs into binary for their early machine (probably the IAS machine). He took time out to build an assembler, but when von Neumann found out about it he was very angry, saying (paraphrased), "It is a waste of a valuable scientific computing instrument to use it to do clerical work."

Source: http://ei.cs.vt.edu/~history/VonNeumann.html

[Figure: block diagrams of the Stack Machine, Accumulator Machine, and Load/Store Machine architectures. Each shows a CPU with PC, IR, and ALU connected to a memory holding data and instructions; the CPU additionally contains a stack, an accumulator (ACC), or a register file, respectively.]

Example Machine Instructions

Consider the C statement y = y + 10. Notation: for a variable y, &y is its address, and [y] denotes the contents of memory at that address ([y] = *&y = y).

Stack Machine      Accumulator Machine    Load/Store Machine
push [y]           load [y]               load r0, [y]
push 10            add 10                 load r1, 10
add                store y                add r0, r1, r2
pop y                                     store r2, y

The accumulator machine computes

    accumulator = accumulator [op] operand;

In fact, many machines have more than one accumulator. The Pentium has 1, 2, 4, or 6 (depending on how you count); the MC68000 has 16. In order to add two numbers in memory:

1. place one of the numbers into the accumulator (load operand)
2. execute the add instruction
3. store the contents of the accumulator back into memory (store operand)

The Load/Store Machine

Registers provide faster access but are expensive; memory provides slower access but is less expensive. A small amount of high speed (expensive) memory, called a register file, is provided for frequently accessed variables, and a much larger, slower (less expensive) memory is provided for the rest of the program and data (SPARC: 32 registers at any one time). This is based on the principle of locality: at a given time, a program typically accesses a small number of variables much more frequently than others.

The machine loads and stores the registers from memory. The arithmetic and logic instructions use registers, not main memory, for the location of operands. Since the machine addresses only a small number of registers, the instruction field to refer to a register (operand) is short; therefore, these machines frequently have instructions with three operands: add src1, src2, dest.

Machine Instructions

Machine instructions are classified into the following three categories:

1. data transfer operations (memory-register, register-register)
2. arithmetic logic operations (add, sub, and, or, xor, shift, etc.)
3. program control operations (branch, call, interrupt)

How the operands are specified is called the addressing mode. We will discuss addressing modes more later.

The Computer's Software

The program instructions are stored in memory in machine code, or machine language, format. An assembler is the program used to translate symbolic programs (assembly language) into machine language programs.

machine language - Low level computer instructions that are encoded into binary words.
assembly language - The lowest level human readable programming language. All of the detailed instructions for the computer are listed. Assembly programs are directly encoded into machine code. Assembly code can be written by humans, but is more typically produced by a compiler.
high level language - Humans typically write programs in a language which allows program logic to be expressed at a conceptual level, ignoring the implementation details which are required of assembly language programs.

Years ago, hardware efficiency was extracted at the expense of the programmer's time. If a fast program was needed, then it was written in assembly language. Compilers were capable of translating programs from high level languages, but they generated assembly language programs that were relatively inefficient compared with the same programs written by a programmer in assembly language. Programmers often found it necessary to optimize the assembly language code created by a compiler to improve the performance and reduce the memory requirements of the program.

This is no longer the case. Compilers have improved to the point that they can generate code comparable to, or better than, the code most programmers can generate. Even if hand crafted optimizations could improve the performance, there is little benefit derived from such a laborious activity. Many computers today execute so fast and have enough memory that it is not necessary to optimize code at the assembly language level.

So, since it is increasingly rare for programmers to work at the assembly language level, why is it necessary to learn assembly language?
There are actually several reasons to study assembly language:

1. To understand or work on an operating system. Operating systems need to execute instructions which cannot be expressed in a high level language, so it is necessary that a portion of an operating system be written in assembly language. Some instances when an operating system needs assembly language include: initializing the hardware and data in the CPU at boot time, handling interrupts, low level interfaces with hardware peripherals, and cases when a compiler's protection features interfere with the needed operations.
2. To understand or work on a compiler.
3. Real time or embedded systems programming, where there may be critical constraints for a program related either to performance or available memory. In some cases with embedded systems, a compiler may not be available.
4. To understand the internal working of a computer. Computer architecture can best be understood when assembly language is used to supplement its study. Assembly language code does not hide details about what the computer is doing.

Complex Instruction Sets and Reduced Instruction Sets

Another important classification of computer architectures relates to the available set of instructions for the processor. Here we discuss the historical background and technical differences between two types of processors.

If memory is an expensive and limited resource, there is a large benefit in reducing the size of a program. During the 1960s and 1970s, memory was at a premium. Therefore, much effort was expended on minimizing the size of individual instructions and minimizing the number of instructions necessary to implement a program. During this time period, almost all computer designers believed that rich instruction sets would simplify compiler design and improve the quality of computer architecture. New instructions were developed to replace frequently used sequences of instructions. For example, a loop variable is often decremented, followed by a branch operation if the result is positive; new architectures therefore introduced a single instruction to decrement a variable and branch conditionally based on the result. Some instructions came to be more like a procedure than a simple operation. Some of these powerful single instructions

required four or more parameters. As an example, the IBM System/370 has a single instruction that copies a character string of arbitrary length from any location in memory to any other location in memory, while translating characters according to a table stored in memory.

Computers which feature a large number of complex instructions are classified as complex instruction set computers (CISC). Other examples of CISC computers include the Digital Equipment VAX and the Intel x86 line of processors. The DEC VAX has more than 200 instructions, dozens of distinct addressing modes, and instructions with as many as six operands.

The complexity of CISC was accommodated by the introduction of microprogramming, or microcode. Microcode is composed of low-level hardware instructions that implement the high-level instructions required by an architecture. Microcode was placed in ROM or control-store RAM (which is more expensive, but faster, than the ferrite-core memory used in many computers).

However, not all computer designers fell in line with the CISC philosophy. Seymour Cray, for one, believed that complexity was bad, and continued to build the fastest computers in the world by using simple, register-oriented instruction sets. Cray was a proponent of the Reduced Instruction Set Computer (RISC), which is the antidote to CISC. The CDC 6600 and the Cray-1 supercomputer were the precursors of modern RISC architectures. In 1975, Cray made the following remarks about his computer design:

    [Registers] made the instructions very simple. That is somewhat unique. Most machines have rather elaborate instruction sets involving many more memory references in the instructions than the machines I have designed. Simplicity, I guess, is a way of saying it. I am all for simplicity. If it's very complicated, I cannot understand it.

Various technological changes in the 1980s made the architectural assumptions of the 1970s no longer valid. Faster (10 times or more) and cheaper semiconductor memory and integrated circuits began to replace ferrite-core and transistor based discrete circuits. The invention of cache memories substantially improved the speed of non-microcoded programs. Compiler technology had progressed rapidly; optimizing compilers generated code that used only a small subset of most instruction sets.

A new set of simplified design criteria emerged:

1. Instructions should be simple unless there is a good reason for complexity. To be worthwhile, a new instruction that increases cycle time by 10% must reduce the total number of cycles executed by at least 10%.
2. Microcode is generally no faster than sequences of hardwired instructions. Moving software into microcode does not make it better; it just makes it harder to modify.
3. Fixed format instructions and pipelined [1] execution are more important than program size. As memory becomes cheaper and faster, the space/time tradeoff is resolved in favor of time: reducing space no longer decreases time.
4. Compiler technology should simplify instructions, rather than generate more complex instructions. Instead of adding a complicated microcoded instruction, optimizing compilers can generate sequences of simple, fast instructions to do the job. Operands can be kept in registers to increase speed even further.

What is RISC?
Assembly language programs occasionally use large sets of machine instructions, whereas high level language compilers generally do not. For example, Sun's C compiler uses only about 30% of the available Motorola 68020 instructions. Studies show that approximately 80% of the computation for a typical program requires only 20% of a processor's instruction set.

The designers of RISC machines strive for hardware simplicity, with close cooperation between machine architecture and compiler design. In order to add a new instruction, computer architects must ask:

[1] Pipelining relates to parallelizing the steps in the instruction execution loop: the next instruction is fetched and decoded while the current instruction is executing. We will discuss pipelining more when we study the Sun SPARC architecture.

1. To what extent would the added instruction improve performance, and is it worth the cost of implementation?
2. No matter how useful it is in an isolated instance, would it make all other instructions perform more slowly by its mere presence?

The goal of RISC architecture is to maximize the effective speed of a design by performing infrequent functions in software and by including in hardware only features that yield a net performance gain. Performance gains are measured by conducting detailed studies of large high level language programs. RISC architectures eliminate complicated instructions that require microcode support.

RISC Architecture

The following characteristics are typical of RISC architectures. Although none of these is required for an architecture to be called RISC, this list does describe most current RISC architectures, including the SPARC design.

1. Single cycle execution: most instructions are executed in a single machine cycle.
2. Hardwired control with little or no microcode: microcode adds a level of complexity and raises the number of cycles per instruction.
3. Load/store, register-to-register design: all computational instructions involve registers. Memory accesses are made with only load and store instructions.
4. Simple fixed-format instructions with few addressing modes: all instructions are the same length (typically 32 bits) and have just a few ways to address memory.
5. Pipelining: the instruction set design allows for the processing of several instructions at the same time.
6. High performance memory: RISC machines have at least 32 general purpose registers and large cache memories.
7. Migration of functions to software: only those features that measurably improve performance are implemented in hardware. Software contains sequences of simple instructions for executing complex functions rather than complex instructions themselves, which improves system efficiency.
8. More concurrency is visible to software: for example, branches take effect after execution of the following instruction, permitting a fetch of the next instruction during execution of the current instruction.

The real keys to enhanced performance are single-cycle execution and keeping the cycle time as short as possible. Many characteristics of RISC architectures, such as the load/store, register-to-register design, facilitate single-cycle execution. Simple fixed-format instructions, on the other hand, permit shorter cycles by reducing decoding time.

Early RISC Machines

In the mid 1970s, some computer architects observed that even complex computers execute mostly simple instructions. This observation led to work on the IBM 801, the first intentional RISC machine (even though the term RISC had yet to be coined). The term RISC was coined as part of David Patterson's 1980 course in microprocessor design at the University of California at Berkeley. The RISC-I chip design was completed in 1982, and the RISC-II chip design was completed in 1984. The RISC-II was a 32-bit microprocessor with 138 registers and a 330 ns (3 MHz) cycle time. Without the aid of elaborate compiler technology, the RISC-II outperformed the VAX 11/780 at integer arithmetic.

[Figure: the memory hierarchy: the CPU (containing the registers, register file, and L1 cache) connects over the backside bus to the L2 cache, and over the memory bus to main memory and I/O devices such as the disk.]

Typical sizes and access times at each level:

Level         Size      Speed
Registers     200 B     5 ns
L1 Cache      128 KB    6 ns
L2 Cache      256 KB    10 ns
Main Memory   128 MB    100 ns
Disk          30 GB     5 ms

Memory Hierarchy Design

Memory hierarchy design is based on three important principles:

1. Make the common case fast.
2. The principle of locality. Spatial locality refers to the tendency of programs to access memory locations whose addresses are near those of recently accessed locations. Temporal locality refers to the tendency of programs to access the same data several times in a short period of time.
3. Smaller is faster.

These are the levels in a typical memory hierarchy. Moving farther away from the CPU, the memory in each level becomes larger and slower.

When a memory lookup is required, the L1 cache is searched first. If the data is found, this is called a hit. If the data is not in the L1 cache, this is called a miss, and the L2 cache is checked. If the data is not in the L2 cache, then the data is retrieved from main memory. When there is a miss at either the L1 or L2 cache, the data retrieved from the next level is saved in the cache for future use. Cache hits make the program run much faster than if all memory accesses had to go to main memory.

The connection between the CPU and main memory is called the front-side bus. A common design is for the front-side bus to be divided into four channels; if the front-side bus speed is listed at 800 MHz, it is probably four channels each running at 200 MHz. The connection between the CPU and the L2 cache is called the backside bus.

Binary Representation of Data

Here we briefly consider the format used to store data variables in memory and in registers. If you need more details than are provided here, then check your notes from EECE 241 or other resources.

[Figure: the memory hierarchy pyramid: from registers (smallest, fastest, most expensive) down through the L1 cache, L2 cache, and main memory to disk (largest, slowest, least expensive).]

Integer Variables

Unsigned variables of the integer types (char, short, int, long) are stored in straight binary format, beginning with all zeros for zero, up to all ones for the largest number that can be represented by the data type.

Signed variables of the integer types (char, short, int, long) are stored in 2's complement format. This ensures that the binary digits represent a continuous number line from the most negative number to the largest positive number, with zero represented by all zero bits. The most significant bit is considered the sign bit. The sign bit is one for negative numbers and zero for positive numbers.

Decimal           int (hex)    short (hex)
-2,147,483,648    0x80000000
-2,147,483,647    0x80000001
-32,768           0xffff8000   0x8000
-32,767           0xffff8001   0x8001
-2                0xfffffffe   0xfffe
-1                0xffffffff   0xffff
0                 0x00000000   0x0000
1                 0x00000001   0x0001
32,767            0x00007fff   0x7fff
2,147,483,647     0x7fffffff

Any two binary numbers can thus be added together in a straightforward manner to get the correct answer. If there is a carry bit beyond what the data type can represent, it is discarded.

      1     0x0001
  + (-1)  + 0xffff
  ------  --------
      0     0x0000

To change the sign of any number, invert all the bits and add 1:

  2 = 0x0002 = 0000000000000010
  invert:      1111111111111101
  add 1:       1111111111111110 = 0xfffe = -2

X+ 8 X + 8 00010010 X + 4 X + 5 X + 6 X + 7 00010010 X + 7 X + 6 X + 5 X + 4 X 0x12 0x34 0x56 0x78 0x12 0x34 0x56 0x78 X + 1 X + 2 X + 3 X + 3 X + 2 X + 1 X X 4 X 3 X 2 X 1 X 1 X 2 X 3 X 4 X 8 X 8 Big Endian Little Endian Memory at Address X contains 0x12345678 Big/Little Endian Memory Maps Conversions of Integer Variables It is often necessary to convert a smaller data type to a larger type For this, there are either special instructions (Intel x86), or a sequence of a couple simple instructions (Sun SPARC) to promote a variable to a larger data type If the variable is unsigned, then extra zeros are just filled into the most significant bits (movezx move - zero extending, for Intel x86) For signed variables, then the sign bit needs to be extended to fill the most significant bits (movesx move - sign extending, for Intel x86) 0x6fa1 ==> 0x00006fa1 (sign extend a positive number) 0xfffe ==> 0xfffffffe (sign extend a negative number) 0x9002 ==> 0xffff9002 (sign extend a negative number) Byte Order Not all computers store the bits (and bytes) of a variable in the same order The Intel x86 line of processors stores the least significant bit in the lowest memory address (right most position) and the most significant bit in the highest memory address This scheme is called Little Endian Sun SPARC and most other UNIX platforms do the opposite They store the most significant byte in the lowest memory address SPARC is thus considered a Big Endian machine In a TCP/IP packet, the first transmitted data is the most significant byte, thus the Internet is considered Big Endian The lowest memory address is considered the memory address for a variable Hence we see a difference between Little Endian and Big Endian when we draw memory maps With Little Endian (Intel) we label the location of an address on the right side of the map With Big Endian (SPARC), labels are placed on the left side of the map The term is used because of an analogy with the story Gulliver s Travels, in which 
Jonathan Swift imagined a never-ending fight between the kingdoms of the Big-Endians and the Little-Endians, whose only difference is in where they crack open a hard-boiled egg.
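Byte order can be observed at run time by storing a known 32-bit value and examining the byte at the lowest address (address "X" in the memory maps above). This is a small sketch; the function name is illustrative.

```c
/* Sketch: detect byte order by looking at the byte stored at the
   lowest address of a 32-bit value. Little Endian machines (e.g.
   Intel x86) put 0x78 there; Big Endian machines (e.g. SPARC)
   put 0x12 there. */
#include <stdint.h>

int is_little_endian(void) {
    uint32_t value = 0x12345678;
    uint8_t *lowest_byte = (uint8_t *)&value;  /* byte at address X */
    return *lowest_byte == 0x78;
}
```

On an Intel x86 machine this returns 1; on a SPARC it would return 0.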

[Figure: IEEE FPS floating point formats. Single precision (32 bits): 1 sign bit, 8 exponent bits, 23 mantissa bits. Double precision (64 bits): 1 sign bit, 11 exponent bits, 52 mantissa bits.]

Floating Point Variables

Floating point variables have been represented in many different ways inside computers of the past, but there is now a well-adhered-to standard for the representation of floating point variables: the IEEE Floating Point Standard (FPS). Like scientific notation, FPS represents numbers with multiple parts: a sign bit, one part specifying the mantissa, and a part representing the exponent. The mantissa is represented as a sign-magnitude value (i.e., not two's complement), where the value is normalized. The exponent is represented as an unsigned integer which is biased to accommodate negative numbers. An 8-bit unsigned value would normally have a range of 0 to 255, but 127 is added to the true exponent before it is stored, giving the representable exponents a range of -126 to +127 (the all-zeros and all-ones exponent patterns are reserved for special values).

Follow these steps to convert a number to FPS format:

1. First convert the number to binary.
2. Normalize the number so that there is one nonzero digit to the left of the binary point, adjusting the exponent as necessary.
3. The digits to the right of the binary point are then stored as the mantissa, starting with the most significant bits of the mantissa field. Because all numbers are normalized, there is no need to store the leading 1. Note: because the leading 1 is dropped, it is no longer proper to refer to the stored value as the mantissa. In IEEE terms, this mantissa minus its leading digit is called the significand.
4. Add 127 to the exponent and convert the resulting sum to binary for the stored exponent value. For double precision, add 1023 to the exponent. Be sure to include all 8 (or 11) bits of the exponent.
5. The sign bit is a one for negative numbers and a zero for positive numbers.
6. Compilers often express FPS numbers in hexadecimal, so a quick conversion to hexadecimal might be desired.

Here are some examples using single precision FPS:

3.5 = 11.1 (binary) = 1.11 x 2^1
sign = 0,
significand = 11000000000000000000000, exponent = 1 + 127 = 128 = 10000000
FPS number (3.5) = 0 10000000 11000000000000000000000 = 0x40600000
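The worked conversions in this section can be checked by reinterpreting the bits of a C float through a union, assuming float uses the IEEE single-precision format (true on mainstream platforms). The helper names below are illustrative.

```c
/* Sketch: view the raw bit pattern of an IEEE single-precision float,
   and vice versa, via a union. Assumes float is IEEE 754 single
   precision (32 bits). */
#include <stdint.h>

union float_bits {
    float    f;
    uint32_t u;
};

uint32_t float_to_bits(float f) {
    union float_bits fb;
    fb.f = f;
    return fb.u;        /* e.g. 3.5f -> 0x40600000 */
}

float bits_to_float(uint32_t u) {
    union float_bits fb;
    fb.u = u;
    return fb.f;
}
```

All of the values used here (3.5, 100, -52.125) are exactly representable in binary, so the comparisons are exact.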

100 = 1100100 (binary) = 1.100100 x 2^6
sign = 0, significand = 100100, exponent = 6 + 127 = 133 = 10000101
FPS number (100) = 0 10000101 10010000000000000000000 = 0x42c80000

What decimal number is represented in FPS as 0xc2508000? Here we just reverse the steps:

0xc2508000 = 11000010010100001000000000000000 (binary)
sign = 1; exponent = 10000100; significand = 10100001000000000000000
exponent = 132 ==> 132 - 127 = 5
-1.10100001 x 2^5 = -110100.001 = -52.125

Floating Point Arithmetic

Until fairly recently, floating point arithmetic was performed using complex algorithms with an integer ALU. The main ALU in CPUs is still an integer arithmetic ALU. However, in the mid-1980s, special hardware was developed to perform floating point arithmetic. Intel, for example, sold a chip known as the 80387, a math co-processor to go along with the 80386 CPU. Most people did not buy the 80387 because of the cost. A major selling point of the 80486 was that the math co-processor was integrated onto the CPU, which eliminated the need to purchase a separate chip to get faster floating point arithmetic.

Floating point hardware usually has a special set of registers and instructions for performing floating point arithmetic. There are also special instructions for moving data between memory or the normal registers and the floating point registers. Most of the discussion in this class will focus on integer operations, but we will try to show at least a couple examples of floating point arithmetic.

Role of the Operating System

The operating system (OS) is a program that allocates and controls the use of all system resources: the processor, the main memory, and all I/O devices. In addition, the operating system allows multiple, independent programs to share computer resources while running concurrently. But when we look at our programs (written in any language), we don't see any allowance for the operating system or any other program; the code is written as if our program is the only program running. So how is this
accomplished? How does the operating system get control back from user programs to do its work? The answer relates to the tight coupling between key parts of the code in the OS kernel,[2] the architecture of the CPU, and something called interrupts.

[2] The kernel of an OS is the critical part of the OS that handles the lowest levels of the OS, such as scheduling of processes, memory management, and device control. It is not related to the user interface or utilities provided by the OS.

When a computer is turned on, or booted, the OS (Windows, Linux, Minix, Solaris, etc.) initializes the hardware and also builds critical data structures in memory. Most of the data structures are used by the operating system kernel. However, some of the data structures are laid out according to the specification of the CPU manufacturer. This CPU-specific data is used to switch processing between user programs and the kernel. In the Intel x86, for example, two special registers in the CPU hold pointers to memory used when an interrupt is received.

When a hardware event occurs, such as when a key is pressed on the keyboard, a hardware interrupt is issued. The CPU then reads a register to get a pointer to a stack where it will save some of the key register values. This is not the same stack that the user program uses. The CPU then reads another register to get a pointer to a special table called the interrupt descriptor table. It also checks with the interrupt hardware to get a vector identifying which interrupt occurred. Then, based on which interrupt occurred and the information in the interrupt descriptor table, the CPU causes processing to switch from running a user level program to running an interrupt handler in the kernel. All of the operations described above are done automatically by the

CPU when an interrupt is received. Thus, the reception of an interrupt is how user programs are suspended and processing switched to the kernel.

Once the kernel gets control, it will save more registers from the user program, handle the hardware event, and check whether work needs to be done related to internal operations such as memory or process management. Then, finally, the kernel will let a user program run again. In doing so, it will restore some registers and issue a special instruction that causes the final registers to be restored and processing to switch back to the user program. Since all the registers are restored, the user program never knows that it was interrupted.

There are three types of interrupts which the CPU recognizes:

Hardware Interrupt: This is any type of hardware event, such as a key pressed on the keyboard, a hard disk completing the reading or writing of data, or the reception of an Ethernet packet. Many operating systems program a clock to issue interrupts at regular intervals so that the kernel is guaranteed to get control on a regular basis even if no hardware events occur and a user program never releases the CPU.

Software Interrupt: When a user program needs to make a system call to the operating system, such as for I/O or to request more memory, it may issue a special instruction called a software interrupt to cause the CPU to switch processing to the kernel.

Trap: A trap is issued by the CPU itself when it detects that something is wrong or needs special attention. In most cases a trap is issued when a user program performs an illegal operation, such as a divide by zero or an illegal memory reference. In the Sun SPARC, there are some traps which occur in the normal processing of a program.

Most of the kernel's code is termed reentrant, meaning that additional interrupts may be received even while a previous interrupt is being processed. There are special assembly language instructions to turn interrupts off or on. Interrupts are turned off in critical
sections of the kernel where an interrupt would cause memory corruption in the kernel. When interrupts are turned off, pending interrupts are queued by the hardware and are delivered when interrupts are turned on again. A critical concern in operating system design is knowing when to turn interrupts off and on; interrupts should be left on except when absolutely necessary. Thus operating systems use clever algorithms to make as much of the kernel reentrant as possible.

More will be discussed about operating systems as related to computer architecture and assembly language later in the semester, after more specifics of the processors and assembly language have been covered.
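As a concrete taste of the CPU-specific data structures mentioned above, the following is a hedged sketch of one entry in the 32-bit x86 interrupt descriptor table. The field names are my own; the 8-byte layout follows the Intel architecture manuals, and a real kernel would fill in the handler address, segment selector, and attributes for each interrupt vector.

```c
/* Sketch of one 32-bit x86 interrupt gate descriptor (protected mode).
   Field names are illustrative, not official. The packed attribute
   (GCC/Clang) ensures the compiler adds no padding. */
#include <stdint.h>

struct idt_entry {
    uint16_t offset_low;   /* handler address, bits 0..15        */
    uint16_t selector;     /* kernel code segment selector       */
    uint8_t  reserved;     /* always zero                        */
    uint8_t  type_attr;    /* gate type, privilege level, present bit */
    uint16_t offset_high;  /* handler address, bits 16..31       */
} __attribute__((packed));
```

Each descriptor is exactly 8 bytes, so the CPU can index the table directly by interrupt vector number.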