COMPUTER ARCHITECTURE I CIS 210


TABLE OF CONTENTS

UNIT ONE General Overview of Computer Organization 3
UNIT TWO Registers and Instruction Code 14
UNIT THREE Control Unit Design: Hardwired and Microprogramming 22
UNIT FOUR Trends in Computer Architecture 34
UNIT FIVE Bus Organization 48
UNIT SIX RISC and CISC 59
UNIT SEVEN Parallelism and Pipelining 65
UNIT EIGHT Memory Hierarchy 71
UNIT NINE Basic Logic Design 76

UNIT ONE GENERAL OVERVIEW OF COMPUTER ORGANIZATION

1.0 Introduction

In computer science, computer architecture or digital computer organization is the conceptual design and fundamental operational structure of a computer system. It may also be defined as the science and art of selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals. Computer architecture comprises at least three main subcategories. Instruction set architecture (ISA) is the abstract image of a computing system as seen by a machine language (or assembly language) programmer, including the instruction set, word size, memory addressing modes, processor registers, and address and data formats. Microarchitecture, also known as computer organization, is a lower-level, more concrete and detailed description of the system that specifies how the constituent parts of the system are interconnected and how they interoperate to implement the ISA. System design includes all of the remaining hardware components of a computing system.

Figure 1.1: Computer Architecture

1.1 Instruction Set Architecture

The Instruction Set Architecture (ISA) describes the set of instructions whose syntactic and semantic characteristics are defined by the underlying computer architecture.

Figure 1.2: Instruction Set Architecture

1) The ISA is the interface between the software and the hardware.
2) It is the set of instructions that bridges the gap between high-level languages and the hardware.
3) For a processor to understand a command, the command must be in binary; the ISA defines how these values are encoded.
4) The ISA also defines the items in the computer that are available to a programmer. For example, it defines data types, registers, addressing modes, memory organization, etc.
5) Registers are high-speed storage locations, accessible to the processor, that can hold data as well as instructions.
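To make the idea of the ISA "encoding values in binary" concrete, here is a minimal sketch of packing an instruction into a machine word. The 16-bit format below (4-bit opcode, two 3-bit register fields, a 6-bit immediate) is purely hypothetical, not the format of any real ISA:

```python
# A minimal sketch of how an ISA encodes an instruction as binary.
# The 16-bit layout (opcode | rd | rs | imm) is hypothetical.

OPCODES = {"ADD": 0b0001, "LOAD": 0b0010, "STORE": 0b0011}

def encode(op, rd, rs, imm=0):
    """Pack an instruction into a 16-bit word: 4-bit opcode,
    3-bit destination register, 3-bit source register, 6-bit immediate."""
    return (OPCODES[op] << 12) | (rd << 9) | (rs << 6) | (imm & 0x3F)

word = encode("ADD", rd=1, rs=2)
print(f"{word:016b}")  # → 0001001010000000, the bit pattern the hardware sees
```

The point is only that every field a programmer names symbolically (operation, registers, constant) ends up at a fixed position in a binary word defined by the ISA.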

1.2 Interface Design

A good interface:
- Lasts through many implementations (portability, compatibility)
- Is used in many different ways (generality)
- Provides convenient functionality to higher levels
- Permits an efficient implementation at lower levels

Evolution of Instruction Sets

Major advances in computer architecture are typically associated with landmark instruction set designs. Design decisions must take into account: technology, machine organization, programming languages, compiler technology, operating systems, and applications.

System design includes all of the other hardware components within a computing system, such as:
1. System interconnects such as computer buses and switches
2. Memory controllers and hierarchies
3. CPU off-load mechanisms such as direct memory access (DMA)
4. Issues like multiprocessing

1.3 Computer Architecture: The Definition

Computer architecture is the coordination of the abstract levels of a processor under changing forces, involving design, measurement and evaluation. It also includes the overall fundamental working principle of the internal logical structure of a computer system. Addressing modes are the ways in which an instruction locates its operands. Memory organization defines how instructions interact with the memory.

1.4 Design Goals

The exact form of a computer system depends on the constraints and goals for which it was optimized.
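The phrase "how an instruction locates its operands" can be illustrated with a small sketch of four common addressing modes. The mode names are standard, but the toy machine model (memory size, register names, the use of X as an index register) is illustrative only:

```python
# A small sketch of common addressing modes; the machine model
# (16-cell memory, register names, index register X) is illustrative.

memory = [0] * 16
registers = {"R1": 5, "R2": 3, "X": 2}   # X plays the role of an index register
memory[7] = 40
memory[9] = 99                           # base address 7 + index 2

def fetch_operand(mode, value):
    if mode == "immediate":              # the operand is the value itself
        return value
    if mode == "register":               # the operand lives in a named register
        return registers[value]
    if mode == "direct":                 # the value is a memory address
        return memory[value]
    if mode == "indexed":                # effective address = base + index register
        return memory[value + registers["X"]]
    raise ValueError(f"unknown addressing mode {mode!r}")

print(fetch_operand("immediate", 7))   # 7
print(fetch_operand("direct", 7))      # 40
print(fetch_operand("indexed", 7))     # 99
```

The same field of an instruction (here, the value 7) yields three different operands depending on the mode bits, which is exactly what the ISA's addressing-mode definitions pin down.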

Figure 1.3: Forces on Computer Architecture

Computer architectures usually trade off standards, cost, memory capacity, latency and throughput. Sometimes other considerations, such as features, size, weight, reliability, expandability and power consumption, are factors as well. The most common scheme carefully chooses the bottleneck that most reduces the computer's speed. Ideally, the cost is allocated proportionally to ensure that the data rate is nearly the same for all parts of the computer, with the most costly part being the slowest. This is how skillful commercial integrators optimize personal computers.

1.5 Computer Organization

Computer organization helps optimize performance-based products. For example, software engineers need to know the processing ability of processors. They may need to optimize

software in order to gain the most performance at the least expense. This can require quite detailed analysis of the computer organization. For example, in a multimedia decoder, the designers might need to arrange for most data to be processed in the fastest data path. Computer organization also helps plan the selection of a processor for a particular project. Multimedia projects may need very rapid data access, while supervisory software may need fast interrupts. Sometimes certain tasks need additional components as well. For example, a computer capable of virtualization needs virtual memory hardware so that the memory of different simulated computers can be kept separate. The computer organization and features also affect the power consumption and the cost of the processor.

Computer organization refers to the level of abstraction above the digital logic level but below the operating system level. The major components of a computer organization are functional units or subsystems that correspond to specific pieces of hardware built from lower-level building blocks. Computer organization deals with the computer's internal structure, i.e. memory, registers, RAM, ROM, the CPU and the like.

Processing Unit: This unit is responsible for processing all the various operations that go on in the system unit. It is referred to as the brain of the computer system; without it the computer would be valueless. An example of a processing unit is the central processing unit (CPU). The processing unit can be divided into three sections:
i. Control Unit
ii. Arithmetic and Logic Unit (ALU)
iii. Memory Unit

1.5.1 The Control Unit: This unit coordinates all the various operations of the input and output units in the system unit, precisely within the Central Processing Unit (CPU).

(a) Functions of the control unit:
(i) It directs the sequence of operations.
(ii) It interprets the instructions of a program in the storage unit and produces the signals that command the circuits to execute the instructions.
(iii) It directs the flow of all activities in the computer system.

1.5.2 The Arithmetic and Logic Unit: This unit is responsible for performing all the various arithmetic operations of addition, subtraction, multiplication and division; relational operations such as not equal to (≠), greater than (>), less than (<), greater than or equal to (≥) and less than or equal to (≤); and logical operations.

1.5.3 The Memory Unit: This unit is known as the main or primary memory (storage). It stores information (i.e. instructions, data, and intermediate and final results of processing) that arrives via the input unit, so that this information is made available to the appropriate quarters for further processing, after which it can be presented to the user via the output unit. The memory unit is divided into two:
i. Random Access Memory (RAM)
ii. Read Only Memory (ROM)
Other versions of ROM are:
PROM (Programmable Read Only Memory)
EPROM (Erasable Programmable Read Only Memory)

1.6 Levels of Machines

There are a number of levels in a computer, from the user level down to the transistor level.
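The three classes of ALU operation listed above (arithmetic, relational, logical) can be sketched as a single unit dispatching on a function-select code. The operation set and names here are illustrative, not a real ALU design:

```python
# A minimal ALU sketch: one function-select code picks an arithmetic,
# relational, or logical operation. Names and coverage are illustrative.

def alu(op, a, b):
    arithmetic = {"ADD": a + b, "SUB": a - b, "MUL": a * b,
                  "DIV": a // b if b else 0}
    relational = {"EQ": a == b, "NE": a != b, "GT": a > b, "LT": a < b,
                  "GE": a >= b, "LE": a <= b}
    logical = {"AND": a & b, "OR": a | b, "XOR": a ^ b}
    for table in (arithmetic, relational, logical):
        if op in table:
            return table[op]
    raise ValueError(f"unknown ALU operation {op!r}")

print(alu("ADD", 6, 7))   # 13
print(alu("GE", 6, 7))    # False  (a relational result feeds the status flags)
print(alu("AND", 6, 3))   # 2     (bitwise AND of 110 and 011)
```

In real hardware the relational results would typically set 1-bit status flags rather than be returned as values; the sketch only groups the operation classes the text names.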

Figure 1.4: Machine Levels

Progressing from the top level downward, the levels become less abstract as more of the internal structure of the computer becomes visible.

1.7 Performance

Power consumption

Power consumption is another design criterion that factors into the design of modern computers. Power efficiency can often be traded for performance or cost benefits. With the increasing power density of modern circuits as the number of transistors per chip scales (Moore's law), power efficiency has increased in importance. Recent processor designs such as the Intel Core 2 put more emphasis on increasing power efficiency. Also, in the world of embedded computing, power efficiency has long been, and remains, a primary design goal next to performance.

Computer performance is often described in terms of clock speed (usually in MHz or GHz). This refers to the cycles per second of the main clock of the CPU. However, this metric is somewhat misleading, as a machine with a higher clock rate may not necessarily have higher performance. As a result, manufacturers have moved away from clock speed as a measure of performance. Computer performance can also be measured by the amount of cache a processor has. If clock speed were a car's top speed, then the cache would be its gas tank: no matter how fast the car goes, it still needs fuel close at hand. The higher the clock speed and the larger the cache, the faster a processor runs. Modern CPUs can execute multiple instructions per clock cycle, which dramatically speeds up a program. Other factors influence speed, such as the mix of functional units, bus speeds, available memory, and the type and order of instructions in the programs being run.

There are two main types of speed: latency and throughput. Latency is the time between the start of a process and its completion. Throughput is the amount of work done per unit time. Interrupt latency is the guaranteed maximum response time of the system to an electronic event (e.g. when the disk drive finishes moving some data). Performance is affected by a very wide range of design choices; for example, pipelining a processor usually makes latency worse (slower) but makes throughput better. Computers that control machinery usually need low interrupt latencies. These computers operate in a real-time environment and fail if an operation is not completed in a specified amount of time. For example, computer-controlled anti-lock brakes must begin braking almost immediately after they have been instructed to brake. The performance of a computer can be measured using other metrics, depending upon its application domain. A system may be CPU bound (as in numerical calculation), I/O bound (as in a web-serving application) or memory bound (as in video editing). Power consumption has become important in servers and in portable devices like laptops.
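The claim that pipelining worsens latency while improving throughput can be checked with back-of-the-envelope arithmetic. The stage counts and cycle times below are made-up numbers chosen only to show the shape of the trade-off:

```python
# A back-of-the-envelope sketch of the latency/throughput trade-off of
# pipelining; the stage counts and cycle times are illustrative.

def unpipelined(n_instructions, cycle_time_ns):
    latency = cycle_time_ns                      # one long cycle per instruction
    total = n_instructions * cycle_time_ns
    return latency, total

def pipelined(n_instructions, n_stages, stage_time_ns):
    latency = n_stages * stage_time_ns           # each instruction crosses all stages
    # once the pipeline is full, one instruction completes per stage time
    total = (n_stages + n_instructions - 1) * stage_time_ns
    return latency, total

# 5 ns single cycle vs a 5-stage pipeline whose stage overhead makes each
# stage 1.2 ns instead of the ideal 5/5 = 1 ns.
print(unpipelined(1000, 5.0))    # (5.0, 5000.0)
print(pipelined(1000, 5, 1.2))   # per-instruction latency rises, total time drops
```

Each instruction now takes 6 ns instead of 5 ns to get through the machine (worse latency), yet 1000 instructions finish in roughly a quarter of the time (better throughput).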

Benchmarking tries to take all these factors into account by measuring the time a computer takes to run through a series of test programs. Although benchmarking shows strengths, it may not help one to choose a computer. Often the measured machines split on different measures. For example, one system might handle scientific applications quickly, while another might play popular video games more smoothly. Furthermore, designers have been known to add special features to their products, whether in hardware or software, which permit a specific benchmark to execute quickly but which do not offer similar advantages to other, more general tasks.

1.8 Summary

A computer's architecture is its abstract model and is the programmer's view in terms of instructions, addressing modes and registers. A computer's organization expresses the realization of the architecture. Architecture describes what the computer does, and organization describes how it does it. Architecture and organization are independent; you can change the organization of a computer without changing its architecture. For example, a 64-bit architecture can be internally organized as a true 64-bit machine or as a 16-bit machine that uses four cycles to handle 64-bit values. The difference between architecture and organization is best illustrated by a non-computer example. Is the gear lever in a car part of its architecture or its organization? The architecture of a car is simple: it transports you from A to B. The gear lever belongs to the car's organization because it helps implement the function of a car but is not part of that function (a car does not intrinsically need a gear lever). The ISA is the interface between the software and the hardware. It is the set of instructions that bridges the gap between high-level languages and the hardware.

1.9 Questions

1. Computer architecture is dead; long live computer organization. Discuss.
2. Explain the architecture of a computer system.

3. Define computer architecture and design.
4. What is computer organization?
5. Define implementation in terms of computer architecture and design.
6. Define the programmer-visible macroarchitecture.
7. What do you understand by ISA?

UNIT TWO REGISTERS AND INSTRUCTION CODE

2.0 Introduction

In a computer, a register is one of a small set of data-holding places that are part of a computer processor. A register may hold a computer instruction, a storage address, or any kind of data (such as a bit sequence or individual characters). Some instructions specify registers as part of the instruction. For example, an instruction may specify that the contents of two defined registers be added together and then placed in a specified register. A register must be large enough to hold an instruction; for example, in a computer with 32-bit instructions, a register must be 32 bits in length. In some computer designs there are smaller registers, for example half-registers, for shorter instructions. Depending on the processor design and language rules, registers may be numbered.

2.1 Registers

2.1.1 Program Counter (PC)

The Program Counter stores the address of the macro-instruction currently being executed. This processor register is commonly called the instruction pointer (IP) in Intel x86 microprocessors, and is sometimes called the instruction address register or regarded as just part of the instruction sequencer in some computers. In the 8085 microprocessor it is a 16-bit special-function register. It keeps track of the memory address of the instruction that is to be executed once the execution of the current instruction is completed. In other words, it holds the address of the memory location of the next instruction while the current instruction is being executed by the microprocessor.
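The "PC already points at the next instruction while the current one executes" behaviour can be sketched in a few lines. The three-instruction program and the use of list indices as addresses are illustrative only:

```python
# A sketch of how the program counter tracks the next instruction;
# the 3-instruction program and index-as-address model are illustrative.

program = ["LOAD R1, 10", "ADD R1, R2", "STORE R1, 20"]
pc = 0                         # program counter: address of the next instruction

trace = []
while pc < len(program):
    instruction = program[pc]  # fetch the instruction the PC points at
    pc += 1                    # PC now already holds the NEXT address
    trace.append((instruction, pc))

print(trace[0])   # ('LOAD R1, 10', 1): while LOAD executes, PC points past it
```

A branch instruction would simply overwrite `pc` with its target address instead of letting the increment stand, which is all "transfer of control" means at this level.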

2.1.2 Accumulator (AC)

It stores a previously calculated value or a value loaded from the main memory. This register is used for storing the results that are produced by the system: when the CPU generates results after processing, the results are stored in the AC register.

2.1.3 Instruction Register (IR)

It stores a copy of the instruction loaded from main memory. Temporary Instruction Register (TIR): as the CPU evaluates exactly what an instruction is supposed to do, it stores the edited instruction in the TIR.

2.1.4 Memory Data Register (MDR)

The MDR is the register of a computer's control unit that contains the data to be stored in the computer storage (e.g. RAM), or the data after a fetch from the computer storage. It acts like a buffer, holding anything that is copied from the memory ready for the processor to use. The MDR holds the information before it goes to the decoder; it contains the data to be written into, or read out of, the addressed location. For example, to retrieve the contents of cell 123, we would load the value 123 (in binary, of course) into the MAR and perform a fetch operation. When the operation is done, a copy of the contents of cell 123 would be in the MDR. To store the value 98 into cell 4, we load a 4 into the MAR and a 98 into the MDR and perform a store. When the operation is completed, the contents of cell 4 will have been set to 98, whatever was there previously having been discarded. The MDR is a two-way register. When data is fetched from memory and placed into the MDR, it is written to in one direction. When there is a write instruction, the data to be written is placed into the MDR from another CPU register, which then puts the data into memory.
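The fetch and store sequences just described (cell 123 into the MDR; 98 into cell 4) can be replayed as a tiny simulation. The 256-cell memory and the class/method names are illustrative, not a real hardware interface:

```python
# A tiny simulation of the MAR/MDR interface to memory described above;
# the 256-cell memory and method names are illustrative.

class MemoryInterface:
    def __init__(self, size=256):
        self.memory = [0] * size
        self.mar = 0    # Memory Address Register: which cell to touch
        self.mdr = 0    # Memory Data Register: data moving in or out

    def fetch(self):
        """Copy the contents of cell MAR into the MDR."""
        self.mdr = self.memory[self.mar]

    def store(self):
        """Write the MDR into cell MAR, discarding the old contents."""
        self.memory[self.mar] = self.mdr

bus = MemoryInterface()
bus.memory[123] = 77            # pretend cell 123 already holds 77

bus.mar = 123                   # retrieve the contents of cell 123
bus.fetch()
print(bus.mdr)                  # 77

bus.mar, bus.mdr = 4, 98        # store 98 into cell 4
bus.store()
print(bus.memory[4])            # 98
```

Note that the CPU never touches memory cells directly: every read or write goes through exactly this MAR (where) and MDR (what) pair.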

2.1.5 Memory Address Register (MAR)

This register contains the address of the place in main memory that the CPU wants to work with. It is directly connected to the RAM chips on the motherboard. This register holds the memory addresses of data and instructions, and is used to access data and instructions from memory during the execution phase of an instruction. If the CPU wants to store some data in the memory, or to read data from the memory, it places the address of the required memory location in the MAR.

2.1.6 Other registers

The Memory Data Register is half of a minimal interface between a microprogram and computer storage; the other half is the memory address register. Some designs also set aside a register holding the constant 1: the CPU cannot access a number unless it is in a register, loaded from main memory, or somehow computed, so a register is dedicated to this often-used value. A data register is used in microcomputers to temporarily store data being transmitted to or from a peripheral device. The registers are the places where the values that the CPU is actually working on are located. The CPU design is such that it is only able to modify or otherwise act on a value when that value is in a register. So registers can take part in logic, whereas memory (including cache) can only hold values that the CPU reads from and writes to.

2.1.7 Memory Buffer Register (MBR)

This register contains the word that was either loaded from main memory or that is going to be stored in main memory. It too is directly connected to the RAM chips on the motherboard. This register holds the contents of the data or instruction read from, or written to, memory; that is, it is used to store data or instructions coming from, or going to, the memory.

These registers are used for performing the various operations. When we give some input to the system, the input is stored in the registers, and when the system gives us results after processing, the results also come from the registers; the CPU uses them for processing the data supplied by the user. Registers take part in the following operations:

1) Fetch: the fetch operation takes the instructions given by the user; the instructions stored in main memory are fetched using registers.
2) Decode: the decode operation interprets the instructions; that is, the CPU finds out which operation is to be performed on the operands.
3) Execute: the execute operation is performed by the CPU. The results produced by the CPU are then stored in memory, after which they can be displayed on the user's screen.

2.2 Index Register

An index register is a hardware element which holds a number that can be added to (or, in some cases, subtracted from) the address portion of a computer instruction to form an effective address. It is also known as a base register. An index register in a computer's CPU is a processor register used for modifying operand addresses during the run of a program.

Imagine a carpenter at work. He has a few items in his hands (registers); very close by on his workbench (cache) are things he is frequently working on but not using right this moment; and in the workshop (main memory) are things that pertain to the project at hand but are not immediately important enough to be on the workbench.

Here is a simple explanation of how register logic works. Let's imagine we have four registers named R1..R4. If you compile a statement that looks like this:

    x = y + z * 3;

the compiler would output machine code that (when disassembled) looks something like this:

    LOAD R1, ADDRESS_Z   // move the value of Z into register 1

    MUL R1, 3            // multiply the value of register 1 by 3
    LOAD R2, ADDRESS_Y   // move the value of Y into register 2
    ADD R1, R2           // add the value in R2 to the value in R1
    STORE R1, ADDRESS_X  // move the value of register 1 into X

Since most modern CPUs have registers that are either 32 or 64 bits wide, they can do math on any value up to the size they can hold. They don't need special registers for smaller values; they just use special assembly instructions that tell the CPU to use only part of the register. And, much like the carpenter with only two hands, registers can hold only a small amount of data at once, but they can be reused, passing active data in and out of them, which means that a large number of registers is not strictly needed. (Having many registers available does allow compilers to generate faster code, of course, but it is not strictly necessary.)

2.3 Summary

In a computer, a register is one of a small set of data-holding places that are part of a computer processor. A register may hold a computer instruction, a storage address, or any kind of data (such as a bit sequence or individual characters). Some instructions specify registers as part of the instruction. Registers are temporary storage areas for instructions or data. They are not a part of memory; rather they are special additional storage locations that offer the advantage of speed. Registers work under the direction of the control unit to accept, hold, and transfer instructions or data and perform arithmetic or logical comparisons at high speed. The control unit uses a data storage register the way a store owner uses a cash register: as a temporary, convenient place to store what is used in transactions. Computers usually assign special roles to certain registers, including these registers: an accumulator, which collects the results of computations; an address register, which keeps track of where a given instruction or piece of data is stored in memory. Each

storage location in memory is identified by an address, just as each house on a street has an address. There is also a storage register, which temporarily holds data taken from, or about to be sent to, memory, and a general-purpose register, which is used for several functions.

A register is a memory location within the CPU itself, designed to be quickly accessed for fast data retrieval. Processors normally contain a register array, which houses many such registers. These contain instructions, data and other values that may need to be quickly accessed during the execution of a program. Many types of register are common to most microprocessor designs:

Program Counter (PC): This register is used to hold the memory address of the next instruction that has to be executed in a program. This ensures that the CPU knows at all times where it has reached, is able to resume execution at the correct point, and executes the program correctly.

Instruction Register (IR): This is used to hold the current instruction in the processor while it is being decoded and executed, so that the time taken by the whole execution process is reduced. This is because the time needed to access the instruction register is much less than that of continually checking the memory location itself.

Accumulator (A, or ACC): The accumulator is used to hold the results of operations performed by the arithmetic and logic unit, as covered in the section on the ALU.

Memory Address Register (MAR): Used for storage of memory addresses, usually the addresses involved in the instructions held in the instruction register. The control unit then checks this register when it needs to know which memory address to check or obtain data from.

Memory Buffer Register (MBR): When an instruction or data is obtained from the memory or elsewhere, it is first placed in the memory buffer register. The next action to take is then determined and carried out, and the data is moved on to the desired location.

Flag register / status flags: The flag register is specially designed to contain all the appropriate 1-bit status flags, which are changed as a result of operations involving the arithmetic and logic unit. Further information can be found in the section on the ALU.

Other general-purpose registers: These registers have no specific purpose, but are generally used for the quick storage of pieces of data that are required later in the program execution. In the model used here they are assigned the names A and B, with suffixes of L and U indicating the lower and upper sections of the register respectively.

2.4 Questions

1. Explain the general structure of the CPU.
2. What is the use of registers in the CPU?
3. What is the function of the MAR?

4. What is the function of the MDR / MBR?

UNIT THREE CONTROL UNIT DESIGN: HARDWIRED AND MICROPROGRAMMING

3.0 Control Unit

The control unit is a finite state machine that takes as its inputs the IR, the status register (which is partly filled by the status output from the ALU), and the current major state of the cycle. Its rules are encoded in random logic, a Programmable Logic Array (PLA), or Read-Only Memory (ROM), and its outputs are sent across the processor to each point requiring coordination or direction from the control unit. For example, the outputs needed for the portion of the instruction/data path shown in Figure 3.1 are Jump/Branch/NextPC, IR Latch, Read Control, Load Control, ALU Function Select, Load/Reg-Reg, and Reg R/W.

Figure 3.1: Register and Control Unit

The ALU Function Select takes the instruction op code and translates it into a given function of the ALU (either one line per ALU function or a compact binary code for the function). The Jump/Branch/NextPC signal depends on the instruction type, and in a RISC architecture these may be directly coded in the op code. Read Control occurs at the start of an instruction cycle. IR Latch occurs at the end of the fetch state. Load Control happens at the end of the data fetch state of a load instruction. Load/Reg-Reg again depends on the op code. Reg R/W occurs at the start of the data fetch stage and at the write-back stage of an operation; it thus depends on both the major state and the instruction.

In a CISC architecture, the fetch may retrieve only the first part of an instruction, and (depending on bits in the IR that are then decoded by the control unit) more words may need to

be fetched. In a RISC architecture, a single Fetch retrieves a complete instruction, so we may proceed to the next major state, which is usually to begin fetching data from the registers, while we decode the instruction. In a RISC architecture, decoding an instruction mainly means that the instruction type field determines what the control unit will do for the remainder of the instructions. If you think of the CU as a finite state machine, the bits in the type field select the next state following the decode. In terms of a program s logic, this is like selecting a branch in a Switch statement each branch of the Switch contains the series of steps to be performed for one type of instruction. For example, after decoding a Jump instruction, the control unit outputs the signals required to combine the address portion of the instruction with the upper bits of the PC and load the result back into the PC. The CU then returns to the Fetch step. Thus, a Jump has three major states (Fetch, Decode, and Complete). For a memory Load instruction, the CU first sends one of the selected register values (the address) to the address port of the memory (via a multiplexer), and signals the memory to fetch this location. When the memory returns the value, then the CU sends signals to the necessary multiplexer(s) and the register file so that the memory data goes over the Dest bus and is stored in the designated register. Thus, a Load has four major states (Fetch, Decode, Memory, Write Back). So, for each type of instruction, and for each major state in each type of instruction, we look at the list of control signals and decide what value each signal must have. In some cases, the value doesn t matter (e.g., if memory isn t selected, it doesn t matter whether it is set to read or write, because it simply won t do anything in either case). You can think of this as a large 2dimensional table indexed by instruction type and major state. 
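That two-dimensional table of control signals might be sketched as follows. The instruction types, major states, and signal names below are illustrative stand-ins for whatever a real design would define; signals not listed in a cell are don't-cares:

```python
# A sketch of the control table: indexed by (instruction type,
# major state); each cell lists the control signals to assert.
# All names are hypothetical; missing signals are don't-cares.
CONTROL_TABLE = {
    ("JUMP", "FETCH"):     {"ReadControl": 1, "IRLatch": 1},
    ("JUMP", "DECODE"):    {},
    ("JUMP", "COMPLETE"):  {"NextPC": "jump"},
    ("LOAD", "FETCH"):     {"ReadControl": 1, "IRLatch": 1},
    ("LOAD", "DECODE"):    {"RegRW": "read"},
    ("LOAD", "MEMORY"):    {"ReadControl": 1, "LoadControl": 1},
    ("LOAD", "WRITEBACK"): {"RegRW": "write"},
}

def signals_for(instr_type, major_state):
    """Look up one cell of the table; signals absent from the
    cell may take either value (don't-cares)."""
    return CONTROL_TABLE[(instr_type, major_state)]
```

A hardware implementation would realize this table in a PLA or ROM rather than a dictionary, but the lookup structure is the same.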
Within each cell of the table is a list of the control signals and their values. One last bit of control output that we've neglected is the control of the major state itself. This is usually a register, as shown above, that is input to the CU; it also receives its next value on each clock from the CU. In the above example, the Jump proceeds from State 0 (Fetch) to State 1 (Decode) to State 3 (Complete) and then goes back to State 0, while a Load adds a State 4. In some designs, the state register also encodes the instruction type; it is then really referring to the different states of the finite state machine (FSM) rather than the major steps of the instructions. So, for example, the FSM states for a Load might be the sequence 0, 1, 12, 13, where the latter two distinguish Memory and Write Back from the Complete stage of the Jump. In other designs, we might see Jump going through states 0, 1, 2, and Load going through 0, 1, 2, 3, with the type field used to distinguish the different behaviour of the latter states. This is all just a matter of using somewhat different ways of naming the same things. The important point is that the CU has the inputs it needs to know what it is supposed to be doing on the present clock and what it will do next. In the CU design process, this translates to ensuring that one of the control signals on the list is the next-state signal, and that we always specify this signal in every cell of our table.

The control unit is responsible for directing and coordinating most of the computer system's activities. It does not execute instructions itself, but it tells other parts of the computer system what to do. It determines the movement of electronic signals between the main memory and the arithmetic and logic unit, and it also controls the signals between the CPU and the input/output devices. It consists of registers to hold the address of the current instruction being executed and the current instruction itself, and takes the necessary action. The control unit contains several registers, such as the address register, instruction register, sequence register, and decoder. When a program is executed, a number of steps are followed by the computer system. The instruction is selected by the sequence register and sent to the instruction register.
The operation part of the instruction is sent to the decoder and the address part to the address register.

The control unit then issues orders to extract the contents at that address and transfer them to the ALU; operations such as multiply or divide likewise go to the ALU. The sequence register moves on to the next instruction. The process of fetching the instruction and sending its parts to the appropriate registers is called the fetch cycle. The calculation of the result after performing the required arithmetic or logic operation is called the execution cycle.

In short, the control unit:
- depends on the correct sequencing of control signals;
- is much like the human brain controlling the various parts of the body;
- relies on sequence and timing as the key: any aberration will result in wrong operation.
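The major-state sequencing described in this section can be sketched as a small finite state machine. The state names follow the Jump and Load examples above; representing them as strings rather than numbered states is purely for readability:

```python
# Hypothetical next-state function for the control unit FSM:
# Jump uses Fetch -> Decode -> Complete; Load uses
# Fetch -> Decode -> Memory -> WriteBack; both then return
# to Fetch. Names are illustrative, not from a real design.
NEXT_STATE = {
    ("JUMP", "FETCH"):     "DECODE",
    ("JUMP", "DECODE"):    "COMPLETE",
    ("JUMP", "COMPLETE"):  "FETCH",
    ("LOAD", "FETCH"):     "DECODE",
    ("LOAD", "DECODE"):    "MEMORY",
    ("LOAD", "MEMORY"):    "WRITEBACK",
    ("LOAD", "WRITEBACK"): "FETCH",
}

def run_instruction(instr_type):
    """Step the FSM from Fetch back around to Fetch and return
    the sequence of major states traversed."""
    state, trace = "FETCH", ["FETCH"]
    while True:
        state = NEXT_STATE[(instr_type, state)]
        if state == "FETCH":
            return trace
        trace.append(state)
```

The instruction type field plays exactly the role the text describes: it selects which chain of next states the machine follows after the decode.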

3.1 Hard-Wired Control Unit
For each instruction, the control unit causes the CPU to execute a sequence of steps correctly. In reality, there must be control signals to assert lines on various digital components to make things happen. For example, when we perform an Add instruction in assembly language, we assume the addition takes place because the control signals for the ALU are set to "add" and the result is put into the AC. The ALU has various control lines that determine which operation to perform. The question we need to answer is, "How do these control lines actually become asserted?"

We can take one of two approaches to ensure control lines are set properly. The first approach is to physically connect all of the control lines to the actual machine instructions. The instructions are divided up into fields, and different bits in the instruction are combined through various digital logic components to drive the control lines. This is called hardwired control, and is illustrated in Figure (1).

Figure (1): Hardwired Control Organization (control signals driving the registers, the bus, and the ALU)
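In the spirit of Figure (1), a hardwired decoder can be modelled as pure combinational logic: each control line is a Boolean function of the instruction bits. The 2-bit opcode encoding below (00 = ADD, 01 = SUB, 10 = LDA, 11 = STA) is invented for illustration:

```python
# A toy hardwired decoder: control lines are Boolean functions
# of the opcode bits, as if wired through AND/NOT gates.
# The 2-bit opcode encoding is hypothetical.
def hardwired_decode(op1, op0):
    """op1, op0 are the opcode bits (0 or 1); returns the
    control lines driven by the combinational network."""
    op1, op0 = bool(op1), bool(op0)
    return {
        "alu_add":   (not op1) and (not op0),  # 00: ADD
        "alu_sub":   (not op1) and op0,        # 01: SUB
        "mem_read":  op1 and (not op0),        # 10: LDA
        "mem_write": op1 and op0,              # 11: STA
    }
```

Each dictionary entry corresponds to one gate network; extending the instruction set means rewiring these expressions, which is exactly the inflexibility discussed below.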

The control unit is implemented using hardware (for example: NAND gates, flip-flops, and counters). We need a special digital circuit that uses, as inputs, the bits from the opcode field in our instructions, bits from the flag (or status) register, signals from the bus, and signals from the clock. It should produce, as outputs, the control signals to drive the various components in the computer.

The advantage of hardwired control is that it is very fast. The disadvantage is that the instruction set and the control logic are directly tied together by special circuits that are complex and difficult to design or modify. If someone designs a hardwired computer and later decides to extend the instruction set, the physical components in the computer must be changed. This is prohibitively expensive, because not only must new chips be fabricated but also the old ones must be located and replaced.

3.2 Microprogramming
Microprogramming is a second alternative for designing the control unit of a digital computer (it uses software for control). A control unit whose binary control variables are stored in memory is called a microprogrammed control unit. The control variables at any given time can be represented by a string of 1's and 0's called a control word (which can be programmed to perform various operations on the components of the system). Each word in control memory contains within it a microinstruction. The microinstruction specifies one or more microoperations for the system. A sequence of microinstructions constitutes a microprogram. A memory that is part of a control unit is referred to as a control memory.

A more advanced development known as dynamic microprogramming permits a microprogram to be loaded initially from an auxiliary memory such as a magnetic disk. Control units that use dynamic microprogramming employ a writable control memory; this type of memory can be used for writing (to change the microprogram) but is used mostly for reading.

The general configuration of a microprogrammed control unit is demonstrated in the block diagram of Figure (2). The control memory is assumed to be a ROM, within which all control information is permanently stored.

Figure (2): Microprogrammed Control Organization

The control memory address register specifies the address of the microinstruction, and the control data register holds the microinstruction read from memory. The microinstruction contains a control word that specifies one or more microoperations for the data processor. Once these operations are executed, the control must determine the next address. The location of the next microinstruction may be the one next in sequence, or it may be located somewhere else in the control memory. For this reason it is necessary to use some bits of the present microinstruction to control the generation of the address of the next microinstruction. The next address may also be a function of external input conditions. While the microoperations are being executed, the next address is computed in the next-address generator circuit and then transferred into the control address register to read the next microinstruction.

The next-address generator is sometimes called a microprogram sequencer, as it determines the address sequence that is read from control memory. The address of the next microinstruction can be specified in several ways, depending on the sequencer inputs. Typical functions of a microprogram sequencer are incrementing the control address register by one, loading into the control address register an address from control memory, transferring an external address, or loading an initial address to start the control operations.

The main advantage of microprogrammed control is that once the hardware configuration is established, there should be no need for further hardware or wiring changes. If we want to establish a different control sequence for the system, all we need to do is specify

a different set of microinstructions for control memory. The hardware configuration should not be changed for different operations; the only thing that must be changed is the microprogram residing in control memory.

Microinstructions are stored in control memory in groups, with each group specifying a routine. Each computer instruction has a microprogram routine in control memory to generate the microoperations that execute the instruction. The hardware that controls the address sequencing of the control memory must be capable of sequencing the microinstructions within a routine and be able to branch from one routine to another. The address-sequencing capabilities required in a control memory are:
1. Incrementing of the control address register.
2. Unconditional branch or conditional branch, depending on status-bit conditions.
3. A mapping process from the bits of the instruction to an address for control memory.
4. A facility for subroutine call and return.
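The address-sequencing behaviour listed above can be sketched as follows. The control memory contents, signal names, and routine layout are invented; the point is only the interplay between the control address register (CAR) and the sequencer:

```python
# A sketch of microprogrammed control: the control memory holds
# microinstructions, the control address register (CAR) selects
# one, and the sequencer computes the next address (increment,
# branch, or map from the opcode). All contents are hypothetical.
CONTROL_MEMORY = [
    # (control word, sequencing mode, target address)
    ({"ReadControl": 1}, "inc",    None),  # 0: fetch
    ({"IRLatch": 1},     "map",    2),     # 1: decode, map to routine
    ({"LoadControl": 1}, "inc",    None),  # 2: LOAD routine...
    ({"RegWrite": 1},    "branch", 0),     # 3: ...then back to fetch
]

def next_address(car, mode, target):
    """Microprogram sequencer: increment the CAR, branch to an
    address from control memory, or map the opcode to a routine."""
    if mode == "inc":
        return car + 1
    if mode in ("branch", "map"):
        return target
    raise ValueError(f"unknown sequencing mode: {mode}")

def run_routine():
    """Issue control words until the sequencer returns to fetch."""
    car, issued = 0, []
    while True:
        word, mode, target = CONTROL_MEMORY[car]
        issued.append(word)
        car = next_address(car, mode, target)
        if car == 0:
            return issued
```

Changing the machine's behaviour means editing `CONTROL_MEMORY` only, which is the flexibility argument made throughout this section: no "wiring" (here, no code outside the table) needs to change.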

Signal legend (from the figure): L = load; E = copy to bus; A, S = add and subtract; sign bit to control unit; IP = increment PC; LDA = load accumulator; STA = store accumulator; ADD, SUB.

3.3 Microcode
Microcode tells the processor every detailed step required to execute each machine-language instruction. Microcode is thus at an even more detailed level than machine language, and in fact defines the machine language. In a standard

microprocessor, the microcode is stored in a ROM or a programmable logic array (PLA) that is part of the microprocessor chip and cannot be modified by the user.

3.4 Microprogramming Model
- Store the microprogram in the control store.
- Fetch the instruction.
- Get the set of control signals from the control word.
- Move to the next microinstruction address.
- Lather, rinse, repeat.

3.5 Hardwired vs. Micro-programmed Computers
It should be mentioned that most computers today are micro-programmed. The reason is basically one of flexibility. Once the control unit of a hard-wired computer is designed and built, it is virtually impossible to alter its architecture and instruction set. In the case of a micro-programmed computer, however, we can change the computer's instruction set simply by altering the microprogram stored in its control memory. In fact, taking our basic computer as an example, we notice that its four-bit op-code permits up to 16 instructions. Therefore, we could add seven more instructions to the instruction set by simply expanding its microprogram. To do this with the hard-wired version of our computer would require a complete redesign of the controller circuit hardware.

Another advantage of using micro-programmed control is that the task of designing the computer in the first place is simplified. The process of specifying the architecture and instruction set is now one of software (micro-programming) as opposed to hardware design. Nevertheless, for certain applications hard-wired computers are still used. If speed is a consideration, hard-wiring may be required, since it is faster to have the hardware issue the required control signals than to have a "program" do it.

A hardwired control unit has a processor that generates the signals or instructions to be implemented in the correct sequence. This is the older method of control, which works through the use of distinct components, drums, a sequential circuit design, or flip chips. It is implemented using logic gates and flip-flops; it is faster, but less flexible and limited in complexity.

A micro-programmed control unit, on the other hand, makes use of a microsequencer from which instruction bits are decoded to be implemented. It acts as the device supervisor that controls the rest of the subsystems, including the arithmetic and logic units, registers, instruction registers, off-chip input/output, and buses. It is slower, but more flexible and able to handle greater complexity.

3.6 Summary
The characteristics of hardwired control units are as follows:
- Hardwired control units are based on combinational circuits.
- In these types of systems, the inputs are transformed into control signals by fixed logic.
- These units are faster and are known to have a more complex structure.

Characteristics of micro-programmed control units:
- These control units are implemented as microprogram routines.
- A control unit implemented as a microprogram takes the form of a CPU inside another CPU.
- These types of circuits are simpler but comparatively slower.

3.7 Questions
1. What are the basic tasks performed by a micro-programmed control unit?
2. What is the difference between a hardwired implementation and a micro-programmed implementation of a control unit?
3. What are the advantages and disadvantages of micro-programmed and hardwired control units?
4. What is the relationship between instructions, microoperations and microprogramming?
5. What is a data path?
6. List the components of a data path.
7. Define the following: (i) microcode (ii) microinstruction (iii) microoperation (iv) microprogram.

UNIT FOUR
TRENDS IN COMPUTER ARCHITECTURE

4.0 Introduction
Any discussion of computer architectures, of how computers and computer systems are organized, designed, and implemented, inevitably makes reference to the "von Neumann architecture" as a basis for comparison. And of course this is so, since virtually every electronic computer ever built has been rooted in this architecture. The name applied to it comes from John von Neumann, who as author of two papers in 1945 [Goldstine and von Neumann 1963; von Neumann 1981] and coauthor of a third paper in 1946 [Burks, et al. 1963] was the first to spell out the requirements for a general-purpose electronic computer. The 1946 paper, written with Arthur W. Burks and Herman H. Goldstine, was titled "Preliminary Discussion of the Logical Design of an Electronic Computing Instrument," and the ideas in it were to have a profound impact on the subsequent development of such machines.

Von Neumann's design led eventually to the construction of the EDVAC computer in 1952. However, the first computer of this type to be actually constructed and operated was the Manchester Mark I, designed and built at Manchester University in England [Siewiorek, et al. 1982]. It ran its first program in 1948, executing it out of its 96-word memory. It executed an instruction in 1.2 milliseconds, which must have seemed phenomenal at the time. Using today's popular "MIPS" terminology (millions of instructions per second), it would be rated at 0.00083 MIPS. By contrast, some current supercomputers are rated in excess of 1000 MIPS. And yet these computers, such as the Cray systems and the Control Data Cyber 200 models, are still tied to the von Neumann architecture to a large extent. Over the years, a number of computers have been claimed to be "non-von Neumann," and many have been at least partially so.
More and more emphasis is being put on the necessity of breaking away from this traditional architecture in order to achieve more usable and more productive systems. The expectations for fifth-generation systems seem to require that

substantially new architectures be evolved, and that both hardware and software be freed from the limitations of the von Neumann architecture [Sharp 1985]. We all know what the von Neumann architecture is. At least we have strong intuitive feelings about it, because this is what we have always used. This is "the way computers work." But to really comprehend what choices there are for computer designers, and to appreciate what new choices must be found, it is necessary to have a more definitive understanding of what the von Neumann architecture is and is not, and what its implications are.

Von Neumann begins his "Preliminary Discussion" with a broad description of the general-purpose computing machine containing four main "organs." These are identified as relating to arithmetic, memory, control, and connection with the human operator: in other words, the arithmetic logic unit, the control unit, the memory, and the input-output devices that we see in the classical model of what a computer "looks like." To von Neumann, the key to building a general-purpose device was its ability to store not only its data and the intermediate results of computation, but also the instructions, or orders, that brought about the computation. In a special-purpose machine the computational procedure could be part of the hardware. In a general-purpose one the instructions must be as changeable as the numbers they act upon. Therefore, why not encode the instructions into numeric form and store instructions and data in the same memory? This frequently is viewed as the principal contribution provided by von Neumann's insight into the nature of what a computer should be. He then defined the control organ as that which would automatically execute the coded instructions stored in memory. Interestingly, he says that the orders and data can reside in the same memory "if the machine can in some fashion distinguish a number from an order" [Burks, et al., p. 35].
And yet, there is no distinction between the two in memory. The control counter (what we now usually call the program counter) contains the address of the next instruction, and that word is fetched to be executed. Whatever the control unit "believes" to be an order or to be data is treated as such. One ramification of this is that the instructions can operate upon other instructions, producing a self-modifying program. This has not been considered good form for many years,

because of the implications for program debugging and the desire for reentrant code in some situations. It is possible that new developments in artificial intelligence may bring fresh attention to the possibilities afforded by this characteristic [Bishop 1986].

Von Neumann devoted most of his "Preliminary Discussion" to the design of the arithmetic unit. The details of this are the least interesting part of the paper from the standpoint of the organization of his computer and its influence on future developments. The capabilities of the arithmetic unit were limited to the performance of some arbitrary subset of the possible arithmetic operations. He observes that "the inner economy of the arithmetic unit is determined by a compromise between the desire for speed of operation...and the desire for simplicity, or cheapness, of the machine" [Burks, et al., p. 36]. What is interesting, and important, is that this issue continued to dominate design decisions for many years; it is less true now that hardware costs have become a substantially less critical concern.

The concepts put forth by von Neumann were, for their time, quite remarkable, so much so that they provided the foundations for all of the early computers developed, and for the most part are still with us today. Why then do we require something different? What is it about this architecture that we find constraining and counterproductive? Why must there be new, different, "non-von Neumann" machines?

Myers [1982] defines four properties that characterize the von Neumann architecture, all of which he feels are antithetical to today's needs. One of these was discussed above: the fact that instructions and data are distinguished only implicitly through usage. As he points out, the higher-level languages currently used for programming make a clear distinction between the instructions and the data, and have no provision for executing data or using instructions as data.
A second property is that the memory is a single memory, sequentially addressed. A third, which is really a consequence of the previous property, is that the memory is one-dimensional. Again, these are in conflict with our programming languages. Much of the resulting program, therefore, is generated to provide for the mapping of multidimensional data onto the one-dimensional memory and to contend with the placement of all of the data into the same memory.

Finally, the fourth property is that the meaning of the data is not stored with it. In other words, it is not possible to tell by looking at a set of bits whether that set of bits represents an integer, a floating point number or a character string. In a higher level language, we associate such a meaning with the data, and expect a generic operation to take on a meaning determined by the meaning of its operands. In characterizing the difficulties presented by these four properties, their inconsistencies with higher level languages are emphasized. And yet, these higher level languages were, for the most part, designed to be utilized for the purpose of programming the existing von Neumann style computers. In a very real sense, the entire realm of software has been developed under the umbrella of this architecture and may be aptly referred to as von Neumann software. Thus, the hardware and software have served to perpetuate each other according to the underlying von Neumann model. One facet of this is the fundamental view of memory as a "word at a time" kind of device. A word is transferred from memory to the CPU or from the CPU to memory. All of the data, the names (locations) of the data, the operations to be performed on the data, must travel between memory and CPU a word at a time. Backus [1978] calls this the "von Neumann bottleneck." As he points out, this bottleneck is not only a physical limitation, but has served also as an "intellectual bottleneck" in limiting the way we think about computation and how to program it. Obviously, the computers we use today are not simply larger, faster EDVACs. 
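The word-at-a-time traffic behind Backus's bottleneck can be made concrete with a toy simulation: a single sequentially addressed memory holds both instructions and data, and every instruction word and every operand crosses the same memory-CPU path one word at a time. The instruction encoding is invented for illustration:

```python
# A toy stored-program machine: one memory holds both encoded
# instructions and data words, and a traffic counter tallies
# every word that crosses between memory and the CPU.
MEMORY = [
    ("LOAD", 5),    # 0: acc <- mem[5]
    ("ADD", 6),     # 1: acc <- acc + mem[6]
    ("STORE", 7),   # 2: mem[7] <- acc
    ("HALT", None), # 3: stop
    None,           # 4: unused
    40, 2, 0,       # 5-7: data words, in the very same memory
]

def run(memory):
    """Execute until HALT; return (accumulator, words transferred)."""
    pc, acc, traffic = 0, 0, 0
    while True:
        op, addr = memory[pc]
        pc, traffic = pc + 1, traffic + 1  # the instruction word
        if op == "LOAD":
            acc = memory[addr]; traffic += 1   # the operand word
        elif op == "ADD":
            acc += memory[addr]; traffic += 1
        elif op == "STORE":
            memory[addr] = acc; traffic += 1
        elif op == "HALT":
            return acc, traffic
```

Nothing in the memory list marks a word as instruction or data; the CPU simply treats whatever the PC points at as an order, which is exactly the implicit distinction (and the self-modification hazard) discussed earlier in this unit.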
Numerous improvements have been made through the introduction of, for example: index registers and general-purpose registers; floating-point data representation; indirect addressing; hardware interrupts to redirect program execution upon detection of certain events; input and output in parallel with CPU execution; virtual memory; and the use of multiple processors [Myers 1982]. It is significant that each of these made some improvement over the original model, some of them quite significant improvements, but none fundamentally changed the architecture. In other words, the four properties discussed above still hold, and the von Neumann bottleneck still exists with or without these improvements. It is also significant to note that all of these improvements were implemented before 1960!