

CS224 COMPUTER ARCHITECTURE & ORGANIZATION, SPRING 2014

LAYERED COMPUTER DESIGN

1. Introduction

CS224 focuses on computer design. It uses the top-down, layered approach to design and also to improve computers. A computer is designed layer by layer, starting with the top conceptual layer and ending with the bottom layer, the complete design. A layer is implemented by the layer below. There are many possibilities to implement a layer. To decide, design goals are used: speed, cost, size, weight, power consumption, reliability, expandability, flexibility and compatibility. CS224 concentrates on the speed (performance) goal and describes how it can be used to make decisions on the design of a computer named EMY. It will often be concluded that the best (most reliable) computer speed measure is the run (execution) time.

[Figure: Applications to Transistors. The computer layers from top to bottom, listed below with the typical operations, operands and components the figure gives for them.]

Application Level
    Operations: simulating a plane, word processing, controlling an elevator, ...
    Operands: aeroplane surface, winds, sheets, characters, elevator buttons, ...
    Components: abstract machine
Computational Method Level
    Components: abstract machine
Algorithm Level
    Components: abstract machine
High-level Language Level
    Components: abstract machine
Operating System Level
    Components: abstract machine
Architecture Level (Machine Language Level), the HW/SW interface
    Operations: add, subtract, multiply, load, store, jump, branch, ...
    Operands: 2's complement integers, FP numbers, vectors, ...
    Components: registers, memory, addressing, I/O, interrupts (abstract machine)
Microarchitecture Level (Organization Level) (Register Transfer Level, RTL)
    Operations: fetch instruction, increment PC, calculate effective address, ...
    Operands: 2's complement numbers, memory addresses, ...
    Components: CPU (registers, buses, ALUs), memory, I/O (digital systems)
Logic Level
    Operations: AND, OR, NOT, clocked store on flip-flop, ...
    Operands: bits
    Components: gates (AND, OR, NOT) and flip-flops (digital circuits)
Transistor Level
    Operations: switch on and off
    Operands: voltage levels
    Components: switches (transistors), resistors, capacitors and wires (electronic circuits)

The figure marks the upper layers as software and the lower layers as hardware. The implementers named on its left side are: computer scientist; algorithm designer; programmer; systems personnel and the OS (process, memory, I/O, file, ... managers); computer architect and the compiler, linker, loader; computer designer; logic designer; chip designer.

Computer architecture courses cover the application, architecture, organization, logic and transistor layers. But four other layers between the application and architecture layers need to be studied as well for a comprehensive design (see the figure above). To save time, CS224 will also not consider these four layers much.

In the figure above, who or what implements a layer is indicated on the left side. Some layers are shown with three text lines on the right side. The first text line indicates typical operations of the layer, the second text line indicates typical operands of the layer and the third text line indicates typical components of the layer. Finally, it must be noted that ultimately a computer computes by means of its transistors turning (switching) on and off: the transistor layer is the complete design!

NYU School of Engineering, Page 1 of 24, Handout No : 3, January 27, 2014

In Chapters 2, 3 and Appendix A, the architecture layer of the EMY computer is introduced. The architecture is finalized when Input/Output and interrupts are covered in Chapter 8. In Chapters 3, 5, 8 (eight) and Appendices B and C the organization and logic levels of the EMY computer are presented. Improvements to the organization in the form of pipelining and memory hierarchy are introduced in Chapters 6 and 7, respectively.

All nine layers shown in the figure above are summarized below. However, before we discuss the layers, we will discuss computer fundamentals, then popular computer classifications, and then conclude that the nine layers above give a better view of computers.

1.1. Computer Systems

The fundamentals: A computer processes digital information. In order to do that it runs (executes) a machine language program. As an example, when we buy software, such as Microsoft Word, we buy the machine language program of the Word software. A machine language program manipulates data. A machine language program consists of machine language instructions. A machine language instruction is a simple command that can be immediately understood by the hardware. It commands the computer to perform a simple operation such as add, subtract, and, shift left, etc. Thus, it can be directly run by the computer (hardware).

Machine language instructions and data are in terms of 1s and 0s and are stored in the memory. It is not possible to distinguish whether a part of the memory has an instruction or a data element by just looking at it. This is a unique property of today's computers, and so they are called stored-program computers. Data and programs are input from input/output (I/O) devices into the computer memory and result data are output to the I/O devices.

[Figure: a computer connected to its I/O devices: printer, disk, mouse, modem, tape, display and keyboard]

Computers are classified with respect to their size, speed and cost as supercomputers, servers, desktop computers and embedded computers.

Supercomputers are the fastest computers, costing millions of dollars and very large. They are used for scientific applications, such as airplane design, weather forecasting and molecular simulations. Government agencies and large corporations can afford them.
Servers are large computers that allow multiple users to run general-purpose applications. Companies and universities are typical customers. CS224 will concentrate on server class computers.

Desktop computers are single-user machines, intended to run a small number of applications ranging from email to word processing.

Embedded computers are very small and control the system they are embedded in. They typically have one application to run, which is the control of the system they are in.

1.2. Hardware vs. Software

Another classification is hardware vs. software. Hardware is the collection of physical components, such as chips, wires, PCBs, connectors, I/O devices, etc. that form a computer. Software is the collection of programs on a computer.

Software and hardware are equivalent in that any operation performed by the hardware can be built into software and any operation performed by software can also be directly realized by hardware. Therefore, we have the hardware/software trade-off. This equivalence is under the assumption that there is a basic set of operations implemented in hardware. Decisions on what to include in hardware and software are based on the required speed, cost, reliability, frequency of expected changes, etc.
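The hardware/software equivalence described above can be sketched with multiplication. The example below assumes, purely for illustration, that the hardware offers only add, shift and bit-test operations, and builds multiplication in software on top of them. The function name and the restriction to non-negative integers are choices made for this sketch, not part of the handout.

```python
def multiply_in_software(x: int, y: int) -> int:
    """Multiply two non-negative integers using only add, shift and bit test.

    Hypothetical setting: the hardware has no multiply instruction, so the
    operation is realized by a piece of code (shift-and-add).
    """
    if x < 0 or y < 0:
        raise ValueError("this sketch handles non-negative integers only")
    product = 0
    while y != 0:
        if y & 1:          # lowest bit of the multiplier is 1
            product += x   # add the (shifted) multiplicand
        x <<= 1            # shift the multiplicand left by one bit
        y >>= 1            # shift the multiplier right by one bit
    return product


if __name__ == "__main__":
    print(multiply_in_software(6, 7))
```

The loop runs once per bit of the multiplier, which hints at the trade-off: the software version needs no multiplier hardware but takes many simple operations instead of one.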

There are two types of software today: application and systems. The meaning of the two changes from computer to computer. Since we concentrate on large computers (servers) in CS224, we define application programs as those run by ordinary users, such as email, word processing, spreadsheet and simulation programs. Systems programs are used to control the hardware to make the computer easy to use, secure and more efficient. Systems software includes the operating systems, language translators (compilers, assemblers), linkers, loaders and libraries. They are used by systems people who have special privileges (access rights) to use the computer.

This distinction is enforced by today's computers in the form of hardware control states: user and system states. Application programs are run in the user state, and if they try to run system software in this mode an interrupt (exception) is generated and the program is terminated. System programs are run in the system state.

Even though software is in machine language, today it is often developed by first writing in a high-level language, an application-oriented language or an assembly language. High-level languages include C++, Java, C, Fortran, Cobol, Python, PHP, etc. Application-oriented languages contain constructs and keywords to develop a program for a specific class of applications, such as simulating a computer network. Assembly languages are related to the architecture of the processor they are targeted for. That is, for a computer with an Intel Pentium processor, one would develop an assembly language program in the Intel assembly language. If the processor is an IBM Power processor, one would write an IBM Power assembly language program.

Since the computer can run only machine language programs, one needs to translate the above programs to machine language programs. To translate from a high-level language program to the machine language program, compilers are used: C++ compiler, Java compiler, etc. To translate from an assembly language program to the machine language program, assemblers are used: Intel assembler, IBM assembler, etc.
To translate from an application-oriented language program to the machine language program, typically preprocessing programs are used to convert it to an intermediate form in a high-level language, which is then compiled to the machine language program.

Among the three types of languages, application-oriented languages are the highest level, meaning very easy to write, and assembly languages are the lowest, meaning hardest to write. Although it is easier to develop application-oriented language programs, their corresponding machine code may not be efficient, since preprocessors and compilers may not be sophisticated enough to generate efficient machine code. On the other hand, developing a large assembly language program may not be practical due to the complexity of the language. The common practice today is that for embedded applications assembly and C programs are developed, since embedded programs are not large. For all others, high-level and application-oriented languages are used.

In order to speed up assembly language program development, programmers use pseudoinstructions. A pseudoinstruction is not a real instruction: the CPU cannot execute it. It often requires a complex architectural operation, and if it were an actual instruction, it would be a complex (CISC) instruction. In other cases, a pseudoinstruction may require a simple architectural operation and would be one simple RISC instruction. It is then kept as a pseudoinstruction because its syntax is more convenient, i.e. easier to learn and use, as in the case of the MOVE and CLEAR pseudoinstructions asked in Homework I. Several MIPS pseudoinstructions are given in the textbook, such as BLT (Branch Less Than) and LI (Load Immediate). Pseudoinstructions used by the programmer are converted to real instructions by the assembler.

1.3. Architecture vs. Organization (Microarchitecture)

Another computer classification is architecture vs. organization (microarchitecture). The architecture is the set of resources visible to the machine language programmer: registers, the memory, data representations, addressing modes, instruction formats, control states, I/O controllers, interrupts, etc.
Although the architecture is often thought to be equivalent to the machine language set of a computer, it is more than that. Still, a major portion of the architecture coverage is devoted to the machine language set. A related issue in the past was whether the machine language set should be complex (complex instruction set computer, CISC) or simple (reduced instruction set computer, RISC). The debate took place in the 1980s and the first half of the 1990s. It was resolved with RISC as the winner, since it allows more efficient pipelining and leads to simpler hardware and an easier increase of the clock frequency. The Intel and Motorola machine language sets are CISC. The Sun machine language set is RISC. Why and how the Intel CISC architecture has kept its dominance will be clear later in the semester. But, simply, this has been possible by designing an Intel CPU that converts each Intel CISC instruction to up to three RISC instructions on the fly.

Studying the architecture implies working on machine language programs. But this is not practical when we design a computer, since machine language programs consist of 1s and 0s. Therefore, in CS224, we will work on mnemonic machine language programs. They are easier to write and are in one-to-one correspondence with machine language programs. That is, if one has a mnemonic machine language program, it is very straightforward to obtain the corresponding machine language program. Note that there is often no one-to-one correspondence between assembly language programs and machine language programs.

The organization is the set of resources that realizes the architecture, which includes the CPU, the memory and I/O controllers. These are digital systems with registers, buses, ALUs, sequencers, etc.

The CPU is responsible for running machine language programs: it runs machine language instructions. Running a machine language instruction is performing a simple operation (command) on data.

The memory keeps the programs and data, leading to the stored-program concept of today's computers.

I/O controllers interface the I/O devices to the memory and CPU. An I/O controller can control one or more I/O devices. Often the number of I/O devices connected to an I/O controller depends on the speed of the I/O devices. A high-speed I/O device may need an I/O controller of its own, while a few slow I/O devices can share a single I/O controller.

The stored-program concept and the generic view of computer organization with at least three digital systems (the CPU, memory and I/O controller) are often attributed to the mathematician John Von Neumann. However, there is considerable debate on that.

[Figure: computer organization with a CPU, a memory and three I/O controllers serving a disk, a display and a keyboard]

A microprocessor contains at least the CPU, which was the case in the 1970s and early 1980s. Today microprocessors also include cache memories, bus interfaces and memory management units. High-performance microprocessors from Intel, AMD, Sun and IBM have these functional units.
Some other chips in the market today contain memory and even I/O controllers. These are used for embedded applications and are called microcontrollers, not microprocessors. The memory and I/O controllers are added because embedded computers are often required to occupy a small space in the system they are housed in. This approach reduces the chip count, hence the physical space.

As the above discussion indicates, looking at a computer from different points of view can be at least distracting, if not confusing, for beginners of computer design: hardware vs. software, different programming languages, operating systems, compilers, assemblers, architecture vs. organization, etc. That is why the concept of computer layers is used to give a comprehensive view of computers at different levels of complexity or abstraction. Abstraction reduces the number of details of a layer by presenting a simpler view. In the computer layers figure in the Introduction, a layer is abstracted by the layer just above it.

2. Computer Layers

The Application, Computational Method, Algorithm, High-Level Language, Operating Systems and Architecture layers constitute the software layers. The Architecture, Microarchitecture, Logic and Transistor layers constitute the hardware layers. Each layer, except the Application layer, implements the layer above, following the concept of abstraction.

Clearly, the Architecture is the hardware/software interface. A computer architect needs to handle both hardware and software and keep track of advances in both.

2.1. Application Layer: This layer indicates the set of applications intended for the computer! Ideally, all applications can be run on a computer. However, in practice the computer is designed to efficiently run a subset of them. For example, one computer runs scientific applications, a different computer runs business applications, etc. Our EMY computer will target scientific applications. Specific applications mentioned in the textbook are the benchmark suites Linpack, Livermore Loops, Whetstone, Dhrystone, SPEC CPU 2000, SPECWeb and EDN EEMBC (Embedded Microprocessor Benchmark Consortium benchmark of five classes of applications).

2.2. Computational Methods Layer: This layer is highly theoretical and abstract. The computational method (i) determines characteristics of items (data and other) and work (operations), (ii) describes how operations initiate each other during execution, i.e. which operation is followed by which, or the order of performing operations, and (iii) implicitly determines the amount of parallelism among the operations. Three types of computational methods are frequently covered in the discussion of this topic: control flow, data flow and demand driven.

Today's computers use the control flow computational method, where the order of operations is specified by the order of instructions in the program. The order implies the execution order, and so the next instruction to perform is the one that follows the current instruction in the program. If one wants to change the order of execution, explicit control instructions (branch, jump, etc.) must be used, hence the name control flow. This explicit sequence of operations obscures parallelism. Thus, control flow is inherently sequential, hindering parallelism and higher speeds. This is the reason why today's supercomputers are very expensive: they need complex compilers, operating systems, hardware and highly trained parallel algorithm designers and programmers to extract parallelism from sequential programs.

In data flow, an operation starts its execution when all of its operands are available. Since the operand availability determines the order of operations, this method is also called data driven.
Many operations can have their operands ready at the same time and so they can start execution at the same time. Thus, data flow does not hinder parallelism. In fact, the parallelism is explicit to the fullest extent.

In demand driven, an operation starts when its result is demanded. Many operation results can be demanded at the same time and so they can all start execution in parallel. Demand driven computation also has explicit parallelism.

Overall, data flow and demand driven methods are inherently parallel. However, implementing them at full scale today is not efficient given the current technology.

2.3. Algorithm Layer: The algorithm for an application specifies the major steps to generate the output. The algorithm follows the computational method chosen. An algorithm is abstract and short. It is independent of high-level languages. Today, for a single-processor (uniprocessor) computer such as the EMY processor, we write a sequential algorithm in the control flow method. However, if we have a computer with multiple processors (cores), we write a parallel algorithm but still use the control flow method.

Application: Dot Product

dot = A * B, that is, $\mathrm{dot} = \sum_{i=1}^{n} A[i]\,B[i]$, where A and B are vectors with n elements.

Algorithm:

    dot = 0
    for (1 <= i <= n) do
        dot = dot + (A[i] * B[i])

This is a sequential algorithm. Can you parallelize it?

Application: SAXPY/DAXPY, a step in Gaussian elimination to solve linear equations

Y = a * X + Y, that is, $Y[i] = a\,X[i] + Y[i]$, where X and Y are vectors with n elements and a is a scalar.

Algorithm:

    for (1 <= i <= n) do
        Y[i] = a * X[i] + Y[i]

This is a sequential algorithm. Can you parallelize it?

Application: Matrix Multiply

A = B * C, that is, $A[i,j] = \sum_{k=1}^{n} B[i,k]\,C[k,j]$, where A is an m x p matrix, B is an m x n matrix and C is an n x p matrix. A[i,j] is the dot product of row i of B and column j of C.

Algorithm:

    for (1 <= i <= m) do
        for (1 <= j <= p) do
            A[i,j] = 0
            for (1 <= k <= n) do
                A[i,j] = A[i,j] + B[i,k] * C[k,j]

This is a sequential algorithm. Can you parallelize it?

2.4. High-Level Language Layer: The algorithm developed for an application is coded in a high-level language, such as Fortran, C, C++, Java, etc. Fortran is still the choice of scientific computing, while C is gaining ground. Note that for one algorithm there are different programs possible, as each can be in a different high-level language.

2.5. Operating Systems Layer: This layer interfaces with hardware. That is, it hides hardware details from the programmer and provides security, stability and fairness in the computing system. Thus, this layer adds more code to run on behalf of the application. The layer also handles interrupts and input/output operations.

2.6. Architecture Layer: The architecture layer is the hardware/software interface. Its elements include the machine language instruction set, register sets, the memory and Input/Output structures, among others. CS224 discusses this level considerably, starting with the second week of the semester, and so its description here is kept short. Nevertheless, below we give a brief discussion of a dilemma that computer architects had in the 1980s and early 1990s. The discussion is given due to its historical significance.

The dilemma: On the one side, a computer architect would like to include complex instructions (floating-point division, string search, etc.) in the instruction set to perform complex operations directly. The opposite decision is not to include complex instructions: only simple instructions. A missing complex operation is implemented by a piece of code.
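As a rough illustration of implementing a missing complex operation by a piece of code, the sketch below realizes string search (one of the complex operations named above) with nothing but simple steps: index arithmetic, single-character comparisons and branches. The function name and the return convention (index of the first match, or -1) are assumptions of this sketch.

```python
def string_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Only simple operations are used (add, compare, branch), mimicking what
    a machine without a string-search instruction would have to execute.
    """
    n, m = len(text), len(pattern)
    i = 0
    while i + m <= n:                      # candidate start position
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1                         # one simple compare per character
        if j == m:                         # every character matched
            return i
        i += 1
    return -1


if __name__ == "__main__":
    print(string_search("computer architecture", "arch"))
```

In the worst case the nested loops perform on the order of n times m simple comparisons, where a machine with a string-search instruction would issue one complex instruction.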
Obviously, running code takes a longer time than running a single complex instruction, so the simpler machine would be slower than the complex machine. But a complex architecture results in complex hardware with higher costs, longer development times and difficult upgrading. A simple architecture leads to simpler hardware. But we need to use a sophisticated compiler to generate efficient code, since an application with complex operations has to be implemented by pieces of code. One can see that the simpler computer is a slow computer when those applications are run. We must note that both decisions are attractive: a fast machine or a cheaper machine.

The fundamental question is this: Are those complex operations needed often? In other words, are those functions executed often? If often, a complex architecture is justified: we must make the common case fast.

The division: There have been two camps in computer hardware that promote two different computer architecture philosophies: Complex Instruction Set Computer (CISC) and Reduced (simple) Instruction Set Computer (RISC). Examples of highly CISC microprocessors are the Intel x86 and Motorola 680X0. Examples of highly RISC microprocessors are MIPS and Sun UltraSPARC.

A compromise has been tried in the form of hybrid microprocessors. Highly RISC microprocessors gain CISC features when upgraded. Similarly, highly CISC microprocessors gain RISC features when upgraded. An example of a hybrid microprocessor is the IBM PowerPC microprocessor, which is nevertheless marketed as a RISC microprocessor. Currently, the RISC idea is the favorite since it allows efficient pipelining. Even the Intel x86 architecture relies on RISC execution: each x86 CISC instruction is converted to up to three RISC operations in the ID cycle, and these RISC operations are executed in the rest of the CPU hardware as if they were in the instruction set.

To summarize:

a) There is always a speed/cost trade-off, where the higher the speed, the higher the cost.

b) The design of a computer (designing its architecture, organization, logic and chip levels) is based on the targeted application set. That is, we choose the application set for our computer, then design the hardware.

c) Making the common case fast is an attractive design rule. It is mentioned in the textbook as one of the eight great ideas in computer architecture. See also pages 59, 60, 61, 62 and 63 of the textbook.
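The fundamental question of how often complex operations are needed can be explored with a small run-time estimate. The sketch below is only a toy model: the cycle counts (1 cycle for a simple operation, 20 cycles for a complex operation done by a hardware instruction, 200 cycles when it is done by a piece of code) and the operation count are hypothetical numbers chosen for illustration, and hardware cost and clock frequency effects are ignored.

```python
def run_time_cycles(total_ops: int, complex_fraction: float,
                    simple_cost: float, complex_cost: float) -> float:
    """Estimated run time in cycles for a mix of simple and complex operations."""
    complex_ops = total_ops * complex_fraction
    simple_ops = total_ops - complex_ops
    return simple_ops * simple_cost + complex_ops * complex_cost


def slowdown_without_complex_instruction(complex_fraction: float,
                                         hw_cost: float = 20.0,
                                         sw_cost: float = 200.0,
                                         simple_cost: float = 1.0,
                                         total_ops: int = 1_000_000) -> float:
    """Ratio of run time with the operation in software to run time with it in hardware.

    All default costs are hypothetical; they are not measurements.
    """
    with_hw = run_time_cycles(total_ops, complex_fraction, simple_cost, hw_cost)
    with_sw = run_time_cycles(total_ops, complex_fraction, simple_cost, sw_cost)
    return with_sw / with_hw


if __name__ == "__main__":
    for fraction in (0.001, 0.05, 0.2):
        ratio = slowdown_without_complex_instruction(fraction)
        print(f"complex fraction {fraction}: slowdown {ratio:.2f}x")
```

Under these assumed costs, a complex operation that makes up 0.1% of the work slows the simple machine by less than 20%, while one that makes up 20% of the work slows it by a factor of 8.5, which is the sense in which the common case decides the design.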
When textbook problems are analyzed, one can see how computer architects would ideally design the architecture of a computer: a number of real, commonly used programs are run and a large set of statistics is obtained, such as how often each instruction is executed, which registers are used, etc. From the statistics, we decide which instructions to include in the instruction set, how many registers to have in the register set, the types of addressing modes, the types of data representations, etc. so that we make the common case fast! We also determine how time consuming it would be if some operations were not implemented by instructions (by hardware), but by software (by functions). Clearly, the design focuses on application-architecture interactions such that the architecture is tuned to the applications. For example, our EMY computer is tuned to scientific applications!

Concentrating on one layer at a time (such as the architecture layer) is attractive from a computer architecture education point of view. This approach is not attractive in practice, because the resulting computer can violate one or more of the design goals (speed, cost, size, power consumption, ...). For example, the computer can be too expensive or too large. Thus, in practice computer designers work on several levels simultaneously, even though they proceed top-down. When an architectural decision is made, its implications on lower levels are examined. If it is concluded that a particular architectural decision can violate a goal, it is abandoned. For example, if it is decided to allow word boundary crossings on the architecture level, we check its implications on the organization level. We might realize that the corresponding hardware is unnecessarily expensive, so we reverse the decision about word boundary crossings.

2.7. Microarchitecture Layer: This layer consists of digital systems. A computer, which is itself a digital system, consists of at least three smaller digital systems: the processor (CPU), the memory and the Input/Output controller. A digital system consists of registers, buses, ALUs, sequencers, etc. Other names used for this layer are organization and register transfer level (RTL).
The microarchitecture layer is also discussed in depth in CS224, starting with the fourth week of the semester. Handouts will be distributed during the semester to ensure students understand fundamental microarchitecture concepts.

2.8. Logic Layer: This layer consists of digital circuits. Digital circuits form the digital systems of the microarchitecture level. Digital circuits use two types of components: gates and flip-flops.

A gate outputs a 1 or 0, depending on its current input values, i.e. the output now is a function of the inputs now. The most common gates used are AND, OR, NOT, NAND and NOR gates.

A flip-flop stores a single bit. To store the bit (1 or 0), a clock signal is used. The rising or falling edge of the clock stores the bit. The most common flip-flops used are D and JK flip-flops. A flip-flop is implemented by using a few gates. Then, we can state that all digital circuits consist of gates! Note that a flip-flop is not a memory. The memory chip design is different from the flip-flop design.

There are two types of digital circuits. A combinational circuit contains gates. A combinational circuit changes its output right after an input is changed: the output now is a function of the inputs now. Combinational circuits cannot store information. Examples of combinational circuits are adders, multipliers, comparators, etc.

Sequential circuits contain gates and flip-flops. They store past inputs: the output now is a function of inputs now and past inputs. Examples of sequential circuits are counters, registers, shift registers and sequencers.

The Logic layer will be discussed less than the architecture and microarchitecture layers in CS224. It will be the main topic of interest when we cover hardwiring, microprogramming and high-speed arithmetic circuits. Below, we give a brief introduction to digital logic.

2.8.1. Introduction to Digital Logic

In this section, we present the formal digital circuit fundamentals needed to implement such structures as registers, buses, ALUs and sequencers. Digital circuits consist of gates and flip-flops. There are two types of digital circuits: combinational circuits and sequential circuits. Combinational circuits use only gates, while sequential circuits use both gates and flip-flops. Most real-life circuits are sequential circuits. Combinational circuits are more specific purpose.

The input-output relationship of a digital circuit is important when it is studied.
The input-output relationship treats the digital circuit as a black box with inputs and outputs. It relates the output to the inputs: every output is described as a function of the inputs. A function is a mathematical entity that precisely describes how an output is determined by its inputs. For example, in the figure below, the digital circuit shown as a black box has three inputs and two outputs (two digital functions):

[Figure: a black box labeled Digital Circuit with inputs a, b, c and outputs $y = f_1(a, b, c)$ and $z = f_2(a, b, c)$]

Functions $f_1$ and $f_2$ precisely indicate when outputs y and z are 1 and when they are 0. That is, $f_1$ specifies the input-output relationship of y to inputs a, b and c, and $f_2$ specifies the input-output relationship of z to inputs a, b and c.

A function for a simple combinational circuit is represented by expressions, minterm lists, truth tables, etc. For complex combinational circuits, we do not use functions, but operation tables. A function for a simple sequential circuit is represented by expressions, state tables and state diagrams. For complex sequential circuits we do not use functions, but operation diagrams.

For historical reasons, digital circuits are called switching circuits, digital circuit functions are called switching functions, and the algebra to design combinational circuits is called Switching Algebra.

Below, we first discuss combinational circuits and then sequential circuits.

2.8.2. Combinational Circuits

A combinational circuit changes its output when its inputs are changed. This is because it uses gates, which change their outputs when their inputs are changed. The time it takes for a gate output to change is several hundred picoseconds today. Therefore, a combinational circuit will change its output in a time duration in terms of nanoseconds. There is no time dimension in combinational circuits, since they change their outputs if inputs are changed.
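To make the black-box view concrete, the sketch below picks a 1-bit full adder as a combinational circuit with three inputs and two outputs. This choice is an assumption made for illustration; the handout does not say which circuit its black box stands for. The two outputs are written as expressions, and helper functions (names are hypothetical) derive the truth table and minterm list representations mentioned above.

```python
from itertools import product


def f1(a: int, b: int, c: int) -> int:
    """Output y: the sum bit of a full adder (1 when an odd number of inputs are 1)."""
    return a ^ b ^ c


def f2(a: int, b: int, c: int) -> int:
    """Output z: the carry bit of a full adder (1 when at least two inputs are 1)."""
    return (a & b) | (a & c) | (b & c)


def truth_table(f):
    """Rows (a, b, c, f(a, b, c)) for all eight input combinations."""
    return [(a, b, c, f(a, b, c)) for a, b, c in product((0, 1), repeat=3)]


def minterm_list(f):
    """Decimal indices of the input combinations for which f is 1."""
    return [4 * a + 2 * b + c
            for a, b, c in product((0, 1), repeat=3) if f(a, b, c) == 1]


if __name__ == "__main__":
    for row in truth_table(f1):
        print(row)
    print("y minterms:", minterm_list(f1))
    print("z minterms:", minterm_list(f2))
```

The same function therefore has three interchangeable descriptions: the expression in the function body, the eight-row truth table and the minterm list.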

A number of combinational circuits are frequently used in digital systems: multiplexers, decoders, encoders, demultiplexers, adders, comparators and parity checkers. Their design has been studied extensively in the literature. Students can take a look at books on digital logic to understand their operation. Typical combinational design follows the rule of thumb that if a circuit has 4 inputs or fewer, it is designed immediately by using Switching Algebra. Otherwise, the circuit is partitioned into smaller and smaller blocks until they have 4 inputs or fewer, or until they are popular digital circuits for which designs are available.

2.8.2.1. Switching Algebra

George Boole, in 1854, introduced a systematic treatment of logic and developed for this purpose an algebraic system now called Boolean Algebra. This algebraic system, $(S;\ +;\ \cdot;\ \overline{\phantom{k}};\ 0;\ 1)$, consists of a set S of elements, binary operations + (called "plus") and $\cdot$ (called "dot"), a unary operation $\overline{\phantom{k}}$ (called "complement") and at least two elements 0 and 1 in the set S. E. V. Huntington in 1904 defined many-valued Boolean Algebra with the following postulates (axioms):

PI) There exists a set S of elements which satisfies the principle of substitution under the equivalence relation =: if k = m, then m may be substituted for k in any expression containing k, without affecting the validity of the expression.
PII) a) The set S is closed with respect to the + operation: if $k, m \in S$, then $(k + m) \in S$.
     b) The set S is closed with respect to the $\cdot$ operation: if $k, m \in S$, then $(k \cdot m) \in S$.
PIII) a) $k + 0 = k$ (0 is the identity element with respect to +).
      b) $k \cdot 1 = k$ (1 is the identity element with respect to $\cdot$).
PIV) a) $k + m = m + k$ (+ is commutative).
     b) $k \cdot m = m \cdot k$ ($\cdot$ is commutative).
PV) a) $k + (m \cdot p) = (k + m) \cdot (k + p)$ (+ is distributive over $\cdot$).
    b) $k \cdot (m + p) = (k \cdot m) + (k \cdot p)$ ($\cdot$ is distributive over +).
PVI) For every element k of set S, there exists an element $\overline{k}$ (the complement of k) of set S, such that
     a) $k + \overline{k} = 1$
     b) $k \cdot \overline{k} = 0$

Above, we do not mention the number of elements in S or how the operators manipulate the elements of S. In fact, one can formulate many Boolean Algebras depending on the number of elements in S and the operator definitions. Among all Boolean Algebras, the two-valued (two-element) Boolean Algebra, also known as Switching Algebra, is the most widely known and studied. Switching Algebra was developed by C. Shannon in 1938. Switching Algebra is defined by the algebraic system $(0;\ 1;\ +;\ \cdot;\ \overline{\phantom{k}})$ and the six postulates above, under the condition that the following operator rules on (0; 1) apply, i.e. the operator definitions:

Definition of the AND operator:
k  m  k.m
0  0   0
0  1   0
1  0   0
1  1   1

Definition of the OR operator:
k  m  k+m
0  0   0
0  1   1
1  0   1
1  1   1

Definition of the Complement (NOT, Invert) operator:
k  complement of k
0   1
1   0

Today, each operator above is directly implemented by electronic circuits. That is, a digital electronic circuit with transistors, capacitors, resistors and other electronic components is designed to perform the specific logic operation. Each such digital electronic circuit for an operator is called a gate. The figure below shows the schematic symbols of the gates for the three operators above.
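Because S has only two elements in Switching Algebra, postulates PII to PVI can be checked exhaustively against the three operator tables. The Python sketch below is one way to do that. The names AND, OR, NOT and check_postulates are illustrative choices made for this example and do not come from the handout.

```python
from itertools import product

# Operator definitions of Switching Algebra on S = {0, 1},
# copied from the AND, OR and Complement tables in the text.
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
NOT = {0: 1, 1: 0}

S = (0, 1)


def check_postulates():
    """Return a dict mapping each postulate label to True or False."""
    pairs = list(product(S, repeat=2))
    triples = list(product(S, repeat=3))
    return {
        "PII closure": all(OR[k, m] in S and AND[k, m] in S for k, m in pairs),
        "PIII identity": all(OR[k, 0] == k and AND[k, 1] == k for k in S),
        "PIV commutativity": all(
            OR[k, m] == OR[m, k] and AND[k, m] == AND[m, k] for k, m in pairs
        ),
        "PV distributivity": all(
            OR[k, AND[m, p]] == AND[OR[k, m], OR[k, p]]
            and AND[k, OR[m, p]] == OR[AND[k, m], AND[k, p]]
            for k, m, p in triples
        ),
        "PVI complement": all(
            OR[k, NOT[k]] == 1 and AND[k, NOT[k]] == 0 for k in S
        ),
    }


results = check_postulates()
for label, holds in results.items():
    print(label, "holds" if holds else "FAILS")
```

Postulate PI (substitution) is a property of the equality relation itself, so it is not something a table lookup can test and it is left out of the sketch.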

[Figure: schematic symbols of the 2-input AND gate (inputs k, m; output $k \cdot m$), the 2-input OR gate (inputs k, m; output $k + m$) and the NOT gate (input k; output $\overline{k}$)]

AND and OR gates can have any number of inputs, as long as there are at least two inputs. However, a NOT gate always has a single input.

[Figure: 3-input AND gate (inputs k, m, p; output $k \cdot m \cdot p$) and 3-input OR gate (inputs k, m, p; output $k + m + p$)]

There are other operators, such as NAND, NOR, EXOR and EXNOR, that are implemented by gates. Today, electronic circuits with transistors on a chip implement from several gates to hundreds of millions of gates. There are chips, such as microprocessor chips, with billions of transistors. Chip densities have increased at the rate of Moore's Law since the 1960s: the number of transistors on a chip doubles every two years.

Since 1938, theorems have been developed in Switching Algebra. These theorems follow and satisfy the postulates and operator definitions given above:

TI) Idempotency: a) $k + k = k$  b) $k \cdot k = k$
TII) Null elements: a) $k + 1 = 1$  b) $k \cdot 0 = 0$
TIII) Absorption: a) $k + (k \cdot m) = k$  b) $k \cdot (k + m) = k$
TIV) Involution: $\overline{\overline{k}} = k$
TV) Associativity: a) $k + (m + p) = (k + m) + p = k + m + p$  b) $k \cdot (m \cdot p) = (k \cdot m) \cdot p = k \cdot m \cdot p$
TVI) a) $k + (\overline{k} \cdot m) = k + m$  b) $k \cdot (\overline{k} + m) = k \cdot m$
TVII) DeMorgan's theorems: a) $\overline{k + m} = \overline{k} \cdot \overline{m}$  b) $\overline{k \cdot m} = \overline{k} + \overline{m}$
TVIII) Consensus theorem: a) $(k \cdot m) + (\overline{k} \cdot p) + (m \cdot p) = (k \cdot m) + (\overline{k} \cdot p)$  b) $(k + m) \cdot (\overline{k} + p) \cdot (m + p) = (k + m) \cdot (\overline{k} + p)$

The duality principle: A postulate or theorem can be obtained from another postulate or theorem by interchanging the binary operators + and $\cdot$ and the identity elements 0 and 1.

The AND operator symbol: In order to reduce the number of symbols in expressions, we will not show the $\cdot$ symbol between variables. We will imply there is an AND operation between them: $a \cdot b \cdot c = abc$.

Precedence rules: In order to reduce the number of parentheses, we have the following set of precedence rules that indicates which subexpression or operator to evaluate next:
1) Evaluate the expression inside a pair of parentheses
2) Evaluate NOT
3) Evaluate AND
4) Evaluate OR

[Example: a fully parenthesized expression and its equivalent with the redundant parentheses removed]

The truth table for a combinational circuit shows the output value for every input combination. For example, for the 4-input imaginary circuit below, the following truth table relates the output to the inputs. The truth table describes the function, the input/output relationship:

[Figure: a black box labeled "Digital Circuit" with inputs A, B, C, D and output f(A, B, C, D)]

Truth Table:
Row  A B C D  f(A, B, C, D)
 0   0 0 0 0   0
 1   0 0 0 1   1
 2   0 0 1 0   0
 3   0 0 1 1   1
 4   0 1 0 0   0
 5   0 1 0 1   1
 6   0 1 1 0   0
 7   0 1 1 1   1
 8   1 0 0 0   0
 9   1 0 0 1   1
10   1 0 1 0   0
11   1 0 1 1   0
12   1 1 0 0   0
13   1 1 0 1   1
14   1 1 1 0   0
15   1 1 1 1   1

Traditionally, the OR operation is called sum and the AND operation is called product, since OR behaves similarly to addition and AND behaves similarly to multiplication. One can develop an expression that focuses on the 1s of the output, so that it has as many terms as there are 1s on the output. Each term is an input combination that generates 1 for the output. Such an expression has product terms summed and is called the canonical SOP (sum-of-products) expression. Each product term is a canonical product term and has all the inputs of the function. For example, on the above truth table there are seven 1s, therefore the canonical SOP expression has seven canonical product terms. The canonical SOP expression of the above truth table is the following:

$$f(A, B, C, D) = \overline{A}\,\overline{B}\,\overline{C}D + \overline{A}\,\overline{B}CD + \overline{A}B\overline{C}D + \overline{A}BCD + A\overline{B}\,\overline{C}D + AB\overline{C}D + ABCD$$

The expression implements the function because when a canonical product term generates 1, the whole expression becomes 1: 1 OR any term is 1. All we have to do is to show that each canonical product term corresponds to an input combination. Here, we show the correspondence between the first canonical product term and the top input combination that generates 1 on the truth table. The first term, which is $\overline{A}\,\overline{B}\,\overline{C}D$, generates 1 if $\overline{A}$, $\overline{B}$, $\overline{C}$ and D are all 1s. In order for this to happen, A, B and C must be 0 and D must be 1, so that $\overline{0}$ AND $\overline{0}$ AND $\overline{0}$ AND 1 is 1 AND 1 AND 1 AND 1, which is 1. We see that, to determine the input combination, an input that is complemented in the term must be 0; otherwise it must be 1.
Then, the input combination that corresponds to $\overline{A}\,\overline{B}\,\overline{C}D$ is 1, since it is 0001 on the truth table. The other canonical product terms are for the remaining six input combinations 3, 5, 7, 9, 13 and 15:

Canonical product term                      A B C D  Input combination
$\overline{A}\,\overline{B}\,\overline{C}D$    0 0 0 1    1
$\overline{A}\,\overline{B}CD$                 0 0 1 1    3
$\overline{A}B\overline{C}D$                   0 1 0 1    5
$\overline{A}BCD$                              0 1 1 1    7
$A\overline{B}\,\overline{C}D$                 1 0 0 1    9
$AB\overline{C}D$                              1 1 0 1   13
$ABCD$                                         1 1 1 1   15

Another function representation is the minterm list, which contains the minterms of the function. A minterm is an input combination that generates 1 for the output. Since a canonical product term also has the same property, we can state that a minterm is a canonical product term. The minterm list is directly obtained from the truth table. For the above circuit, the minterm list is as follows:

$$f(A, B, C, D) = \sum m(1, 3, 5, 7, 9, 13, 15)$$

If we have the canonical expression or any other expression for a function, we can use Switching Algebra to simplify it. One can also use Karnaugh maps to obtain minimal expressions. Below, we give the simplification of a complex expression that describes a function. The rule applied at each step is shown on the right:

$$
\begin{aligned}
f(A, B, C, D) &= D(AB + \overline{C}) + \overline{A}BCD + \overline{A}\,\overline{B}D && \text{a nonminimal expression for } f \\
&= ABD + \overline{C}D + \overline{A}D(\overline{B} + BC) && k(m + s) = km + ks \\
&= ABD + \overline{C}D + \overline{A}D(\overline{B} + C) && k + \overline{k}m = k + m \\
&= ABD + \overline{C}D + \overline{A}\,\overline{B}D + \overline{A}CD && k(m + s) = km + ks \\
&= ABD + \overline{A}\,\overline{B}D + D(\overline{C} + C\overline{A}) && k(m + s) = km + ks \\
&= ABD + \overline{A}\,\overline{B}D + \overline{C}D + \overline{A}D && k + \overline{k}m = k + m \ \&\ k(m + s) = km + ks \\
&= ABD + \overline{C}D + \overline{A}D(1 + \overline{B}) && k(m + s) = km + ks \\
&= ABD + \overline{C}D + \overline{A}D && k + 1 = 1 \ \&\ k \cdot 1 = k \\
&= \overline{C}D + D(\overline{A} + AB) && k(m + s) = km + ks \\
&= \overline{C}D + \overline{A}D + BD && k + \overline{k}m = k + m \ \&\ k(m + s) = km + ks
\end{aligned}
$$

All the lines above are expressions for function f. The last one is the minimal SOP expression.

Once we have the minimal SOP expression, we draw the combinational circuit (the gate network) as the last step. The resulting gate network is what we call a 2-level AND-OR gate network. Below, we show the minimal 2-level AND-OR gate network for the above function:

[Figure: 2-level minimal AND-OR gate network for $f(A, B, C, D) = \overline{A}D + BD + \overline{C}D$: inverters on A and C, three 2-input AND gates that each take D as one input, and one 3-input OR gate]

Note that the above circuit has three levels: a level of inverters, a level of AND gates and a level of the OR gate. However, these gate networks are still called 2-level AND-OR gate networks and we will keep that name. The reason why we try to obtain SOP expressions is that they are implemented by 2 (3)-level gate networks, which are the fastest possible we can have, as explained below. Note that a 1-level gate network has only one gate, which cannot be useful for real-life applications.

In summary, a function, i.e. an input/output relationship, for a combinational circuit can be expressed in different formats that include the truth table, the minterm list, the minimal sum-of-products (SOP) expression, the canonical SOP expression and the Karnaugh map.
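The claim that the minterm list, the nonminimal starting expression and the minimal SOP expression all describe the same function can be checked by enumerating the 16 input combinations. The Python sketch below does this. The function names are made up for the example, and the apostrophe in the generated strings stands for the complement bar.

```python
from itertools import product

MINTERMS = {1, 3, 5, 7, 9, 13, 15}  # minterm list of f from the text


def f_from_minterms(a, b, c, d):
    """f defined by its minterm list; (A, B, C, D) is read as a 4-bit number."""
    return int((a << 3 | b << 2 | c << 1 | d) in MINTERMS)


def canonical_product_term(index):
    """Canonical product term of one minterm, e.g. 1 -> A'B'C'D."""
    bits = format(index, "04b")
    return "".join(
        name if bit == "1" else name + "'" for name, bit in zip("ABCD", bits)
    )


def f_nonminimal(a, b, c, d):
    """Starting expression of the derivation: D(AB + C') + A'BCD + A'B'D."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    return (d & ((a & b) | nc)) | (na & b & c & d) | (na & nb & d)


def f_minimal(a, b, c, d):
    """Minimal SOP expression: A'D + BD + C'D."""
    return ((1 - a) & d) | (b & d) | ((1 - c) & d)


canonical_sop = " + ".join(canonical_product_term(i) for i in sorted(MINTERMS))
all_agree = all(
    f_from_minterms(*x) == f_nonminimal(*x) == f_minimal(*x)
    for x in product((0, 1), repeat=4)
)

print("canonical SOP:", canonical_sop)
print("all three forms agree:", all_agree)
```

An exhaustive check like this is practical only for small input counts, which matches the rule of thumb of designing blocks with 4 inputs or fewer directly.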
Among these representations, the minimal SOP expression and the minimal POS (product-of-sums) expression are the designs of the circuits. This is because they directly describe circuits (gate networks) and, as the word minimal implies, the circuits are minimal. SOP and POS expressions are worked on because they lead to 2 (3)-level gate networks, which are the fastest combinational circuits. Often the SOP expression is preferred when circuits are designed since it looks like an ordinary algebra expression. For the sake of brevity, we will not discuss POS expressions here. Students are referred to books on digital logic.

2.8.2.2. Karnaugh Map Simplifications

Canonical SOP and canonical POS expressions are almost never minimal. We need to obtain minimal expressions from them. One can use Switching Algebra, at the expense of a slow and error-prone simplification process, or the Karnaugh map technique. To use the Karnaugh map technique to obtain the minimal SOP expression, we start with the minterm list. One can use the Karnaugh map technique to obtain minimal POS expressions as well; in that case one starts with the maxterm list. We will not discuss POS expression simplifications via Karnaugh maps. There is a standard K-map for a given number of inputs. In this section, we will have K-maps for four-input cases.

The K-map method starts with placing the minterms on the map. Then, one combines $2^k$ logically adjacent minterms (horizontal or vertical neighbors on the map) and obtains a product term. If there are n inputs and $2^k$ minterms are combined, the product term must have (n - k) variables. This can be used to verify that the combination is legal. Note that minterms along opposite edges of the map are also adjacent. One has to cover all the minterms of the function to get the minimal SOP expression. Therefore, the goal is to cover the minterms by using the minimum number of the largest combinations. The minimal SOP expression for the above function is obtained below:

[Figure: four-variable K-map of f(A, B, C, D) with three combinations labeled 1, 2 and 3, giving $f(A, B, C, D) = \overline{A}D + BD + \overline{C}D$]

The corresponding 2-level AND-OR gate network is as follows:

[Figure: 2-level minimal AND-OR gate network with three AND gates, each taking d as one input, and one OR gate producing f(a, b, c, d)]

Another example of obtaining the minimal SOP expression of a function by using the Karnaugh-map method:

$$f(a, b, c, d) = \sum m(0, 2, 5, 7, 8, 10, 13, 14, 15)$$

[Figure: four-variable K-map of the function with four combinations labeled 1 to 4]

The function has two minimal SOP expressions:

$$f(a, b, c, d) = bd + \overline{b}\,\overline{d} + abc \quad \text{or} \quad f(a, b, c, d) = bd + \overline{b}\,\overline{d} + ac\overline{d}$$

Either circuit is minimal. We choose the one with combinations 1, 2 and 3, which is $f(a, b, c, d) = bd + \overline{b}\,\overline{d} + abc$.
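Two facts from the K-map discussion lend themselves to a quick mechanical check: a combination of $2^k$ minterms over n inputs yields a product term with n - k variables, and a function can have more than one minimal cover. The Python sketch below illustrates both. It assumes the second example's minterm list is {0, 2, 5, 7, 8, 10, 13, 14, 15} with covers bd + b'd' + abc and bd + b'd' + acd' (the apostrophe marks a complement); these values are a reconstruction, and the helper names are hypothetical.

```python
from itertools import product

# Assumed minterm list of the second K-map example (a reconstruction).
MINTERMS = {0, 2, 5, 7, 8, 10, 13, 14, 15}


def literals_in_term(n_inputs, group_size):
    """Number of variables in the product term of a K-map combination.

    group_size must be a power of two (2**k); the term has n_inputs - k variables.
    """
    if group_size <= 0 or group_size & (group_size - 1) != 0:
        raise ValueError("a legal combination covers 2**k minterms")
    k = group_size.bit_length() - 1
    return n_inputs - k


def cover_1(a, b, c, d):
    """bd + b'd' + abc"""
    return (b & d) | ((1 - b) & (1 - d)) | (a & b & c)


def cover_2(a, b, c, d):
    """bd + b'd' + acd'"""
    return (b & d) | ((1 - b) & (1 - d)) | (a & c & (1 - d))


def minterms_of(func, n_inputs=4):
    """Set of minterm numbers for which func outputs 1 (first input is the MSB)."""
    return {
        index
        for index, bits in enumerate(product((0, 1), repeat=n_inputs))
        if func(*bits)
    }


print(literals_in_term(4, 4))  # a group of four minterms, such as bd
print(literals_in_term(4, 2))  # a group of two minterms, such as abc
print(minterms_of(cover_1) == MINTERMS, minterms_of(cover_2) == MINTERMS)
```

Both covers use two 2-variable terms and one 3-variable term, so neither is cheaper than the other in gate inputs.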

[Figure: 2-level minimal AND-OR network for f(a, b, c, d)]

2.8.2.3. Combinational Circuit Speed

Gates in combinational circuits output values based on their inputs. If an input is changed, the output changes after a short delay, which is called the propagation (gate) delay, $t_p$. This delay is a function of the electrical properties of the gate, including the process of the gate. We describe the process in the Transistor Layer section. Today, this delay is a few nanoseconds or less. Note that when we mention the speed of a gate, we mean its gate delay: the shorter the delay, the faster the gate is.

[Figure: timing diagram showing the gate delay $t_p$ between a change on the input and the resulting change on the output y]

The gate delay is one of the three factors that determine the speed of a combinational circuit. The other two factors are the longest path from an input to the output, i.e. the number of gate levels, and the wire delays between the gates. 2-level gate networks have the shortest path from an input to the output and so they are the fastest circuits. Wire delays also contribute to the speed since it takes some time for signals to travel from one gate to another. Although 2-level gate networks are satisfactory for high speed, they result in expensive circuits.

2.8.2.4. Examples of Combinational Circuit Design

Below, we give examples of designing simple combinational circuits, with algebraic simplifications and Karnaugh map simplification, so that students understand the purpose of the postulates and theorems.

A (1-bit) 2-to-1 Multiplexer

A 1-bit 2-to-1 Multiplexer (MUX) is a selector which selects one of its two data inputs based on a select signal. As seen below, it has three inputs and one output. Two of the inputs are data inputs, one of which is output. The third input is the control input, the select input. The single output is always equal to one of the two data inputs at any time. The MUX is a 1-bit MUX since when an input is selected, there is only one data line selected. As the gate network shows, the MUX has three gate delays. One can develop 2-bit 2-to-1 MUXes, 4-bit 2-to-1 MUXes, etc., by using a number of 1-bit 2-to-1 MUXes as described in the next section. Note also that there are k-bit 4-to-1 MUXes, k-bit 8-to-1 MUXes, etc.

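[Figure: black box of the 1-bit 2-to-1 MUX with select input a, data inputs b and c, and output y(a, b, c). If a = 0 then y = b; if a = 1 then y = c]

Truth table:
Row  a b c  y(a, b, c)
 0   0 0 0   0
 1   0 0 1   0
 2   0 1 0   1
 3   0 1 1   1
 4   1 0 0   0
 5   1 0 1   1
 6   1 1 0   0
 7   1 1 1   1

$$y(a, b, c) = \sum m(2, 3, 5, 7)$$

Minterm  a b c  Canonical product term
 2       0 1 0  $\overline{a}b\overline{c}$
 3       0 1 1  $\overline{a}bc$
 5       1 0 1  $a\overline{b}c$
 7       1 1 1  $abc$

$$
\begin{aligned}
y(a, b, c) &= \overline{a}b\overline{c} + \overline{a}bc + a\overline{b}c + abc \\
&= \overline{a}b(\overline{c} + c) + ac(\overline{b} + b) && k(m + p) = km + kp \\
&= \overline{a}b + ac && k + \overline{k} = 1 \ \&\ k \cdot 1 = k
\end{aligned}
$$

[Figure: gate network of the 1-bit 2-to-1 MUX producing y]

A 4-bit 2-to-1 Multiplexer

A 4-bit 2-to-1 MUX has two sets of data inputs. Each set of data inputs has four bits. Thus, the MUX has four outputs carrying the values of the four input lines selected. The MUX has 9 inputs and 4 outputs. The single-bit input is the select input a, and the 4-bit inputs are K and M. The 4-bit output is Y. The MUX outputs K if a is 0 and outputs M if a is 1. The black box view and implementation of the 4-bit 2-to-1 MUX are given below.

[Figure: black box of the 4-bit 2-to-1 MUX with 4-bit inputs K (K3, K2, K1, K0) and M (M3, M2, M1, M0), the select input, and the 4-bit output Y (Y3, Y2, Y1, Y0)]

Since there are 9 inputs, we need to partition the circuit into simpler pieces! We first obtain the operation table of the 4-bit 2-to-1 MUX:

a  Operation
0  Y = K
1  Y = M

That is, if a = 0 then Y = K, else if a = 1 then Y = M. The major operations are not clear on this operation table. We need a different, more detailed operation table:

a  Operation
0  Y3 = K3 ; Y2 = K2 ; Y1 = K1 ; Y0 = K0
1  Y3 = M3 ; Y2 = M2 ; Y1 = M1 ; Y0 = M0

There are four identical major operations: 1-bit 2-to-1 MUXing! We partition the 4-bit 2-to-1 MUX into four blocks. Each block is a 1-bit 2-to-1 MUX, which we have designed by using Switching Algebra. It has three inputs and one output:

[Figure: four 1-bit 2-to-1 MUX blocks with data inputs (K3, M3), (K2, M2), (K1, M1), (K0, M0) and outputs Y3, Y2, Y1, Y0]

The 4-bit 2-to-1 MUX is then built from these four blocks, as the next figure shows.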
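The partitioning argument above can be sketched in Python: a 1-bit MUX is written directly from the minimal expression, and the 4-bit MUX is nothing more than four copies of it sharing the select input. The sketch assumes the naming in which a is the select input and b, c are the data inputs, which is the naming consistent with the minterm list m(2, 3, 5, 7). The function names mux1 and mux4 are illustrative.

```python
def mux1(a, b, c):
    """1-bit 2-to-1 MUX as the gate network y = a'b + ac (a is the select input)."""
    return ((1 - a) & b) | (a & c)


def mux4(a, K, M):
    """4-bit 2-to-1 MUX built from four 1-bit MUX blocks sharing the select input.

    K and M are 4-tuples of bits written as (bit 3, bit 2, bit 1, bit 0).
    """
    return tuple(mux1(a, k, m) for k, m in zip(K, M))


# Recover the minterm list of the 1-bit MUX by enumerating (a, b, c) as a 3-bit number.
minterms = [i for i in range(8) if mux1(i >> 2 & 1, i >> 1 & 1, i & 1)]
print(minterms)

K = (1, 0, 1, 1)
M = (0, 1, 0, 0)
print(mux4(0, K, M))  # select = 0 passes K
print(mux4(1, K, M))  # select = 1 passes M
```

Designing the 9-input circuit as one truth table would need 512 rows, while the partitioned version reuses a block with an 8-row table.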

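[Figure: the 4-bit 2-to-1 MUX implemented with four 1-bit 2-to-1 MUXes: (K3, M3) to Y3, (K2, M2) to Y2, (K1, M1) to Y1 and (K0, M0) to Y0]

A 1-bit Adder, Full Adder

A 1-bit adder, the Full ADDer (FA), adds two 1-bit numbers plus a carry input. Therefore, it adds three bits. It has 3 inputs and 2 outputs as shown below.

[Figure: Full Adder (FA) block with inputs a, b, c and outputs $c_{out}(a, b, c)$ and $sum(a, b, c)$; the 1-bit ADDer computes a + b + c and produces the pair ($c_{out}$, sum)]

We obtain the truth table, from which we obtain the canonical SOP expressions:

Row  a b c  c_out(a, b, c)  sum(a, b, c)
 0   0 0 0   0               0
 1   0 0 1   0               1
 2   0 1 0   0               1
 3   0 1 1   1               0
 4   1 0 0   0               1
 5   1 0 1   1               0
 6   1 1 0   1               0
 7   1 1 1   1               1

$$c_{out}(a, b, c) = \sum m(3, 5, 6, 7) = \overline{a}bc + a\overline{b}c + ab\overline{c} + abc$$

$$sum(a, b, c) = \sum m(1, 2, 4, 7) = \overline{a}\,\overline{b}c + \overline{a}b\overline{c} + a\overline{b}\,\overline{c} + abc$$

The canonical sum(a, b, c) expression is also the minimal expression. It cannot be simplified. The carry output can be simplified as follows:

$$
\begin{aligned}
c_{out}(a, b, c) &= \overline{a}bc + a\overline{b}c + ab\overline{c} + abc \\
&= ab\overline{c} + abc + \overline{a}bc + a\overline{b}c \\
&= ab(\overline{c} + c) + \overline{a}bc + a\overline{b}c && k(m + p) = km + kp \\
&= ab + \overline{a}bc + a\overline{b}c && k + \overline{k} = 1 \ \&\ k \cdot 1 = k \\
&= a(b + \overline{b}c) + \overline{a}bc && k(m + p) = km + kp \\
&= a(b + c) + \overline{a}bc && k + \overline{k}m = k + m \\
&= ab + ac + \overline{a}bc && k(m + p) = km + kp \\
&= b(a + \overline{a}c) + ac && k(m + p) = km + kp \\
&= b(a + c) + ac && k + \overline{k}m = k + m \\
&= ab + bc + ac && k(m + p) = km + kp
\end{aligned}
$$

Both outputs are implemented by 2-level AND-OR gate networks, shown next.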
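As a cross-check of the Full Adder design, the sketch below evaluates the two SOP expressions for every input combination and compares the pair (c_out, sum) with ordinary integer addition of the three bits. The function name full_adder and the list names are choices made for this example.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (c_out, s) for input bits a, b and carry input c.

    c_out uses the minimal SOP ab + ac + bc; s uses the canonical SOP
    a'b'c + a'bc' + ab'c' + abc (the apostrophe marks a complement).
    """
    na, nb, nc = 1 - a, 1 - b, 1 - c
    c_out = (a & b) | (a & c) | (b & c)
    s = (na & nb & c) | (na & b & nc) | (a & nb & nc) | (a & b & c)
    return c_out, s


rows = list(product((0, 1), repeat=3))

# (c_out, s) read as a 2-bit number must equal a + b + c.
matches_addition = all(
    2 * full_adder(a, b, c)[0] + full_adder(a, b, c)[1] == a + b + c
    for a, b, c in rows
)

# Minterm lists recovered from the expressions.
c_out_minterms = [i for i, bits in enumerate(rows) if full_adder(*bits)[0]]
sum_minterms = [i for i, bits in enumerate(rows) if full_adder(*bits)[1]]

print(matches_addition, c_out_minterms, sum_minterms)
```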

[Figure: the 2-level AND-OR gate networks of the Full Adder. The network for $sum(a, b, c) = \overline{a}\,\overline{b}c + \overline{a}b\overline{c} + a\overline{b}\,\overline{c} + abc$ has 3 gate delays; the network for $c_{out}(a, b, c) = ab + ac + bc$ has 2 gate delays]

Therefore, a Full Adder takes 3 gate delays to generate the sum output, sum(a, b, c), and two gate delays to generate the carry out, $c_{out}(a, b, c)$.

A 32-bit Ripple-Carry Adder

An important component of a digital system is the ALU, which has an adder, a multiplier, AND, OR and other functional units. The adder is the most critical one since its speed partly determines the clock frequency of the digital system. CPUs typically have a 32-bit adder which has to complete its operation in one or a few clock periods. Thus, a high-speed adder has to be designed.

[Figure: 32-bit adder block computing $K + M + c_{in}$, with outputs $c_{out}$ and R. K, M and R are 32-bit 2's complement binary numbers with bits K31 ... K0, M31 ... M0 and R31 ... R0; the carry input is $c_{in} = c_0$ and the carry output is $c_{out} = c_{32}$]

A 32-bit adder adds two 32-bit numbers plus a carry input. It has 65 inputs and 33 outputs. Our starting point to design a high-speed adder is the 32-bit Ripple-Carry Adder, which is the slowest that can be designed. The Ripple-Carry Adder has 32 1-bit adders, known as Full Adders:

[Figure: chain of 32 Full Adders, FA0 to FA31. FA i takes Ki, Mi and carry $c_i$, and produces Ri and carry $c_{i+1}$; the carry input is $c_0$ and the carry output is $c_{32}$]

Each one of our 32 Full Adders has the above two gate networks. By using the gate delays of these two networks, we can obtain the worst-case addition time of our 32-bit Ripple-Carry Adder by following the carry as it ripples from FA0 to FA31.
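The ripple-carry structure and its worst-case timing can be sketched as follows. The code chains the Full Adder expressions bit by bit and, separately, counts gate delays under the assumption stated in the text: each carry takes 2 gate delays after the previous carry is ready, and each sum bit takes 3 gate delays. Wire delays are ignored, and the function names are hypothetical.

```python
def full_adder(a, b, c):
    """Full Adder from the text: c_out = ab + ac + bc, sum as canonical SOP."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    c_out = (a & b) | (a & c) | (b & c)
    s = (na & nb & c) | (na & b & nc) | (a & nb & nc) | (a & b & c)
    return c_out, s


def ripple_carry_add(k, m, c_in=0, width=32):
    """Add two width-bit patterns by chaining full adders; returns (r, c_out)."""
    carry = c_in
    r = 0
    for i in range(width):
        carry, s = full_adder(k >> i & 1, m >> i & 1, carry)
        r |= s << i
    return r, carry


def worst_case_gate_delays(width=32, carry_delay=2, sum_delay=3):
    """Gate delays until the last output settles (wire delays ignored).

    The carry out is ready after carry_delay * width delays. The most
    significant sum bit waits for the carry into the last full adder and
    then needs sum_delay more delays.
    """
    c_out_ready = carry_delay * width
    last_sum_ready = carry_delay * (width - 1) + sum_delay
    return max(c_out_ready, last_sum_ready)


print(ripple_carry_add(5, 7))
print(ripple_carry_add(0xFFFFFFFF, 1))  # the carry ripples through all 32 stages
print(worst_case_gate_delays())
```

The delay grows linearly with the width, which is the weakness a carry-lookahead design is meant to remove.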

[Figure: the 32-bit Ripple-Carry Adder annotated with gate delays. The carries out of FA0, FA1, ..., FA30, FA31 are ready after 2, 4, ..., 62, 64 gate delays, and the sum bits R0, R1, R2, ..., R29, R30, R31 are ready after 3, 5, 7, ..., 61, 63, 65 gate delays]

Our 32-bit Ripple-Carry Adder takes 65 gate delays to calculate the sum. If the gate delay is 1 ns, the addition time is 65 ns. This is very long by today's standards. We need to improve the timing. We will do that by designing a 32-bit Carry-Lookahead Adder in class.

A 2-to-4 Decoder

The most common decoder is the binary decoder, which has k data inputs and $2^k$ outputs. If the decoder is a 2-to-4 decoder, then k is 2 and so there are $2^2 = 4$ outputs. The k inputs represent an unsigned binary number. The outputs decode the unsigned number represented by the k inputs. For example, if the inputs represent $(3)_{10}$, output line 3 is 1 and the other output lines are 0. Below, the development of the 2-to-4 decoder is shown.

[Figure: 2-to-4 decoder block with inputs I1, I0 and outputs Y0, Y1, Y2, Y3. Inputs (I1, I0) represent an unsigned binary number]

I1 I0  Y3 Y2 Y1 Y0
0  0   0  0  0  1
0  1   0  0  1  0
1  0   0  1  0  0
1  1   1  0  0  0

$$Y_0 = \overline{I_1}\,\overline{I_0} \qquad Y_1 = \overline{I_1}I_0 \qquad Y_2 = I_1\overline{I_0} \qquad Y_3 = I_1 I_0$$

[Figure: gate network of the 2-to-4 decoder: inverters on I1 and I0 and four 2-input AND gates producing Y0, Y1, Y2, Y3]

The decoder requires at most two gate levels to generate its outputs. Therefore, the decoder is fast. Note that there are 3-to-8, 4-to-16, etc. decoders whose operation and implementation follow similarly. We will use a 4-to-16 decoder when we implement hardwiring later in the semester. Today's memory chips (DRAM, SRAM, ROM, etc.) have large binary decoders. For a k-input decoder, there are $2^k$ AND gates. Each input is connected to half of the AND gates. For small decoders, this is not a major problem. But for large decoders it is a problem, which is called fan-out. The fan-out of a line is a number which indicates how many inputs can be connected to it. If the number is exceeded, there are electrical problems and so the circuit may not work. It is for this reason that the decoders of memory chips have gate networks with more than two levels, to reduce the fan-out requirement. However, with more levels, the decoder is slower and, therefore, the memory chip is slower.

2.8.3. Sequential Circuits

A sequential circuit consists of flip-flops and gates.
It has flip-flops to store bits, meaning past inputs. Therefore, a sequential circuit output depends on the present inputs and also on past inputs. This means sequential circuits have the time dimension. A flip-flop operates differently from a gate in that it stores a bit only if it receives an edge on the clock signal. The edge is either the high-to-low transition of the signal (negative edge) or the low-to-high transition of the signal (positive edge). A flip-flop uses only one of these two edge types. For example, if a flip-flop stores when it receives a negative edge, then we say it is negative-edge triggered. There are several types of flip-flops. One that is used to implement registers is the Data (D) flip-flop. Another frequently used flip-flop is the J-K flip-flop, used to implement counters. A D flip-flop has a single data input, which is stored when the clock edge is received. A J-K flip-flop has two inputs and stores a bit when a clock edge is received. The timing of the edge is controlled by a Store control signal generated by the control unit. A flip-flop today has two outputs: one uncomplemented, Q, and the other complemented, $\overline{Q}$.

2.8.3.1. D Flip-Flops

The flip-flop has two outputs, Q and $\overline{Q}$, as shown below. The CE or clock enable input enables/disables the clock input, the C input. That is, if CE is 0, the clock input cannot be used. The clock or C input indicates when to store on the flip-flop. The triangle symbol next to the clock input indicates it is an edge-triggered clock input. The high-to-low transition symbol indicates the flip-flop is stored when there is a negative edge. What is stored on the flip-flop depends on the data input, D. According to the operation table below, when CE is 1, there is a negative edge on the clock input and the D input is 0, we store 0 after the negative edge, typically a few nanoseconds after the negative edge. If the D input is 1 at the edge, we store 1 after the edge. Finally, the last two rows indicate when the D input is ignored, that is, when the flip-flop is not stored: when CE is 0 or when there is no negative edge, the D input is ignored. This is also known as Don't Care and is shown by an X symbol.

[Figure: D flip-flop symbol with inputs D, CE and C (negative-edge triggered) and outputs Q and $\overline{Q}$]

D  CE  C                 Operation
0  1   negative edge     Store 0 after the negative edge
1  1   negative edge     Store 1 after the negative edge
X  0   X                 Not stored
X  1   no negative edge  Not stored

2.8.3.2. Registers

A register is a sequential circuit used to store data temporarily. Data is stored in it by applying a clock edge at the end of the clock period in which the data needs to be stored. Note that a register which is stored a value in a particular clock period actually gets the value at the beginning of the following clock period. The example below shows an imaginary 32-bit register named A, which is stored a value when its Store A signal is 1 in clock periods 2 and 5. The D flip-flops of the register receive the edge at the end of clock periods 2 and 5.
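The operation table of the D flip-flop and the register example can be mimicked with a small behavioral model. The Python sketch below is only an approximation under stated assumptions: the clock is sampled as a sequence of levels, a negative edge is a change from 1 to 0 between two consecutive samples, the stored bit starts at 0, and the few nanoseconds of delay after the edge are not modeled. The class and function names are invented for this example.

```python
class DFlipFlop:
    """Behavioral sketch of a negative-edge-triggered D flip-flop with clock enable."""

    def __init__(self):
        self.q = 0  # stored bit; the initial value 0 is an assumption
        self._prev_clock = 0

    @property
    def q_bar(self):
        """Complemented output."""
        return 1 - self.q

    def tick(self, d, ce, clock):
        """Apply one sample of the inputs.

        D is stored only when CE is 1 and the clock goes from 1 to 0. The value
        of d passed with the falling sample plays the role of the value right
        before the edge.
        """
        negative_edge = self._prev_clock == 1 and clock == 0
        if ce == 1 and negative_edge:
            self.q = d
        self._prev_clock = clock
        return self.q


def run_register(obus_by_period, store_periods, width=32):
    """Simulate a width-bit register; returns its value after each clock period ends."""
    flip_flops = [DFlipFlop() for _ in range(width)]
    history = []
    for period, value in enumerate(obus_by_period, start=1):
        ce = 1 if period in store_periods else 0
        for level in (1, 0):  # clock high, then low: the period ends on a negative edge
            for i, ff in enumerate(flip_flops):
                ff.tick(value >> i & 1, ce, level)
        history.append(sum(ff.q << i for i, ff in enumerate(flip_flops)))
    return history


# Store A is 1 in clock periods 2 and 5, as in the register example.
print(run_register([10, 20, 30, 40, 50, 60], store_periods={2, 5}))
```

Each entry of the returned list is the register value visible at the beginning of the following clock period, which is the timing relationship described above.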
[Figure: the 32-bit register A. The 32-bit OBUS drives the D inputs, Store A drives CE and Clock drives C; the 32-bit Q outputs form A. The timing diagram covers clock periods 1 to 6 and shows Clock, Store A, OBUS and A, with negative edges marked at the end of clock periods 2 and 5]

We study how the D flip-flop above can be used to store bits by using one of the bits of OBUS shown in the register example above. The rightmost flip-flop, A[0], and how it is stored are shown below. Note that we do not store every clock period, but only when it is necessary, by raising CE to 1 in a particular clock period.

[Figure: flip-flop A[0] with inputs D, CE (Store A) and C (Clock) and outputs Q and $\overline{Q}$. The timing diagram covers clock periods 1 to 6 and shows Clock, Store A, OBUS[0] and A[0], with negative edges marked at the end of clock periods 2 and 5]

The D input is used only when there is a negative edge on the clock input. Therefore, unlike in a gate, changes on the D line do not affect the output all the time. Note that on the timing diagram, it looks like output A[0] changes at the same time the negative edge occurs. Actually, it changes a few nanoseconds after the negative edge. Similarly, if the input seems to change at the same time there is a negative edge, the value stored is the value right before the negative edge. For example, if OBUS[0] changed from 1 to 0 at the end of clock period 2, the value that is stored is the value right before the edge, which is 1.

2.9. Transistor Layer: This layer consists of digital electronic circuits. Digital electronic circuits are used to build digital circuits. That is, digital electronic circuits implement gates (and also flip-flops). Digital electronic circuits consist of transistors, resistors, capacitors, diodes, etc. Transistors are the main component and so this level is often called the transistor level. Transistors in these circuits are used as on-off switches. The switches are turned on and off by control inputs. The figure below shows on-off switches and how these switches are used to implement an AND gate as an example.

A switch is a device with two conditions: open when the control input is 0, and closed when the control input is 1.

[Figure: on-off switches controlled by k and m implementing $k \cdot m$, shown next to the AND gate symbol with inputs k, m and output $k \cdot m$]

The figure below gives an example of a gate implementation, where the gate is a 2-input TTL NAND gate. The implementation of the NAND gate by transistors, resistors, etc. is shown next to the gate. TTL is a chip technology and is described below. Today, digital electronic circuits, i.e. circuits with transistors, resistors, capacitors, etc., are on chips. That is, gates and flip-flops are on chips.

[Figure: a TTL NAND gate implementation by ON Semiconductor, with one transistor labeled, shown next to the NAND gate symbol with inputs A, B and output $\overline{A \cdot B}$]

We use semiconductor substances to implement transistors. That is, today's chips are semiconductor chips. Silicon and Gallium Arsenide are examples of semiconductor substances. Each substance has its own speed, cost and power consumption figures. The most common substance is Silicon, which is found in sea sand. This is why Silicon chip prices are so low.

Chip design is constrained by design goals: speed, cost, power consumption, size, weight, reliability, etc. Before the design is started, we determine these constraints and then design the product. We try not to exceed the constraints by using the right number of gates and flip-flops and the right digital electronic implementations. However, it is not easy to satisfy them as they conflict with each other. For example, the higher the speed, the higher the cost and power consumption. Hence, a study of the spectrum of choices from semiconductor substances to chip densities is needed. The figure below shows the spectrum of substances and their relative speed for today's digital electronic circuits.

[Figure: Comparison of commonly used substances and digital electronic circuits for chips with respect to chip density. Substances used, ordered toward faster: Silicon, Silicon Germanium (SiGe), Gallium Arsenide, Niobium (superconducting, not a semiconductor). Transistor types: unipolar, bipolar. Transistor circuits: CMOS, BiCMOS, TTL, ECL. Number of gates on the chip: SSI, MSI, LSI, VLSI, ULSI]

Unipolar/bipolar transistors and other electronic components (resistors, diodes, ...) are used to implement transistor circuits, such as CMOS, TTL, ECL and BiCMOS. By using a transistor circuit, we implement a single gate, for example a CMOS AND gate, a TTL AND gate, etc.
The reason for using resistors, diodes, etc. is first for the correct operation of transistors and second to maintain the signal integrity, hence the operational stability of the gate.

The number of electronic components on a chip depends on the intended functionality: the more functionality, the more components. A widely used classification of the integration of components on chips is given in Table 1 below. The earliest chips from the 1960s were SSI chips and some of them are still used today. The current state-of-the-art microprocessors have up to 5 billion transistors. The integration level for these high-density chips is beyond ULSI, but no new name has been agreed upon yet.

Table 1: Chip densities for various scales of integration

Scale of Integration (chip density)    Siewiorek et al. (1982)  Burger et al. (1982)
Small Scale Integration (SSI)          < 10 gates               < 64 components
Medium Scale Integration (MSI)         < 100 gates              < 2K components
Large Scale Integration (LSI)          < 10,000 gates           < 64K components
Very Large Scale Integration (VLSI)    < 100,000 gates          < 2M components
Ultra Large Scale Integration (ULSI)   > 100,000 gates          > 2M components

Silicon is the most commonly used substance and is used by high-speed microprocessors and high-density memory chips. Silicon is expected to be around for the next 10 to 15 years. Table 2 below presents the state of the silicon technology. Silicon transistor circuits (CMOS, TTL, ...) have different speed, cost and power consumption figures. A brief description of TTL and CMOS circuits is given below. TTL circuits are used for high-speed, low-cost applications while CMOS is for high-density chips, such as microprocessors and memories (DRAM, SRAM). CMOS circuits are also used for portable applications that require low power consumption (space, embedded applications). Table 3 compares the three most commonly used transistor circuits. CMOS is the preferred transistor circuit to implement microprocessors and high-density memory chips. This is because CMOS circuits consume the least amount of power among the three. TTL is the most widely available and cheapest one, while ECL is the fastest one. It must be noted that electrostatic discharge can damage CMOS chips. Unless properly grounded, one should not touch CMOS chips.

Table 2: The state of the silicon technology

Characteristic                    Value
Densest chip transistor circuit   CMOS Silicon
Transistors/chip (density)        4,000,000,000
Gate delay                        50-500 ps
Process                           22 nanometer

Table 3: Summary of characteristics for three commonly used IC logic families

Parameter           TTL     CMOS    ECL
Speed               Medium  Low     High
Power consumption   Medium  Low     High
Chip density        Medium  High    Low
Cost                Low     Medium  High

3. The Big Picture: Transistors to Computers

Today, digital electronic circuits (transistor circuits) are on chips. That is, those transistors, resistors, capacitors, etc. are on chips. Chips are on printed circuit boards (PCBs), also known as cards. A PCB can contain tens of chips. The main PCB of a computer is called the motherboard, which contains the microprocessor and the memory chips. Typically, how many PCBs a non-embedded computer can have in a single cabinet depends on the size of the PCB together with the power and cooling arrangements of the cabinet and the room the cabinet is in. For example, a desktop computer can have two to six PCBs.

Transistors and electronic components are placed in the center of the chip, in the area called the die. That is, the digital circuits implemented by transistors are on the die. Pins (terminals) of the chip allow the components on the die to be accessible from the external world. The die is connected to the pins by means of wires. Dies are placed on a wafer. The number of dies per wafer depends on the sizes of the wafer and the die. The size of the die depends on the complexity (functionality) of the digital circuit!

[Figure: a chip, a PCB and a die (transistors are on the die); the UC Berkeley VIRAM die; VIRAM silicon dies on a silicon wafer, where the wafer contains 72 VIRAM dies. Photos by Joseph Gebis, The Berkeley Intelligent RAM (IRAM) Project: http://iram.cs.berkeley.edu]

Just as chip design is constrained by the speed, cost, power consumption, size, weight, reliability, etc., PCB design is also constrained by the same factors. Before the PCB design is started, we determine these constraints! Based on them, we go ahead and design the PCB. We keep the speed, cost, power consumption, size, weight, etc. of the PCB in mind by using the right number of chips, chip implementations and wiring. The hierarchy of circuits from chips to the whole system is exemplified below by one of the fastest supercomputers, the IBM Blue Gene/L.

[Figure: the IBM Blue Gene/L system hierarchy. IBM Deep Computing; used by permission of IBM]

A microprocessor chip contains several processors (cores, central processing units, CPUs), cache memories, memory management units (MMUs) and bus interfaces. These are implemented by registers, buses, arithmetic-logic units (ALUs), sequencers and other digital circuits. All of these digital circuits are implemented by gates and flip-flops. Finally, all the gates and flip-flops are implemented by transistor circuits, which are on a single die. Therefore, transistor circuits on a die implement a microprocessor. Below, the Intel Pentium 4 die is shown.

[Figure: Intel Pentium 4 processor die on a 0.18-micron process. Sources: http://www.intel.com and Computer Organization and Design: The Hardware/Software Interface, David A. Patterson and John L. Hennessy, 3rd edition, Morgan Kaufmann, 2005, pp. 2]

We pack hundreds of millions of transistors on a chip today. The number follows Moore's Law: every two years the number of transistors on a chip doubles, because we shrink the size of the transistor. What we call the process in Table 2 above is a measure that determines the size of a transistor on a chip. The process is 22 nanometer today and is reduced by one-third every two years, shrinking the transistor size. Currently, we have chips with more than one billion transistors.

A typical MIPS-based microprocessor organization, the MIPS R10000 die, is shown below, together with the general computer organization that is also implemented on the MIPS:

[Figure: general computer organization. The MIPS chip contains the CPU, the L1 instruction cache with MMU1, the L1 data cache with MMU2, the L2 cache and the bus interface. One or more buses (system bus, ...) connect the chip to the memory controller(s) and physical memory. One or more I/O buses connect the I/O controller(s), DMA1 and DMA2 to the disk (virtual memory and file storage), the printer and the terminal. The MIPS R10000 die photo: http://bwrc.eecs.berkeley.edu/CIC/die_photos/r10k.gif]

Power consumption is a major concern today as there are so many transistors on the chip. Power is also related to the clock frequency: the higher the clock frequency, the higher the power consumption. When the power consumption is high, the temperature of the chip increases. If a hot chip is not cooled quickly, it will burn out. Thus, one has to use heat sinks, fans or liquids to cool the chip.
However, cooling adds to the size, weight and cost of the chip and the PCB. The recent shift in microprocessor design from one processor (core) to multiple processors (cores) on the chip is due to the increased power consumption. Simply put, engineers cannot keep the microprocessor chip at low temperatures with simple cooling techniques when they increase the clock frequency. They have to lower the clock frequency. But that increases the execution time (CPUtime), meaning slower speeds. The solution to keep the execution time low is to use multiple processors. All the processors execute instructions of the same application, performing more operations per clock period and compensating for the reduced clock rate. Note that a multi-core microprocessor is not a uniprocessor. It is a parallel processing system! A multi-core chip requires a new CPUtime equation. It cannot use the one given in the textbook. What can it be?