ALU Architecture and ISA Extensions

Size: px
Start display at page:

Download "ALU Architecture and ISA Extensions"

Transcription

1 ALU Architecture and ISA Extensions Lecture notes from MKP, H. H. Lee and S. Yalamanchili Reading Sections (only those elements covered in class) Sections Appendix B.5 Practice Problems: 26, 27 Goal: Understand the v ISA view of the core microarchitecture v Organization of functional units and register files into basic data paths (2) 1

2 Overview Instruction Set Architectures have a purpose v Applications dictate what we need We only have a fixed number of bits v Impact on accuracy More is not better v We cannot afford everything we want Basic Arithmetic Logic Unit (ALU) Design v Addition/subtraction, multiplication, division (3) Reminder: ISA Register File (Programmer Visible State) 0x00 0x01 0x02 0x03 Memory Interface byte addressed memory stack 0x1F Processor Internal Buses Dynamic Data Data segment (static) Text Segment Program Counter Programmer Invisible State 0xFFFFFFFF Reserved Instruction register Arithmetic Logic Unit (ALU) Memory Map Kernel registers Who sees what? (4) 2

3 Arithmetic for Computers Operations on integers v Addition and subtraction v Multiplication and division v Dealing with overflow Operation on floating-point real numbers v Representation and operations Let us first look at integers (5) Integer Addition(3.2) Example: n Overflow if result out of range n Adding +ve and ve operands, no overflow n n Adding two +ve operands n Overflow if result sign is 1 Adding two ve operands n Overflow if result sign is 0 (6) 3

4 Add negation of second operand Example: 7 6 = 7 + ( 6) Integer Subtraction +7: : : s complement representation Overflow if result out of range v Subtracting two +ve or two ve operands, no overflow v Subtracting +ve from ve operand o Overflow if result sign is 0 v Subtracting ve from +ve operand o Overflow if result sign is 1 (7) ISA Impact Some languages (e.g., C) ignore overflow v Use MIPS addu, addui, subu instructions Other languages (e.g., Ada, Fortran) require raising an exception v Use MIPS add, addi, sub instructions v On overflow, invoke exception handler o o o Save PC in exception program counter (EPC) register Jump to predefined handler address mfc0 (move from coprocessor register) instruction can retrieve EPC value, to return after corrective action (more later) ALU Design leads to many solutions. We look at one simple example (8) 4

5 Integer ALU (arithmetic logic unit)(b.5) Build a 1 bit ALU, and use 32 of them (bit-slice) operation op a b res a b result (9) Single Bit ALU Implements only AND and OR operations Operation 0 Result A B 1 (10) 5

6 Adding Functionality We can add additional operators (to a point) How about addition? a c out = ab + ac in + bc in sum = a b c in Sum b CarryOut Review full adders from digital design (11) Building a 32-bit ALU Operation Operation a0 b0 ALU0 CarryOut Result0 a 0 1 Result a1 b1 ALU1 CarryOut Result1 b 2 a2 b2 ALU2 CarryOut Result2 CarryOut a31 b31 ALU31 Result31 (12) 6

7 Two's complement approach: just negate b and add 1. sub How do we negate? Subtraction (a b)? B i n v e r t C a r y I n O p e r a t i o n b 0 a 0 C a r y I n A L U 0 C a r y O u t R e s u l t 0 A clever solution: Binvert Operation b 1 a 1 C a r y I n A L U 1 C a r y O u t R e s u l t 1 a 0 1 Result b 2 a 2 C a r r y I n A L U 2 C a r r y O u t R e s u l t 2 b CarryOut b 3 1 a 3 1 C a r r y I n A L U 3 1 R e s u l t 3 1 (13) Tailoring the ALU to the MIPS Need to support the set-on-less-than instruction(slt) v remember: slt is an arithmetic instruction v produces a 1 if rs < rt and 0 otherwise v use subtraction: (a-b) < 0 implies a < b Need to support test for equality (beq $t5, $t6, $t7) v use subtraction: (a-b) = 0 implies a = b (14) 7

8 What Result31 is when (a-b)<0? Binvert Operation a0 b0 ALU0 Less CarryOut Result0 B i n v e r t O p e r a t i o n C a r y I n a1 b1 0 ALU1 Less CarryOut Result1 a 0 1 a2 b2 0 ALU2 Less CarryOut Result2 b L e s s 3 R e s u l t C a r r y O u t a31 b31 0 ALU31 Less Result31 Set Overflow Unsigned vs. signed support (15) Test for equality Notice control lines: 000 = and 001 = or 010 = add 110 = subtract 111 = slt Bnegate a0 b0 a1 b1 0 ALU0 Less CarryOut ALU1 Less CarryOut Operation Result0 Result1 Zero Note: zero is a 1 when the result is zero! a2 b2 0 ALU2 Less CarryOut Result2 Note test for overflow! a31 b31 0 ALU31 Less Result31 Set Overflow (16) 8

9 ISA View CPU/Core $0 $1 $31 ALU Register-to-Register data path We want this to be as fast as possible (17) Multiplication (3.3) Long multiplication multiplicand multiplier product Length of product is the sum of operand lengths (18) 9

10 A Multiplier Uses multiple adders v Cost/performance tradeoff n Can be pipelined n Several multiplication performed in parallel (19) Two 32-bit registers for product v HI: most-significant 32 bits v LO: least-significant 32-bits Instructions v mult rs, rt / multu rs, rt MIPS Multiplication o 64-bit product in HI/LO v mfhi rd / mflo rd o Move from HI/LO to rd o Can test HI value to see if product overflows 32 bits v mul rd, rs, rt o Least-significant 32 bits of product > rd Study Exercise: Check out signed and unsigned multiplication with QtSPIM (20) 10

11 quotient dividend divisor remainder 10 n-bit operands yield n-bit quotient and remainder Division(3.4) Check for 0 divisor Long division approach v If divisor dividend bits o 1 bit in quotient, subtract v Otherwise o 0 bit in quotient, bring down next dividend bit Restoring division v Do the subtract, and if remainder goes < 0, add divisor back Signed division v Divide using absolute values v Adjust sign of quotient and remainder as required (21) Faster Division Can t use parallel hardware as in multiplier v Subtraction is conditional on sign of remainder Faster dividers (e.g. SRT division) generate multiple quotient bits per step v Still require multiple steps Customized implementations for high performance, e.g., supercomputers (22) 11

12 MIPS Division Use HI/LO registers for result v HI: 32-bit remainder v LO: 32-bit quotient Instructions v div rs, rt / divu rs, rt v No overflow or divide-by-0 checking o Software must perform checks if required v Use mfhi, mflo to access result Study Exercise: Check out signed and unsigned division with QtSPIM (23) ISA View CPU/Core $0 $1 $31 ALU Multiply Divide Hi Lo Additional function units and registers (Hi/Lo) Additional instructions to move data to/from these registers v mfhi, mflo What other instructions would you add? Cost? (24) 12

13 Floating Point(3.5) Representation for non-integral numbers v Including very small and very large numbers Like scientific notation v v v In binary normalized v ±1.xxxxxxx 2 2 yyyy Types float and double in C not normalized (25) IEEE 754 Floating-point Representation Single Precision (32-bit) S exponent significand 1bit 8 bits 23 bits ( 1) sign x (1+fraction) x 2 exponent-127 Double Precision (64-bit) S exponent significand 1bit 11 bits 20 bits significand (continued) 32 bits ( 1) sign x (1+fraction) x 2 exponent-1023 (26) 13

14 Defined by IEEE Std Floating Point Standard Developed in response to divergence of representations v Portability issues for scientific code Now almost universally adopted Two representations v Single precision (32-bit) v Double precision (64-bit) (27) FP Adder Hardware Much more complex than integer adder Doing it in one clock cycle would take too long v Much longer than integer operations v Slower clock would penalize all instructions FP adder usually takes several cycles v Can be pipelined Example: FP Addition (28) 14

15 FP Adder Hardware Step 1 Step 2 Step 3 Step 4 (29) FP Arithmetic Hardware FP multiplier is of similar complexity to FP adder v But uses a multiplier for significands instead of an adder FP arithmetic hardware usually does v Addition, subtraction, multiplication, division, reciprocal, square-root v FP integer conversion Operations usually takes several cycles v Can be pipelined (30) 15

16 FP hardware is coprocessor 1 v Adjunct processor that extends the ISA Separate FP registers v 32 single-precision: $f0, $f1, $f31 ISA Impact v Paired for double-precision: $f0/$f1, $f2/$f3, o Release 2 of MIPs ISA supports bit FP reg s FP instructions operate only on FP registers v Programs generally do not perform integer ops on FP data, or vice versa v More registers with minimal code-size impact (31) ISA View: The Co-Processor CPU/Core Co-Processor 1 $0 $1 $0 $1 $31 $31 ALU Multiply Divide FP ALU Hi Lo Co-Processor 0 BadVaddr Status Causes EPC later Floating point operations access a separate set of 32-bit registers v Pairs of 32-bit registers are used for double precision (32) 16

17 ISA View Distinct instructions operate on the floating point registers (pg. A-73) v Arithmetic instructions o add.d fd, fs, ft, and add.s fd, fs, ft double precision single precision Data movement to/from floating point coprocessors v mcf1 rt, fs and mtc1 rd, fs Note that the ISA design implementation is extensible via co-processors FP load and store instructions v lwc1, ldc1, swc1, sdc1 o e.g., ldc1 $f8, 32($sp) Example: DP Mean (33) Associativity Floating point arithmetic is not commutative Parallel programs may interleave operations in unexpected orders v Assumptions of associativity may fail (x+y)+z x -1.50E+38 y 1.50E E+00 z E+00 x+(y+z) -1.50E E E+00 n Need to validate parallel programs under varying degrees of parallelism (34) 17

18 Performance Issues Latency of instructions v Integer instructions can take a single cycle v Floating point instructions can take multiple cycles v Some (FP Divide) can take hundreds of cycles What about energy (we will get to that shortly) What other instructions would you like in hardware? v Would some applications change your mind? How do you decide whether to add new instructions? (35) Characterizing Parallelism Today serial computing cores (von Neumann model) Data Streams Instruction Streams SISD MISD SIMD MIMD Single instruction multiple data stream computing, e.g., SSE Today s Multicore Characterization due to M. Flynn* *M. Flynn, (September 1972). "Some Computer Organizations and Their Effectiveness". IEEE Transactions on Computers, C 21 (9): t (36) 18

19 Parallelism Categories From (37) Multimedia (3.6, 3.7, 3.8) Lower dynamic range and precision requirements v Do not need 32-bits! Inherent parallelism in the operations (38) 19

20 Vector Computation Operate on multiple data elements (vectors) at a time Flexible definition/use of registers Registers hold integers, floats (SP), doubles DP) 128-bit Register 1x128 bit integer 2x64-bit double precision 4 x 32-bit single precision 8x16 short integers (39) Processing Vectors When is this more efficient? Memory vector registers When is this not efficient? Think of 3D graphics, linear algebra and media processing (40) 20

21 Case Study: Intel Streaming SIMD Extensions 8, 128-bit XMM registers v X86-64 adds 8 more registers XMM8-XMM15 8, 16, 32, 64 bit integers (SSE2) 32-bit (SP) and 64-bit (DP) floating point Signed/unsigned integer operations IEEE 754 floating point support Reading Assignment: v v (41) Instruction Categories Floating point instructions v Arithmetic, movement v Comparison, shuffling v Type conversion, bit level memory register register Integer Other v e.g., cache management ISA extensions! Advanced Vector Extensions (AVX) v Successor to SSE (42) 21

22 Arithmetic View Graphics and media processing operates on vectors of 8-bit and 16-bit data v Use 64-bit adder, with partitioned carry chain o Operate on 8 8-bit, 4 16-bit, or 2 32-bit vectors v SIMD (single-instruction, multiple-data) Saturating operations v On overflow, result is largest representable value o c.f. 2s-complement modulo arithmetic v E.g., clipping in audio, saturation in video 4x16-bit 2x32-bit (43) // A 16byte = 128bit vector struct struct Vector4 { float x, y, z, w; }; // Add two constant vectors and return the resulting vector Vector4 SSE_Add ( const Vector4 &Op_A, const Vector4 &Op_B ) { Vector4 Ret_Vector; SSE Example More complex example (matrix multiply) in Section 3.8 using AVX } asm { MOV EAX Op_A MOV EBX, Op_B MOVUPS XMM0, [EAX] MOVUPS XMM1, [EBX] ADDPS XMM0, XMM1 MOVUPS [Ret_Vector], XMM0 } return Ret_Vector; // Load pointers into CPU regs // Move unaligned vectors to SSE regs // Add vector elements // Save the return vector From (44) 22

23 Intel Xeon Phi (45) Data Parallel vs. Traditional Vector Vector Architecture Vector Register A Vector Register C Vector Register B pipelined functional unit Data Parallel Architecture registers Process each square in parallel data parallel computation (46) 23

24 ISA View CPU/Core $0 $1 SIMD Registers XMM0 XMM1 $31 XMM15 ALU Multiply Divide Vector ALU Hi Lo Separate core data path Can be viewed as a co-processor with a distinct set of instructions (47) Domain Impact on the ISA: Example Scientific Computing Floats Double precision Massive data Power constrained Embedded Systems Integers Lower precision Streaming data Security support Energy constrained (48) 24

25 Summary ISAs support operations required of application domains v Note the differences between embedded and supercomputers! v Signed, unsigned, FP, SIMD, etc. Bounded precision effects v Software must be careful how hardware used e.g., associativity v Need standards to promote portability Avoid kitchen sink designs v There is no free lunch v Impact on speed and energy à we will get to this later (49) Study Guide Perform 2 s complement addition and subtraction (review) Add a few more instructions to the simple ALU v Add an XOR instruction v Add an instruction that returns the max of its inputs v Make sure all control signals are accounted for Convert real numbers to single precision floating point (review) and extract the value from an encoded single precision number (review) Execute the SPIM programs (class website) that use floating point numbers. Study the memory/ register contents via single step execution (50) 25

26 Study Guide (cont.) Write a few simple SPIM programs for v Multiplication/division of signed and unsigned numbers o Use numbers that produce >32-bit results o Move to/from HI and LO registers ( find the instructions for doing so) v Addition/subtraction of floating point numbers Try to write a simple SPIM program that demonstrates that floating point operations are not associative (this takes some thought and review of the range of floating point numbers) Look up additional SIMD instruction sets and compare v AMD NEON, Altivec, AMD 3D Now (51) Glossary Co-processor Data parallelism Data parallel computation vs. vector computation Instruction set extensions Overflow MIMD Precision SIMD Saturating arithmetic Signed arithmetic support Unsigned arithmetic support Vector processing (52) 26

Divide: Paper & Pencil. Computer Architecture ALU Design : Division and Floating Point. Divide algorithm. DIVIDE HARDWARE Version 1

Divide: Paper & Pencil. Computer Architecture ALU Design : Division and Floating Point. Divide algorithm. DIVIDE HARDWARE Version 1 Divide: Paper & Pencil Computer Architecture ALU Design : Division and Floating Point 1001 Quotient Divisor 1000 1001010 Dividend 1000 10 101 1010 1000 10 (or Modulo result) See how big a number can be

More information

Lecture 8: Binary Multiplication & Division

Lecture 8: Binary Multiplication & Division Lecture 8: Binary Multiplication & Division Today s topics: Addition/Subtraction Multiplication Division Reminder: get started early on assignment 3 1 2 s Complement Signed Numbers two = 0 ten 0001 two

More information

Arithmetic in MIPS. Objectives. Instruction. Integer arithmetic. After completing this lab you will:

Arithmetic in MIPS. Objectives. Instruction. Integer arithmetic. After completing this lab you will: 6 Objectives After completing this lab you will: know how to do integer arithmetic in MIPS know how to do floating point arithmetic in MIPS know about conversion from integer to floating point and from

More information

Binary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit

Binary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit Decimal Division Remember 4th grade long division? 43 // quotient 12 521 // divisor dividend -480 41-36 5 // remainder Shift divisor left (multiply by 10) until MSB lines up with dividend s Repeat until

More information

LSN 2 Computer Processors

LSN 2 Computer Processors LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2

More information

Reduced Instruction Set Computer (RISC)

Reduced Instruction Set Computer (RISC) Reduced Instruction Set Computer (RISC) Focuses on reducing the number and complexity of instructions of the ISA. RISC Goals RISC: Simplify ISA Simplify CPU Design Better CPU Performance Motivated by simplifying

More information

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015 FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015 AGENDA The Kaveri Accelerated Processing Unit (APU) The Graphics Core Next Architecture and its Floating-Point Arithmetic

More information

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc

A single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc Other architectures Example. Accumulator-based machines A single register, called the accumulator, stores the operand before the operation, and stores the result after the operation. Load x # into acc

More information

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers

This Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers This Unit: Floating Point Arithmetic CIS 371 Computer Organization and Design Unit 7: Floating Point App App App System software Mem CPU I/O Formats Precision and range IEEE 754 standard Operations Addition

More information

Assembly Language Programming

Assembly Language Programming Assembly Language Programming Assemblers were the first programs to assist in programming. The idea of the assembler is simple: represent each computer instruction with an acronym (group of letters). Eg:

More information

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

More information

Typy danych. Data types: Literals:

Typy danych. Data types: Literals: Lab 10 MIPS32 Typy danych Data types: Instructions are all 32 bits byte(8 bits), halfword (2 bytes), word (4 bytes) a character requires 1 byte of storage an integer requires 1 word (4 bytes) of storage

More information

DNA Data and Program Representation. Alexandre David 1.2.05 adavid@cs.aau.dk

DNA Data and Program Representation. Alexandre David 1.2.05 adavid@cs.aau.dk DNA Data and Program Representation Alexandre David 1.2.05 adavid@cs.aau.dk Introduction Very important to understand how data is represented. operations limits precision Digital logic built on 2-valued

More information

ECE 0142 Computer Organization. Lecture 3 Floating Point Representations

ECE 0142 Computer Organization. Lecture 3 Floating Point Representations ECE 0142 Computer Organization Lecture 3 Floating Point Representations 1 Floating-point arithmetic We often incur floating-point programming. Floating point greatly simplifies working with large (e.g.,

More information

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu. Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.tw Review Computers in mid 50 s Hardware was expensive

More information

Instruction Set Architecture

Instruction Set Architecture Instruction Set Architecture Consider x := y+z. (x, y, z are memory variables) 1-address instructions 2-address instructions LOAD y (r :=y) ADD y,z (y := y+z) ADD z (r:=r+z) MOVE x,y (x := y) STORE x (x:=r)

More information

Intel 8086 architecture

Intel 8086 architecture Intel 8086 architecture Today we ll take a look at Intel s 8086, which is one of the oldest and yet most prevalent processor architectures around. We ll make many comparisons between the MIPS and 8086

More information

Let s put together a Manual Processor

Let s put together a Manual Processor Lecture 14 Let s put together a Manual Processor Hardware Lecture 14 Slide 1 The processor Inside every computer there is at least one processor which can take an instruction, some operands and produce

More information

To convert an arbitrary power of 2 into its English equivalent, remember the rules of exponential arithmetic:

To convert an arbitrary power of 2 into its English equivalent, remember the rules of exponential arithmetic: Binary Numbers In computer science we deal almost exclusively with binary numbers. it will be very helpful to memorize some binary constants and their decimal and English equivalents. By English equivalents

More information

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine

Exceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine 7 Objectives After completing this lab you will: know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine Introduction Branches and jumps provide ways to change

More information

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches: Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):

More information

Instruction Set Design

Instruction Set Design Instruction Set Design Instruction Set Architecture: to what purpose? ISA provides the level of abstraction between the software and the hardware One of the most important abstraction in CS It s narrow,

More information

Intel 64 and IA-32 Architectures Software Developer s Manual

Intel 64 and IA-32 Architectures Software Developer s Manual Intel 64 and IA-32 Architectures Software Developer s Manual Volume 1: Basic Architecture NOTE: The Intel 64 and IA-32 Architectures Software Developer's Manual consists of seven volumes: Basic Architecture,

More information

CS201: Architecture and Assembly Language

CS201: Architecture and Assembly Language CS201: Architecture and Assembly Language Lecture Three Brendan Burns CS201: Lecture Three p.1/27 Arithmetic for computers Previously we saw how we could represent unsigned numbers in binary and how binary

More information

Software implementation of Post-Quantum Cryptography

Software implementation of Post-Quantum Cryptography Software implementation of Post-Quantum Cryptography Peter Schwabe Radboud University Nijmegen, The Netherlands October 20, 2013 ASCrypto 2013, Florianópolis, Brazil Part I Optimizing cryptographic software

More information

CHAPTER 7: The CPU and Memory

CHAPTER 7: The CPU and Memory CHAPTER 7: The CPU and Memory The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides

More information

Computer organization

Computer organization Computer organization Computer design an application of digital logic design procedures Computer = processing unit + memory system Processing unit = control + datapath Control = finite state machine inputs

More information

Chapter 5 Instructor's Manual

Chapter 5 Instructor's Manual The Essentials of Computer Organization and Architecture Linda Null and Julia Lobur Jones and Bartlett Publishers, 2003 Chapter 5 Instructor's Manual Chapter Objectives Chapter 5, A Closer Look at Instruction

More information

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language

Chapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language Chapter 4 Register Transfer and Microoperations Section 4.1 Register Transfer Language Digital systems are composed of modules that are constructed from digital components, such as registers, decoders,

More information

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit

More information

EE361: Digital Computer Organization Course Syllabus

EE361: Digital Computer Organization Course Syllabus EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)

More information

Introduction to MIPS Assembly Programming

Introduction to MIPS Assembly Programming 1 / 26 Introduction to MIPS Assembly Programming January 23 25, 2013 2 / 26 Outline Overview of assembly programming MARS tutorial MIPS assembly syntax Role of pseudocode Some simple instructions Integer

More information

More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction

More information

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?

what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the

More information

MICROPROCESSOR AND MICROCOMPUTER BASICS

MICROPROCESSOR AND MICROCOMPUTER BASICS Introduction MICROPROCESSOR AND MICROCOMPUTER BASICS At present there are many types and sizes of computers available. These computers are designed and constructed based on digital and Integrated Circuit

More information

Computer Architecture TDTS10

Computer Architecture TDTS10 why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers

More information

Data Storage 3.1. Foundations of Computer Science Cengage Learning

Data Storage 3.1. Foundations of Computer Science Cengage Learning 3 Data Storage 3.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List five different data types used in a computer. Describe how

More information

Adaptive Stable Additive Methods for Linear Algebraic Calculations

Adaptive Stable Additive Methods for Linear Algebraic Calculations Adaptive Stable Additive Methods for Linear Algebraic Calculations József Smidla, Péter Tar, István Maros University of Pannonia Veszprém, Hungary 4 th of July 204. / 2 József Smidla, Péter Tar, István

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit. Objectives The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Identify the components of the central processing unit and how they work together and interact with memory Describe how

More information

CPU Organization and Assembly Language

CPU Organization and Assembly Language COS 140 Foundations of Computer Science School of Computing and Information Science University of Maine October 2, 2015 Outline 1 2 3 4 5 6 7 8 Homework and announcements Reading: Chapter 12 Homework:

More information

İSTANBUL AYDIN UNIVERSITY

İSTANBUL AYDIN UNIVERSITY İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA AGENDA INTRO TO BEAGLEBONE BLACK HARDWARE & SPECS CORTEX-A8 ARMV7 PROCESSOR PROS & CONS VS RASPBERRY PI WHEN TO USE BEAGLEBONE BLACK Single

More information

MACHINE INSTRUCTIONS AND PROGRAMS

MACHINE INSTRUCTIONS AND PROGRAMS CHAPTER 2 MACHINE INSTRUCTIONS AND PROGRAMS CHAPTER OBJECTIVES In this chapter you will learn about: Machine instructions and program execution, including branching and subroutine call and return operations

More information

EC 362 Problem Set #2

EC 362 Problem Set #2 EC 362 Problem Set #2 1) Using Single Precision IEEE 754, what is FF28 0000? 2) Suppose the fraction enhanced of a processor is 40% and the speedup of the enhancement was tenfold. What is the overall speedup?

More information

Fast Arithmetic Coding (FastAC) Implementations

Fast Arithmetic Coding (FastAC) Implementations Fast Arithmetic Coding (FastAC) Implementations Amir Said 1 Introduction This document describes our fast implementations of arithmetic coding, which achieve optimal compression and higher throughput by

More information

Attention: This material is copyright 1995-1997 Chris Hecker. All rights reserved.

Attention: This material is copyright 1995-1997 Chris Hecker. All rights reserved. Attention: This material is copyright 1995-1997 Chris Hecker. All rights reserved. You have permission to read this article for your own education. You do not have permission to put it on your website

More information

The mathematics of RAID-6

The mathematics of RAID-6 The mathematics of RAID-6 H. Peter Anvin 1 December 2004 RAID-6 supports losing any two drives. The way this is done is by computing two syndromes, generally referred P and Q. 1 A quick

More information

18-447 Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013

18-447 Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013 18-447 Computer Architecture Lecture 3: ISA Tradeoffs Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013 Reminder: Homeworks for Next Two Weeks Homework 0 Due next Wednesday (Jan 23), right

More information

Introduction to GPU Programming Languages

Introduction to GPU Programming Languages CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

More information

Introducción. Diseño de sistemas digitales.1

Introducción. Diseño de sistemas digitales.1 Introducción Adapted from: Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg431 [Original from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] Diseño de sistemas digitales.1

More information

(Refer Slide Time: 00:01:16 min)

(Refer Slide Time: 00:01:16 min) Digital Computer Organization Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture No. # 04 CPU Design: Tirning & Control

More information

AN617. Fixed Point Routines FIXED POINT ARITHMETIC INTRODUCTION. Thi d t t d ith F M k 4 0 4. Design Consultant

AN617. Fixed Point Routines FIXED POINT ARITHMETIC INTRODUCTION. Thi d t t d ith F M k 4 0 4. Design Consultant Thi d t t d ith F M k 4 0 4 Fixed Point Routines AN617 Author: INTRODUCTION Frank J. Testa Design Consultant This application note presents an implementation of the following fixed point math routines

More information

How To Write A Hexadecimal Program

How To Write A Hexadecimal Program The mathematics of RAID-6 H. Peter Anvin First version 20 January 2004 Last updated 20 December 2011 RAID-6 supports losing any two drives. syndromes, generally referred P and Q. The way

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

Computer Organization and Components

Computer Organization and Components Computer Organization and Components IS5, fall 25 Lecture : Pipelined Processors ssociate Professor, KTH Royal Institute of Technology ssistant Research ngineer, University of California, Berkeley Slides

More information

Computer Science 281 Binary and Hexadecimal Review

Computer Science 281 Binary and Hexadecimal Review Computer Science 281 Binary and Hexadecimal Review 1 The Binary Number System Computers store everything, both instructions and data, by using many, many transistors, each of which can be in one of two

More information

Levent EREN levent.eren@ieu.edu.tr A-306 Office Phone:488-9882 INTRODUCTION TO DIGITAL LOGIC

Levent EREN levent.eren@ieu.edu.tr A-306 Office Phone:488-9882 INTRODUCTION TO DIGITAL LOGIC Levent EREN levent.eren@ieu.edu.tr A-306 Office Phone:488-9882 1 Number Systems Representation Positive radix, positional number systems A number with radix r is represented by a string of digits: A n

More information

Systems I: Computer Organization and Architecture

Systems I: Computer Organization and Architecture Systems I: Computer Organization and Architecture Lecture 2: Number Systems and Arithmetic Number Systems - Base The number system that we use is base : 734 = + 7 + 3 + 4 = x + 7x + 3x + 4x = x 3 + 7x

More information

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)

Bachelors of Computer Application Programming Principle & Algorithm (BCA-S102T) Unit- I Introduction to c Language: C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating

More information

High-speed image processing algorithms using MMX hardware

High-speed image processing algorithms using MMX hardware High-speed image processing algorithms using MMX hardware J. W. V. Miller and J. Wood The University of Michigan-Dearborn ABSTRACT Low-cost PC-based machine vision systems have become more common due to

More information

EECS 427 RISC PROCESSOR

EECS 427 RISC PROCESSOR RISC PROCESSOR ISA FOR EECS 427 PROCESSOR ImmHi/ ImmLo/ OP Code Rdest OP Code Ext Rsrc Mnemonic Operands 15-12 11-8 7-4 3-0 Notes (* is Baseline) ADD Rsrc, Rdest 0000 Rdest 0101 Rsrc * ADDI Imm, Rdest

More information

Review: MIPS Addressing Modes/Instruction Formats

Review: MIPS Addressing Modes/Instruction Formats Review: Addressing Modes Addressing mode Example Meaning Register Add R4,R3 R4 R4+R3 Immediate Add R4,#3 R4 R4+3 Displacement Add R4,1(R1) R4 R4+Mem[1+R1] Register indirect Add R4,(R1) R4 R4+Mem[R1] Indexed

More information

A SystemC Transaction Level Model for the MIPS R3000 Processor

A SystemC Transaction Level Model for the MIPS R3000 Processor SETIT 2007 4 th International Conference: Sciences of Electronic, Technologies of Information and Telecommunications March 25-29, 2007 TUNISIA A SystemC Transaction Level Model for the MIPS R3000 Processor

More information

Copy in your notebook: Add an example of each term with the symbols used in algebra 2 if there are any.

Copy in your notebook: Add an example of each term with the symbols used in algebra 2 if there are any. Algebra 2 - Chapter Prerequisites Vocabulary Copy in your notebook: Add an example of each term with the symbols used in algebra 2 if there are any. P1 p. 1 1. counting(natural) numbers - {1,2,3,4,...}

More information

IA-32 Intel Architecture Software Developer s Manual

IA-32 Intel Architecture Software Developer s Manual IA-32 Intel Architecture Software Developer s Manual Volume 1: Basic Architecture NOTE: The IA-32 Intel Architecture Software Developer s Manual consists of three volumes: Basic Architecture, Order Number

More information

THUMB Instruction Set

THUMB Instruction Set 5 THUMB Instruction Set This chapter describes the THUMB instruction set. Format Summary 5-2 Opcode Summary 5-3 5. Format : move shifted register 5-5 5.2 Format 2: add/subtract 5-7 5.3 Format 3: move/compare/add/subtract

More information

CHAPTER 4 MARIE: An Introduction to a Simple Computer

CHAPTER 4 MARIE: An Introduction to a Simple Computer CHAPTER 4 MARIE: An Introduction to a Simple Computer 4.1 Introduction 195 4.2 CPU Basics and Organization 195 4.2.1 The Registers 196 4.2.2 The ALU 197 4.2.3 The Control Unit 197 4.3 The Bus 197 4.4 Clocks

More information

Central Processing Unit (CPU)

Central Processing Unit (CPU) Central Processing Unit (CPU) CPU is the heart and brain It interprets and executes machine level instructions Controls data transfer from/to Main Memory (MM) and CPU Detects any errors In the following

More information

An Introduction to the ARM 7 Architecture

An Introduction to the ARM 7 Architecture An Introduction to the ARM 7 Architecture Trevor Martin CEng, MIEE Technical Director This article gives an overview of the ARM 7 architecture and a description of its major features for a developer new

More information

612 CHAPTER 11 PROCESSOR FAMILIES (Corrisponde al cap. 12 - Famiglie di processori) PROBLEMS

612 CHAPTER 11 PROCESSOR FAMILIES (Corrisponde al cap. 12 - Famiglie di processori) PROBLEMS 612 CHAPTER 11 PROCESSOR FAMILIES (Corrisponde al cap. 12 - Famiglie di processori) PROBLEMS 11.1 How is conditional execution of ARM instructions (see Part I of Chapter 3) related to predicated execution

More information

Pentium vs. Power PC Computer Architecture and PCI Bus Interface

Pentium vs. Power PC Computer Architecture and PCI Bus Interface Pentium vs. Power PC Computer Architecture and PCI Bus Interface CSE 3322 1 Pentium vs. Power PC Computer Architecture and PCI Bus Interface Nowadays, there are two major types of microprocessors in the

More information

2) Write in detail the issues in the design of code generator.

2) Write in detail the issues in the design of code generator. COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage

More information

Chapter 5, The Instruction Set Architecture Level

Chapter 5, The Instruction Set Architecture Level Chapter 5, The Instruction Set Architecture Level 5.1 Overview Of The ISA Level 5.2 Data Types 5.3 Instruction Formats 5.4 Addressing 5.5 Instruction Types 5.6 Flow Of Control 5.7 A Detailed Example: The

More information

CS352H: Computer Systems Architecture

CS352H: Computer Systems Architecture CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline - Hazards October 1, 2009 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell Data Hazards in ALU Instructions

More information

Performance evaluation

Performance evaluation Performance evaluation Arquitecturas Avanzadas de Computadores - 2547021 Departamento de Ingeniería Electrónica y de Telecomunicaciones Facultad de Ingeniería 2015-1 Bibliography and evaluation Bibliography

More information

Using the RDTSC Instruction for Performance Monitoring

Using the RDTSC Instruction for Performance Monitoring Using the Instruction for Performance Monitoring http://developer.intel.com/drg/pentiumii/appnotes/pm1.htm Using the Instruction for Performance Monitoring Information in this document is provided in connection

More information

CSI 333 Lecture 1 Number Systems

CSI 333 Lecture 1 Number Systems CSI 333 Lecture 1 Number Systems 1 1 / 23 Basics of Number Systems Ref: Appendix C of Deitel & Deitel. Weighted Positional Notation: 192 = 2 10 0 + 9 10 1 + 1 10 2 General: Digit sequence : d n 1 d n 2...

More information

EE 261 Introduction to Logic Circuits. Module #2 Number Systems

EE 261 Introduction to Logic Circuits. Module #2 Number Systems EE 261 Introduction to Logic Circuits Module #2 Number Systems Topics A. Number System Formation B. Base Conversions C. Binary Arithmetic D. Signed Numbers E. Signed Arithmetic F. Binary Codes Textbook

More information

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Chapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Chapter 2 Basic Structure of Computers Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Functional Units Basic Operational Concepts Bus Structures Software

More information

Computer Organization and Architecture

Computer Organization and Architecture Computer Organization and Architecture Chapter 11 Instruction Sets: Addressing Modes and Formats Instruction Set Design One goal of instruction set design is to minimize instruction length Another goal

More information

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design

Learning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design Learning Outcomes Simple CPU Operation and Buses Dr Eddie Edwards eddie.edwards@imperial.ac.uk At the end of this lecture you will Understand how a CPU might be put together Be able to name the basic components

More information

Binary Numbering Systems

Binary Numbering Systems Binary Numbering Systems April 1997, ver. 1 Application Note 83 Introduction Binary numbering systems are used in virtually all digital systems, including digital signal processing (DSP), networking, and

More information

Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 05: Array Processors

Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 05: Array Processors Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors Lesson 05: Array Processors Objective To learn how the array processes in multiple pipelines 2 Array Processor

More information

150127-Microprocessor & Assembly Language

150127-Microprocessor & Assembly Language Chapter 3 Z80 Microprocessor Architecture The Z 80 is one of the most talented 8 bit microprocessors, and many microprocessor-based systems are designed around the Z80. The Z80 microprocessor needs an

More information

1. Convert the following base 10 numbers into 8-bit 2 s complement notation 0, -1, -12

1. Convert the following base 10 numbers into 8-bit 2 s complement notation 0, -1, -12 C5 Solutions 1. Convert the following base 10 numbers into 8-bit 2 s complement notation 0, -1, -12 To Compute 0 0 = 00000000 To Compute 1 Step 1. Convert 1 to binary 00000001 Step 2. Flip the bits 11111110

More information

COMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December 2009 23:59

COMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December 2009 23:59 COMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December 2009 23:59 Overview: In the first projects for COMP 303, you will design and implement a subset of the MIPS32 architecture

More information

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2 Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of

More information

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms

Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,

More information

CPU Organisation and Operation

CPU Organisation and Operation CPU Organisation and Operation The Fetch-Execute Cycle The operation of the CPU 1 is usually described in terms of the Fetch-Execute cycle. 2 Fetch-Execute Cycle Fetch the Instruction Increment the Program

More information

The string of digits 101101 in the binary number system represents the quantity

The string of digits 101101 in the binary number system represents the quantity Data Representation Section 3.1 Data Types Registers contain either data or control information Control information is a bit or group of bits used to specify the sequence of command signals needed for

More information

64-Bit Architecture Speeds RSA By 4x

64-Bit Architecture Speeds RSA By 4x 64-Bit Architecture Speeds RSA By 4x MIPS Technologies, Inc. June 2002 Public-key cryptography, and RSA in particular, is increasingly important to e-commerce transactions. Many digital consumer appliances

More information

Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer

Computers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer Computers CMPT 125: Lecture 1: Understanding the Computer Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 3, 2009 A computer performs 2 basic functions: 1.

More information

ELE 356 Computer Engineering II. Section 1 Foundations Class 6 Architecture

ELE 356 Computer Engineering II. Section 1 Foundations Class 6 Architecture ELE 356 Computer Engineering II Section 1 Foundations Class 6 Architecture History ENIAC Video 2 tj History Mechanical Devices Abacus 3 tj History Mechanical Devices The Antikythera Mechanism Oldest known

More information

Chapter 2 Logic Gates and Introduction to Computer Architecture

Chapter 2 Logic Gates and Introduction to Computer Architecture Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of CS:APP Chapter 4 Computer Architecture Wrap-Up William J. Taffe Plymouth State University using the slides of Randal E. Bryant Carnegie Mellon University Overview Wrap-Up of PIPE Design Performance analysis

More information

Data Storage. Chapter 3. Objectives. 3-1 Data Types. Data Inside the Computer. After studying this chapter, students should be able to:

Data Storage. Chapter 3. Objectives. 3-1 Data Types. Data Inside the Computer. After studying this chapter, students should be able to: Chapter 3 Data Storage Objectives After studying this chapter, students should be able to: List five different data types used in a computer. Describe how integers are stored in a computer. Describe how

More information