ALU Architecture and ISA Extensions
|
|
- Hortense Taylor
- 7 years ago
- Views:
Transcription
1 ALU Architecture and ISA Extensions Lecture notes from MKP, H. H. Lee and S. Yalamanchili Reading Sections (only those elements covered in class) Sections Appendix B.5 Practice Problems: 26, 27 Goal: Understand the v ISA view of the core microarchitecture v Organization of functional units and register files into basic data paths (2) 1
2 Overview Instruction Set Architectures have a purpose v Applications dictate what we need We only have a fixed number of bits v Impact on accuracy More is not better v We cannot afford everything we want Basic Arithmetic Logic Unit (ALU) Design v Addition/subtraction, multiplication, division (3) Reminder: ISA Register File (Programmer Visible State) 0x00 0x01 0x02 0x03 Memory Interface byte addressed memory stack 0x1F Processor Internal Buses Dynamic Data Data segment (static) Text Segment Program Counter Programmer Invisible State 0xFFFFFFFF Reserved Instruction register Arithmetic Logic Unit (ALU) Memory Map Kernel registers Who sees what? (4) 2
3 Arithmetic for Computers Operations on integers v Addition and subtraction v Multiplication and division v Dealing with overflow Operation on floating-point real numbers v Representation and operations Let us first look at integers (5) Integer Addition(3.2) Example: n Overflow if result out of range n Adding +ve and ve operands, no overflow n n Adding two +ve operands n Overflow if result sign is 1 Adding two ve operands n Overflow if result sign is 0 (6) 3
4 Add negation of second operand Example: 7 6 = 7 + ( 6) Integer Subtraction +7: : : s complement representation Overflow if result out of range v Subtracting two +ve or two ve operands, no overflow v Subtracting +ve from ve operand o Overflow if result sign is 0 v Subtracting ve from +ve operand o Overflow if result sign is 1 (7) ISA Impact Some languages (e.g., C) ignore overflow v Use MIPS addu, addui, subu instructions Other languages (e.g., Ada, Fortran) require raising an exception v Use MIPS add, addi, sub instructions v On overflow, invoke exception handler o o o Save PC in exception program counter (EPC) register Jump to predefined handler address mfc0 (move from coprocessor register) instruction can retrieve EPC value, to return after corrective action (more later) ALU Design leads to many solutions. We look at one simple example (8) 4
5 Integer ALU (arithmetic logic unit)(b.5) Build a 1 bit ALU, and use 32 of them (bit-slice) operation op a b res a b result (9) Single Bit ALU Implements only AND and OR operations Operation 0 Result A B 1 (10) 5
6 Adding Functionality We can add additional operators (to a point) How about addition? a c out = ab + ac in + bc in sum = a b c in Sum b CarryOut Review full adders from digital design (11) Building a 32-bit ALU Operation Operation a0 b0 ALU0 CarryOut Result0 a 0 1 Result a1 b1 ALU1 CarryOut Result1 b 2 a2 b2 ALU2 CarryOut Result2 CarryOut a31 b31 ALU31 Result31 (12) 6
7 Two's complement approach: just negate b and add 1. sub How do we negate? Subtraction (a b)? B i n v e r t C a r y I n O p e r a t i o n b 0 a 0 C a r y I n A L U 0 C a r y O u t R e s u l t 0 A clever solution: Binvert Operation b 1 a 1 C a r y I n A L U 1 C a r y O u t R e s u l t 1 a 0 1 Result b 2 a 2 C a r r y I n A L U 2 C a r r y O u t R e s u l t 2 b CarryOut b 3 1 a 3 1 C a r r y I n A L U 3 1 R e s u l t 3 1 (13) Tailoring the ALU to the MIPS Need to support the set-on-less-than instruction(slt) v remember: slt is an arithmetic instruction v produces a 1 if rs < rt and 0 otherwise v use subtraction: (a-b) < 0 implies a < b Need to support test for equality (beq $t5, $t6, $t7) v use subtraction: (a-b) = 0 implies a = b (14) 7
8 What Result31 is when (a-b)<0? Binvert Operation a0 b0 ALU0 Less CarryOut Result0 B i n v e r t O p e r a t i o n C a r y I n a1 b1 0 ALU1 Less CarryOut Result1 a 0 1 a2 b2 0 ALU2 Less CarryOut Result2 b L e s s 3 R e s u l t C a r r y O u t a31 b31 0 ALU31 Less Result31 Set Overflow Unsigned vs. signed support (15) Test for equality Notice control lines: 000 = and 001 = or 010 = add 110 = subtract 111 = slt Bnegate a0 b0 a1 b1 0 ALU0 Less CarryOut ALU1 Less CarryOut Operation Result0 Result1 Zero Note: zero is a 1 when the result is zero! a2 b2 0 ALU2 Less CarryOut Result2 Note test for overflow! a31 b31 0 ALU31 Less Result31 Set Overflow (16) 8
9 ISA View CPU/Core $0 $1 $31 ALU Register-to-Register data path We want this to be as fast as possible (17) Multiplication (3.3) Long multiplication multiplicand multiplier product Length of product is the sum of operand lengths (18) 9
10 A Multiplier Uses multiple adders v Cost/performance tradeoff n Can be pipelined n Several multiplication performed in parallel (19) Two 32-bit registers for product v HI: most-significant 32 bits v LO: least-significant 32-bits Instructions v mult rs, rt / multu rs, rt MIPS Multiplication o 64-bit product in HI/LO v mfhi rd / mflo rd o Move from HI/LO to rd o Can test HI value to see if product overflows 32 bits v mul rd, rs, rt o Least-significant 32 bits of product > rd Study Exercise: Check out signed and unsigned multiplication with QtSPIM (20) 10
11 quotient dividend divisor remainder 10 n-bit operands yield n-bit quotient and remainder Division(3.4) Check for 0 divisor Long division approach v If divisor dividend bits o 1 bit in quotient, subtract v Otherwise o 0 bit in quotient, bring down next dividend bit Restoring division v Do the subtract, and if remainder goes < 0, add divisor back Signed division v Divide using absolute values v Adjust sign of quotient and remainder as required (21) Faster Division Can t use parallel hardware as in multiplier v Subtraction is conditional on sign of remainder Faster dividers (e.g. SRT division) generate multiple quotient bits per step v Still require multiple steps Customized implementations for high performance, e.g., supercomputers (22) 11
12 MIPS Division Use HI/LO registers for result v HI: 32-bit remainder v LO: 32-bit quotient Instructions v div rs, rt / divu rs, rt v No overflow or divide-by-0 checking o Software must perform checks if required v Use mfhi, mflo to access result Study Exercise: Check out signed and unsigned division with QtSPIM (23) ISA View CPU/Core $0 $1 $31 ALU Multiply Divide Hi Lo Additional function units and registers (Hi/Lo) Additional instructions to move data to/from these registers v mfhi, mflo What other instructions would you add? Cost? (24) 12
13 Floating Point(3.5) Representation for non-integral numbers v Including very small and very large numbers Like scientific notation v v v In binary normalized v ±1.xxxxxxx 2 2 yyyy Types float and double in C not normalized (25) IEEE 754 Floating-point Representation Single Precision (32-bit) S exponent significand 1bit 8 bits 23 bits ( 1) sign x (1+fraction) x 2 exponent-127 Double Precision (64-bit) S exponent significand 1bit 11 bits 20 bits significand (continued) 32 bits ( 1) sign x (1+fraction) x 2 exponent-1023 (26) 13
14 Defined by IEEE Std Floating Point Standard Developed in response to divergence of representations v Portability issues for scientific code Now almost universally adopted Two representations v Single precision (32-bit) v Double precision (64-bit) (27) FP Adder Hardware Much more complex than integer adder Doing it in one clock cycle would take too long v Much longer than integer operations v Slower clock would penalize all instructions FP adder usually takes several cycles v Can be pipelined Example: FP Addition (28) 14
15 FP Adder Hardware Step 1 Step 2 Step 3 Step 4 (29) FP Arithmetic Hardware FP multiplier is of similar complexity to FP adder v But uses a multiplier for significands instead of an adder FP arithmetic hardware usually does v Addition, subtraction, multiplication, division, reciprocal, square-root v FP integer conversion Operations usually takes several cycles v Can be pipelined (30) 15
16 FP hardware is coprocessor 1 v Adjunct processor that extends the ISA Separate FP registers v 32 single-precision: $f0, $f1, $f31 ISA Impact v Paired for double-precision: $f0/$f1, $f2/$f3, o Release 2 of MIPs ISA supports bit FP reg s FP instructions operate only on FP registers v Programs generally do not perform integer ops on FP data, or vice versa v More registers with minimal code-size impact (31) ISA View: The Co-Processor CPU/Core Co-Processor 1 $0 $1 $0 $1 $31 $31 ALU Multiply Divide FP ALU Hi Lo Co-Processor 0 BadVaddr Status Causes EPC later Floating point operations access a separate set of 32-bit registers v Pairs of 32-bit registers are used for double precision (32) 16
17 ISA View Distinct instructions operate on the floating point registers (pg. A-73) v Arithmetic instructions o add.d fd, fs, ft, and add.s fd, fs, ft double precision single precision Data movement to/from floating point coprocessors v mcf1 rt, fs and mtc1 rd, fs Note that the ISA design implementation is extensible via co-processors FP load and store instructions v lwc1, ldc1, swc1, sdc1 o e.g., ldc1 $f8, 32($sp) Example: DP Mean (33) Associativity Floating point arithmetic is not commutative Parallel programs may interleave operations in unexpected orders v Assumptions of associativity may fail (x+y)+z x -1.50E+38 y 1.50E E+00 z E+00 x+(y+z) -1.50E E E+00 n Need to validate parallel programs under varying degrees of parallelism (34) 17
18 Performance Issues Latency of instructions v Integer instructions can take a single cycle v Floating point instructions can take multiple cycles v Some (FP Divide) can take hundreds of cycles What about energy (we will get to that shortly) What other instructions would you like in hardware? v Would some applications change your mind? How do you decide whether to add new instructions? (35) Characterizing Parallelism Today serial computing cores (von Neumann model) Data Streams Instruction Streams SISD MISD SIMD MIMD Single instruction multiple data stream computing, e.g., SSE Today s Multicore Characterization due to M. Flynn* *M. Flynn, (September 1972). "Some Computer Organizations and Their Effectiveness". IEEE Transactions on Computers, C 21 (9): t (36) 18
19 Parallelism Categories From (37) Multimedia (3.6, 3.7, 3.8) Lower dynamic range and precision requirements v Do not need 32-bits! Inherent parallelism in the operations (38) 19
20 Vector Computation Operate on multiple data elements (vectors) at a time Flexible definition/use of registers Registers hold integers, floats (SP), doubles DP) 128-bit Register 1x128 bit integer 2x64-bit double precision 4 x 32-bit single precision 8x16 short integers (39) Processing Vectors When is this more efficient? Memory vector registers When is this not efficient? Think of 3D graphics, linear algebra and media processing (40) 20
21 Case Study: Intel Streaming SIMD Extensions 8, 128-bit XMM registers v X86-64 adds 8 more registers XMM8-XMM15 8, 16, 32, 64 bit integers (SSE2) 32-bit (SP) and 64-bit (DP) floating point Signed/unsigned integer operations IEEE 754 floating point support Reading Assignment: v v (41) Instruction Categories Floating point instructions v Arithmetic, movement v Comparison, shuffling v Type conversion, bit level memory register register Integer Other v e.g., cache management ISA extensions! Advanced Vector Extensions (AVX) v Successor to SSE (42) 21
22 Arithmetic View Graphics and media processing operates on vectors of 8-bit and 16-bit data v Use 64-bit adder, with partitioned carry chain o Operate on 8 8-bit, 4 16-bit, or 2 32-bit vectors v SIMD (single-instruction, multiple-data) Saturating operations v On overflow, result is largest representable value o c.f. 2s-complement modulo arithmetic v E.g., clipping in audio, saturation in video 4x16-bit 2x32-bit (43) // A 16byte = 128bit vector struct struct Vector4 { float x, y, z, w; }; // Add two constant vectors and return the resulting vector Vector4 SSE_Add ( const Vector4 &Op_A, const Vector4 &Op_B ) { Vector4 Ret_Vector; SSE Example More complex example (matrix multiply) in Section 3.8 using AVX } asm { MOV EAX Op_A MOV EBX, Op_B MOVUPS XMM0, [EAX] MOVUPS XMM1, [EBX] ADDPS XMM0, XMM1 MOVUPS [Ret_Vector], XMM0 } return Ret_Vector; // Load pointers into CPU regs // Move unaligned vectors to SSE regs // Add vector elements // Save the return vector From (44) 22
23 Intel Xeon Phi (45) Data Parallel vs. Traditional Vector Vector Architecture Vector Register A Vector Register C Vector Register B pipelined functional unit Data Parallel Architecture registers Process each square in parallel data parallel computation (46) 23
24 ISA View CPU/Core $0 $1 SIMD Registers XMM0 XMM1 $31 XMM15 ALU Multiply Divide Vector ALU Hi Lo Separate core data path Can be viewed as a co-processor with a distinct set of instructions (47) Domain Impact on the ISA: Example Scientific Computing Floats Double precision Massive data Power constrained Embedded Systems Integers Lower precision Streaming data Security support Energy constrained (48) 24
25 Summary ISAs support operations required of application domains v Note the differences between embedded and supercomputers! v Signed, unsigned, FP, SIMD, etc. Bounded precision effects v Software must be careful how hardware used e.g., associativity v Need standards to promote portability Avoid kitchen sink designs v There is no free lunch v Impact on speed and energy à we will get to this later (49) Study Guide Perform 2 s complement addition and subtraction (review) Add a few more instructions to the simple ALU v Add an XOR instruction v Add an instruction that returns the max of its inputs v Make sure all control signals are accounted for Convert real numbers to single precision floating point (review) and extract the value from an encoded single precision number (review) Execute the SPIM programs (class website) that use floating point numbers. Study the memory/ register contents via single step execution (50) 25
26 Study Guide (cont.) Write a few simple SPIM programs for v Multiplication/division of signed and unsigned numbers o Use numbers that produce >32-bit results o Move to/from HI and LO registers ( find the instructions for doing so) v Addition/subtraction of floating point numbers Try to write a simple SPIM program that demonstrates that floating point operations are not associative (this takes some thought and review of the range of floating point numbers) Look up additional SIMD instruction sets and compare v AMD NEON, Altivec, AMD 3D Now (51) Glossary Co-processor Data parallelism Data parallel computation vs. vector computation Instruction set extensions Overflow MIMD Precision SIMD Saturating arithmetic Signed arithmetic support Unsigned arithmetic support Vector processing (52) 26
Divide: Paper & Pencil. Computer Architecture ALU Design : Division and Floating Point. Divide algorithm. DIVIDE HARDWARE Version 1
Divide: Paper & Pencil Computer Architecture ALU Design : Division and Floating Point 1001 Quotient Divisor 1000 1001010 Dividend 1000 10 101 1010 1000 10 (or Modulo result) See how big a number can be
More informationLecture 8: Binary Multiplication & Division
Lecture 8: Binary Multiplication & Division Today s topics: Addition/Subtraction Multiplication Division Reminder: get started early on assignment 3 1 2 s Complement Signed Numbers two = 0 ten 0001 two
More informationArithmetic in MIPS. Objectives. Instruction. Integer arithmetic. After completing this lab you will:
6 Objectives After completing this lab you will: know how to do integer arithmetic in MIPS know how to do floating point arithmetic in MIPS know about conversion from integer to floating point and from
More informationBinary Division. Decimal Division. Hardware for Binary Division. Simple 16-bit Divider Circuit
Decimal Division Remember 4th grade long division? 43 // quotient 12 521 // divisor dividend -480 41-36 5 // remainder Shift divisor left (multiply by 10) until MSB lines up with dividend s Repeat until
More informationLSN 2 Computer Processors
LSN 2 Computer Processors Department of Engineering Technology LSN 2 Computer Processors Microprocessors Design Instruction set Processor organization Processor performance Bandwidth Clock speed LSN 2
More informationReduced Instruction Set Computer (RISC)
Reduced Instruction Set Computer (RISC) Focuses on reducing the number and complexity of instructions of the ISA. RISC Goals RISC: Simplify ISA Simplify CPU Design Better CPU Performance Motivated by simplifying
More informationFLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015
FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015 AGENDA The Kaveri Accelerated Processing Unit (APU) The Graphics Core Next Architecture and its Floating-Point Arithmetic
More informationA single register, called the accumulator, stores the. operand before the operation, and stores the result. Add y # add y from memory to the acc
Other architectures Example. Accumulator-based machines A single register, called the accumulator, stores the operand before the operation, and stores the result after the operation. Load x # into acc
More informationThis Unit: Floating Point Arithmetic. CIS 371 Computer Organization and Design. Readings. Floating Point (FP) Numbers
This Unit: Floating Point Arithmetic CIS 371 Computer Organization and Design Unit 7: Floating Point App App App System software Mem CPU I/O Formats Precision and range IEEE 754 standard Operations Addition
More informationAssembly Language Programming
Assembly Language Programming Assemblers were the first programs to assist in programming. The idea of the assembler is simple: represent each computer instruction with an acronym (group of letters). Eg:
More informationInstruction Set Architecture. or How to talk to computers if you aren t in Star Trek
Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture
More informationTypy danych. Data types: Literals:
Lab 10 MIPS32 Typy danych Data types: Instructions are all 32 bits byte(8 bits), halfword (2 bytes), word (4 bytes) a character requires 1 byte of storage an integer requires 1 word (4 bytes) of storage
More informationDNA Data and Program Representation. Alexandre David 1.2.05 adavid@cs.aau.dk
DNA Data and Program Representation Alexandre David 1.2.05 adavid@cs.aau.dk Introduction Very important to understand how data is represented. operations limits precision Digital logic built on 2-valued
More informationECE 0142 Computer Organization. Lecture 3 Floating Point Representations
ECE 0142 Computer Organization Lecture 3 Floating Point Representations 1 Floating-point arithmetic We often incur floating-point programming. Floating point greatly simplifies working with large (e.g.,
More informationComputer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.
Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.tw Review Computers in mid 50 s Hardware was expensive
More informationInstruction Set Architecture
Instruction Set Architecture Consider x := y+z. (x, y, z are memory variables) 1-address instructions 2-address instructions LOAD y (r :=y) ADD y,z (y := y+z) ADD z (r:=r+z) MOVE x,y (x := y) STORE x (x:=r)
More informationIntel 8086 architecture
Intel 8086 architecture Today we ll take a look at Intel s 8086, which is one of the oldest and yet most prevalent processor architectures around. We ll make many comparisons between the MIPS and 8086
More informationLet s put together a Manual Processor
Lecture 14 Let s put together a Manual Processor Hardware Lecture 14 Slide 1 The processor Inside every computer there is at least one processor which can take an instruction, some operands and produce
More informationTo convert an arbitrary power of 2 into its English equivalent, remember the rules of exponential arithmetic:
Binary Numbers In computer science we deal almost exclusively with binary numbers. it will be very helpful to memorize some binary constants and their decimal and English equivalents. By English equivalents
More informationExceptions in MIPS. know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine
7 Objectives After completing this lab you will: know the exception mechanism in MIPS be able to write a simple exception handler for a MIPS machine Introduction Branches and jumps provide ways to change
More informationSolution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:
Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):
More informationInstruction Set Design
Instruction Set Design Instruction Set Architecture: to what purpose? ISA provides the level of abstraction between the software and the hardware One of the most important abstraction in CS It s narrow,
More informationIntel 64 and IA-32 Architectures Software Developer s Manual
Intel 64 and IA-32 Architectures Software Developer s Manual Volume 1: Basic Architecture NOTE: The Intel 64 and IA-32 Architectures Software Developer's Manual consists of seven volumes: Basic Architecture,
More informationCS201: Architecture and Assembly Language
CS201: Architecture and Assembly Language Lecture Three Brendan Burns CS201: Lecture Three p.1/27 Arithmetic for computers Previously we saw how we could represent unsigned numbers in binary and how binary
More informationSoftware implementation of Post-Quantum Cryptography
Software implementation of Post-Quantum Cryptography Peter Schwabe Radboud University Nijmegen, The Netherlands October 20, 2013 ASCrypto 2013, Florianópolis, Brazil Part I Optimizing cryptographic software
More informationCHAPTER 7: The CPU and Memory
CHAPTER 7: The CPU and Memory The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides
More informationComputer organization
Computer organization Computer design an application of digital logic design procedures Computer = processing unit + memory system Processing unit = control + datapath Control = finite state machine inputs
More informationChapter 5 Instructor's Manual
The Essentials of Computer Organization and Architecture Linda Null and Julia Lobur Jones and Bartlett Publishers, 2003 Chapter 5 Instructor's Manual Chapter Objectives Chapter 5, A Closer Look at Instruction
More informationChapter 4 Register Transfer and Microoperations. Section 4.1 Register Transfer Language
Chapter 4 Register Transfer and Microoperations Section 4.1 Register Transfer Language Digital systems are composed of modules that are constructed from digital components, such as registers, decoders,
More informationADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit
More informationEE361: Digital Computer Organization Course Syllabus
EE361: Digital Computer Organization Course Syllabus Dr. Mohammad H. Awedh Spring 2014 Course Objectives Simply, a computer is a set of components (Processor, Memory and Storage, Input/Output Devices)
More informationIntroduction to MIPS Assembly Programming
1 / 26 Introduction to MIPS Assembly Programming January 23 25, 2013 2 / 26 Outline Overview of assembly programming MARS tutorial MIPS assembly syntax Role of pseudocode Some simple instructions Integer
More informationMore on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction
More informationwhat operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored?
Inside the CPU how does the CPU work? what operations can it perform? how does it perform them? on what kind of data? where are instructions and data stored? some short, boring programs to illustrate the
More informationMICROPROCESSOR AND MICROCOMPUTER BASICS
Introduction MICROPROCESSOR AND MICROCOMPUTER BASICS At present there are many types and sizes of computers available. These computers are designed and constructed based on digital and Integrated Circuit
More informationComputer Architecture TDTS10
why parallelism? Performance gain from increasing clock frequency is no longer an option. Outline Computer Architecture TDTS10 Superscalar Processors Very Long Instruction Word Processors Parallel computers
More informationData Storage 3.1. Foundations of Computer Science Cengage Learning
3 Data Storage 3.1 Foundations of Computer Science Cengage Learning Objectives After studying this chapter, the student should be able to: List five different data types used in a computer. Describe how
More informationAdaptive Stable Additive Methods for Linear Algebraic Calculations
Adaptive Stable Additive Methods for Linear Algebraic Calculations József Smidla, Péter Tar, István Maros University of Pannonia Veszprém, Hungary 4 th of July 204. / 2 József Smidla, Péter Tar, István
More informationNext Generation GPU Architecture Code-named Fermi
Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time
More informationLogical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.
Objectives The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Identify the components of the central processing unit and how they work together and interact with memory Describe how
More informationCPU Organization and Assembly Language
COS 140 Foundations of Computer Science School of Computing and Information Science University of Maine October 2, 2015 Outline 1 2 3 4 5 6 7 8 Homework and announcements Reading: Chapter 12 Homework:
More informationİSTANBUL AYDIN UNIVERSITY
İSTANBUL AYDIN UNIVERSITY FACULTY OF ENGİNEERİNG SOFTWARE ENGINEERING THE PROJECT OF THE INSTRUCTION SET COMPUTER ORGANIZATION GÖZDE ARAS B1205.090015 Instructor: Prof. Dr. HASAN HÜSEYİN BALIK DECEMBER
More informationIntroduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
More informationBEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA
BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA AGENDA INTRO TO BEAGLEBONE BLACK HARDWARE & SPECS CORTEX-A8 ARMV7 PROCESSOR PROS & CONS VS RASPBERRY PI WHEN TO USE BEAGLEBONE BLACK Single
More informationMACHINE INSTRUCTIONS AND PROGRAMS
CHAPTER 2 MACHINE INSTRUCTIONS AND PROGRAMS CHAPTER OBJECTIVES In this chapter you will learn about: Machine instructions and program execution, including branching and subroutine call and return operations
More informationEC 362 Problem Set #2
EC 362 Problem Set #2 1) Using Single Precision IEEE 754, what is FF28 0000? 2) Suppose the fraction enhanced of a processor is 40% and the speedup of the enhancement was tenfold. What is the overall speedup?
More informationFast Arithmetic Coding (FastAC) Implementations
Fast Arithmetic Coding (FastAC) Implementations Amir Said 1 Introduction This document describes our fast implementations of arithmetic coding, which achieve optimal compression and higher throughput by
More informationAttention: This material is copyright 1995-1997 Chris Hecker. All rights reserved.
Attention: This material is copyright 1995-1997 Chris Hecker. All rights reserved. You have permission to read this article for your own education. You do not have permission to put it on your website
More informationThe mathematics of RAID-6
The mathematics of RAID-6 H. Peter Anvin 1 December 2004 RAID-6 supports losing any two drives. The way this is done is by computing two syndromes, generally referred P and Q. 1 A quick
More information18-447 Computer Architecture Lecture 3: ISA Tradeoffs. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013
18-447 Computer Architecture Lecture 3: ISA Tradeoffs Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 1/18/2013 Reminder: Homeworks for Next Two Weeks Homework 0 Due next Wednesday (Jan 23), right
More informationIntroduction to GPU Programming Languages
CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure
More informationIntroducción. Diseño de sistemas digitales.1
Introducción Adapted from: Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg431 [Original from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] Diseño de sistemas digitales.1
More information(Refer Slide Time: 00:01:16 min)
Digital Computer Organization Prof. P. K. Biswas Department of Electronic & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Lecture No. # 04 CPU Design: Tirning & Control
More informationAN617. Fixed Point Routines FIXED POINT ARITHMETIC INTRODUCTION. Thi d t t d ith F M k 4 0 4. Design Consultant
Thi d t t d ith F M k 4 0 4 Fixed Point Routines AN617 Author: INTRODUCTION Frank J. Testa Design Consultant This application note presents an implementation of the following fixed point math routines
More informationHow To Write A Hexadecimal Program
The mathematics of RAID-6 H. Peter Anvin First version 20 January 2004 Last updated 20 December 2011 RAID-6 supports losing any two drives. syndromes, generally referred P and Q. The way
More informationChapter 1 Computer System Overview
Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides
More informationComputer Organization and Components
Computer Organization and Components IS5, fall 25 Lecture : Pipelined Processors ssociate Professor, KTH Royal Institute of Technology ssistant Research ngineer, University of California, Berkeley Slides
More informationComputer Science 281 Binary and Hexadecimal Review
Computer Science 281 Binary and Hexadecimal Review 1 The Binary Number System Computers store everything, both instructions and data, by using many, many transistors, each of which can be in one of two
More informationLevent EREN levent.eren@ieu.edu.tr A-306 Office Phone:488-9882 INTRODUCTION TO DIGITAL LOGIC
Levent EREN levent.eren@ieu.edu.tr A-306 Office Phone:488-9882 1 Number Systems Representation Positive radix, positional number systems A number with radix r is represented by a string of digits: A n
More informationSystems I: Computer Organization and Architecture
Systems I: Computer Organization and Architecture Lecture 2: Number Systems and Arithmetic Number Systems - Base The number system that we use is base : 734 = + 7 + 3 + 4 = x + 7x + 3x + 4x = x 3 + 7x
More informationBachelors of Computer Application Programming Principle & Algorithm (BCA-S102T)
Unit- I Introduction to c Language: C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating
More informationHigh-speed image processing algorithms using MMX hardware
High-speed image processing algorithms using MMX hardware J. W. V. Miller and J. Wood The University of Michigan-Dearborn ABSTRACT Low-cost PC-based machine vision systems have become more common due to
More informationEECS 427 RISC PROCESSOR
RISC PROCESSOR ISA FOR EECS 427 PROCESSOR ImmHi/ ImmLo/ OP Code Rdest OP Code Ext Rsrc Mnemonic Operands 15-12 11-8 7-4 3-0 Notes (* is Baseline) ADD Rsrc, Rdest 0000 Rdest 0101 Rsrc * ADDI Imm, Rdest
More informationReview: MIPS Addressing Modes/Instruction Formats
Review: Addressing Modes Addressing mode Example Meaning Register Add R4,R3 R4 R4+R3 Immediate Add R4,#3 R4 R4+3 Displacement Add R4,1(R1) R4 R4+Mem[1+R1] Register indirect Add R4,(R1) R4 R4+Mem[R1] Indexed
More informationA SystemC Transaction Level Model for the MIPS R3000 Processor
SETIT 2007 4 th International Conference: Sciences of Electronic, Technologies of Information and Telecommunications March 25-29, 2007 TUNISIA A SystemC Transaction Level Model for the MIPS R3000 Processor
More informationCopy in your notebook: Add an example of each term with the symbols used in algebra 2 if there are any.
Algebra 2 - Chapter Prerequisites Vocabulary Copy in your notebook: Add an example of each term with the symbols used in algebra 2 if there are any. P1 p. 1 1. counting(natural) numbers - {1,2,3,4,...}
More informationIA-32 Intel Architecture Software Developer s Manual
IA-32 Intel Architecture Software Developer s Manual Volume 1: Basic Architecture NOTE: The IA-32 Intel Architecture Software Developer s Manual consists of three volumes: Basic Architecture, Order Number
More informationTHUMB Instruction Set
5 THUMB Instruction Set This chapter describes the THUMB instruction set. Format Summary 5-2 Opcode Summary 5-3 5. Format : move shifted register 5-5 5.2 Format 2: add/subtract 5-7 5.3 Format 3: move/compare/add/subtract
More informationCHAPTER 4 MARIE: An Introduction to a Simple Computer
CHAPTER 4 MARIE: An Introduction to a Simple Computer 4.1 Introduction 195 4.2 CPU Basics and Organization 195 4.2.1 The Registers 196 4.2.2 The ALU 197 4.2.3 The Control Unit 197 4.3 The Bus 197 4.4 Clocks
More informationCentral Processing Unit (CPU)
Central Processing Unit (CPU) CPU is the heart and brain It interprets and executes machine level instructions Controls data transfer from/to Main Memory (MM) and CPU Detects any errors In the following
More informationAn Introduction to the ARM 7 Architecture
An Introduction to the ARM 7 Architecture Trevor Martin CEng, MIEE Technical Director This article gives an overview of the ARM 7 architecture and a description of its major features for a developer new
More information612 CHAPTER 11 PROCESSOR FAMILIES (Corrisponde al cap. 12 - Famiglie di processori) PROBLEMS
612 CHAPTER 11 PROCESSOR FAMILIES (Corrisponde al cap. 12 - Famiglie di processori) PROBLEMS 11.1 How is conditional execution of ARM instructions (see Part I of Chapter 3) related to predicated execution
More informationPentium vs. Power PC Computer Architecture and PCI Bus Interface
Pentium vs. Power PC Computer Architecture and PCI Bus Interface CSE 3322 1 Pentium vs. Power PC Computer Architecture and PCI Bus Interface Nowadays, there are two major types of microprocessors in the
More information2) Write in detail the issues in the design of code generator.
COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage
More informationChapter 5, The Instruction Set Architecture Level
Chapter 5, The Instruction Set Architecture Level 5.1 Overview Of The ISA Level 5.2 Data Types 5.3 Instruction Formats 5.4 Addressing 5.5 Instruction Types 5.6 Flow Of Control 5.7 A Detailed Example: The
More informationCS352H: Computer Systems Architecture
CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline - Hazards October 1, 2009 University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell Data Hazards in ALU Instructions
More informationPerformance evaluation
Performance evaluation Arquitecturas Avanzadas de Computadores - 2547021 Departamento de Ingeniería Electrónica y de Telecomunicaciones Facultad de Ingeniería 2015-1 Bibliography and evaluation Bibliography
More informationUsing the RDTSC Instruction for Performance Monitoring
Using the Instruction for Performance Monitoring http://developer.intel.com/drg/pentiumii/appnotes/pm1.htm Using the Instruction for Performance Monitoring Information in this document is provided in connection
More informationCSI 333 Lecture 1 Number Systems
CSI 333 Lecture 1 Number Systems 1 1 / 23 Basics of Number Systems Ref: Appendix C of Deitel & Deitel. Weighted Positional Notation: 192 = 2 10 0 + 9 10 1 + 1 10 2 General: Digit sequence : d n 1 d n 2...
More informationEE 261 Introduction to Logic Circuits. Module #2 Number Systems
EE 261 Introduction to Logic Circuits Module #2 Number Systems Topics A. Number System Formation B. Base Conversions C. Binary Arithmetic D. Signed Numbers E. Signed Arithmetic F. Binary Codes Textbook
More informationChapter 2 Basic Structure of Computers. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan
Chapter 2 Basic Structure of Computers Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan Outline Functional Units Basic Operational Concepts Bus Structures Software
More informationComputer Organization and Architecture
Computer Organization and Architecture Chapter 11 Instruction Sets: Addressing Modes and Formats Instruction Set Design One goal of instruction set design is to minimize instruction length Another goal
More informationLearning Outcomes. Simple CPU Operation and Buses. Composition of a CPU. A simple CPU design
Learning Outcomes Simple CPU Operation and Buses Dr Eddie Edwards eddie.edwards@imperial.ac.uk At the end of this lecture you will Understand how a CPU might be put together Be able to name the basic components
More informationBinary Numbering Systems
Binary Numbering Systems April 1997, ver. 1 Application Note 83 Introduction Binary numbering systems are used in virtually all digital systems, including digital signal processing (DSP), networking, and
More informationChapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors. Lesson 05: Array Processors
Chapter 07: Instruction Level Parallelism VLIW, Vector, Array and Multithreaded Processors Lesson 05: Array Processors Objective To learn how the array processes in multiple pipelines 2 Array Processor
More information150127-Microprocessor & Assembly Language
Chapter 3 Z80 Microprocessor Architecture The Z 80 is one of the most talented 8 bit microprocessors, and many microprocessor-based systems are designed around the Z80. The Z80 microprocessor needs an
More information1. Convert the following base 10 numbers into 8-bit 2 s complement notation 0, -1, -12
C5 Solutions 1. Convert the following base 10 numbers into 8-bit 2 s complement notation 0, -1, -12 To Compute 0 0 = 00000000 To Compute 1 Step 1. Convert 1 to binary 00000001 Step 2. Flip the bits 11111110
More informationCOMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December 2009 23:59
COMP 303 MIPS Processor Design Project 4: MIPS Processor Due Date: 11 December 2009 23:59 Overview: In the first projects for COMP 303, you will design and implement a subset of the MIPS32 architecture
More informationAdvanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2
Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of
More informationMaximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
More informationCPU Organisation and Operation
CPU Organisation and Operation The Fetch-Execute Cycle The operation of the CPU 1 is usually described in terms of the Fetch-Execute cycle. 2 Fetch-Execute Cycle Fetch the Instruction Increment the Program
More informationThe string of digits 101101 in the binary number system represents the quantity
Data Representation Section 3.1 Data Types Registers contain either data or control information Control information is a bit or group of bits used to specify the sequence of command signals needed for
More information64-Bit Architecture Speeds RSA By 4x
64-Bit Architecture Speeds RSA By 4x MIPS Technologies, Inc. June 2002 Public-key cryptography, and RSA in particular, is increasingly important to e-commerce transactions. Many digital consumer appliances
More informationComputers. Hardware. The Central Processing Unit (CPU) CMPT 125: Lecture 1: Understanding the Computer
Computers CMPT 125: Lecture 1: Understanding the Computer Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 3, 2009 A computer performs 2 basic functions: 1.
More informationELE 356 Computer Engineering II. Section 1 Foundations Class 6 Architecture
ELE 356 Computer Engineering II Section 1 Foundations Class 6 Architecture History ENIAC Video 2 tj History Mechanical Devices Abacus 3 tj History Mechanical Devices The Antikythera Mechanism Oldest known
More informationChapter 2 Logic Gates and Introduction to Computer Architecture
Chapter 2 Logic Gates and Introduction to Computer Architecture 2.1 Introduction The basic components of an Integrated Circuit (IC) is logic gates which made of transistors, in digital system there are
More informationBinary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
More informationCS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of
CS:APP Chapter 4 Computer Architecture Wrap-Up William J. Taffe Plymouth State University using the slides of Randal E. Bryant Carnegie Mellon University Overview Wrap-Up of PIPE Design Performance analysis
More informationData Storage. Chapter 3. Objectives. 3-1 Data Types. Data Inside the Computer. After studying this chapter, students should be able to:
Chapter 3 Data Storage Objectives After studying this chapter, students should be able to: List five different data types used in a computer. Describe how integers are stored in a computer. Describe how
More information