CS1101: Lecture 14. Computer Systems Organization: Processors.


CS1101: Lecture 14
Computer Systems Organization: Processors & Parallelism
Dr. Barry O Sullivan, b.osullivan@cs.ucc.ie

Lecture Outline
- Design Principles for Modern Computers
- Instruction-Level Parallelism: Pipelining, Dual Pipelines, Superscalar Architectures
- Processor-Level Parallelism: Array Computers, Multiprocessors, Multicomputers

Course Homepage: http://www.cs.ucc.ie/~osullb/cs1101
Reading: Tanenbaum, Chapter 2, Section 1

Department of Computer Science, University College Cork

Design Principles for Modern Computers

There is a set of design principles, sometimes called the RISC design principles, that architects of general-purpose CPUs do their best to follow:

All Instructions Are Directly Executed by Hardware
- This eliminates a level of interpretation.

Maximise the Rate at Which Instructions Are Issued
- MIPS = millions of instructions per second; MIPS speed is related to the number of instructions issued per second.
- Parallelism can play a role here.

Instructions Should Be Easy to Decode
- Decoding is a critical limit on the rate of issue of instructions.
- Make instructions regular and fixed length, with a small number of fields; the fewer different instruction formats, the better.

Only Loads and Stores Should Reference Memory
- Operands for most instructions should come from, and return to, registers.
- Access to memory can take a long time; thus, only LOAD and STORE instructions should reference memory.

Provide Plenty of Registers
- Since accessing memory is relatively slow, many registers (at least 32) need to be provided, so that once a word is fetched it can be kept in a register until it is no longer needed.
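As a back-of-the-envelope illustration of the issue-rate principle, the sketch below computes MIPS from a clock rate and an average cycles-per-instruction figure. Both numbers are invented for the example, not taken from the lecture:

```python
# Illustrative MIPS calculation (the clock rate and CPI are made up).
clock_hz = 2_000_000_000   # assume a 2 GHz clock
cycles_per_instr = 1.25    # assume an average CPI of 1.25

instr_per_sec = clock_hz / cycles_per_instr
mips = instr_per_sec / 1_000_000
print(mips)  # 1600.0
```

Raising the issue rate (lowering CPI, e.g. by issuing several instructions per cycle) increases MIPS without touching the clock speed, which is exactly the motivation for the parallelism techniques that follow.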

Parallelism

Computer architects are constantly striving to improve the performance of the machines they design. Making the chips run faster by increasing their clock speed is one way. However, most computer architects look to parallelism (doing two or more things at once) as a way to get even more performance for a given clock speed. Parallelism comes in two general forms: instruction-level parallelism and processor-level parallelism.

Instruction-Level Parallelism

Parallelism is exploited within individual instructions to get more instructions/sec out of the machine. We will consider two approaches: pipelining and superscalar architectures.

Pipelining

Fetching instructions from memory is a major bottleneck in instruction execution speed. However, computers have long had the ability to fetch instructions from memory in advance; these instructions were stored in a set of registers called the prefetch buffer. Thus, instruction execution is divided into two parts: fetching and actual execution. The concept of a pipeline carries this strategy much further: instead of dividing instruction execution into only two parts, it is often divided into many parts (stages), each one handled by a dedicated piece of hardware, all of which can run in parallel.

An Example of Pipelining

Figure 2-4. (a) A five-stage pipeline (stages S1 to S5). (b) The state of each stage as a function of time. Nine clock cycles are illustrated.
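The schedule in Figure 2-4(b) follows a simple pattern that can be sketched in a few lines (a toy model, not part of the lecture): in clock cycle c, stage s holds instruction c - s + 1, so after the pipeline fills, all five stages are busy at once.

```python
# Toy model of the five-stage pipeline schedule in Figure 2-4(b).
# Stage names follow Tanenbaum's five-stage design.
STAGES = ["S1 fetch", "S2 decode", "S3 operand fetch",
          "S4 execute", "S5 write back"]

def pipeline_state(cycle, n_stages=5):
    """Return the instruction number occupying each stage at the given
    clock cycle (None while a stage is still empty during pipeline fill)."""
    state = []
    for s in range(1, n_stages + 1):
        instr = cycle - s + 1
        state.append(instr if instr >= 1 else None)
    return state

for c in range(1, 6):
    print(c, pipeline_state(c))
# cycle 1: [1, None, None, None, None]  (only S1 busy)
# cycle 5: [5, 4, 3, 2, 1]              (pipeline full)
```

Once full, the pipeline completes one instruction per clock cycle even though each individual instruction still takes five cycles end to end.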

Dual Pipelines

If one pipeline is good, then surely two pipelines are better. Here a single instruction fetch unit fetches pairs of instructions together and puts each one into its own pipeline, complete with its own ALU for parallel operation.

Figure 2-5. Dual five-stage pipelines (stages S1 to S5) with a common instruction fetch unit.

To be able to run in parallel, the two instructions must not conflict over resource usage (e.g., registers), and neither must depend on the result of the other.

Superscalar Architectures

Going to four pipelines is conceivable, but doing so duplicates too much hardware. Instead, a different approach is used on high-end CPUs. The basic idea is to have just a single pipeline but give it multiple functional units. This is a superscalar architecture: with more than one ALU, more than one instruction can be executed in parallel. Implicit in the idea of a superscalar processor is that the S3 stage can issue instructions considerably faster than the S4 stage is able to execute them.

Figure 2-6. A superscalar processor with five functional units in the S4 stage: two ALUs, a LOAD unit, a STORE unit, and a floating-point unit.
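The pairing rule for the dual pipelines (no register conflicts, neither instruction depending on the other's result) can be sketched as a simple check over register sets. The Instr record and the register names are invented for illustration; real issue logic also checks functional-unit availability:

```python
# Sketch of the dual-issue pairing check described above (hypothetical
# representation: each instruction lists the registers it reads/writes).
from dataclasses import dataclass

@dataclass
class Instr:
    reads: set
    writes: set

def can_pair(a, b):
    """True if a and b may enter the twin pipelines in the same cycle:
    b must not read a's result, and they must not write the same register
    or have a write-after-read conflict within the cycle."""
    return (not (a.writes & b.reads)      # no read-after-write dependence
            and not (a.writes & b.writes)  # no write-after-write conflict
            and not (b.writes & a.reads))  # no write-after-read conflict

add_i = Instr(reads={"R1", "R2"}, writes={"R3"})
sub_i = Instr(reads={"R4", "R5"}, writes={"R6"})
dep_i = Instr(reads={"R3"}, writes={"R7"})  # consumes add_i's result

print(can_pair(add_i, sub_i))  # True: fully independent
print(can_pair(add_i, dep_i))  # False: dep_i needs R3 from add_i
```

When the check fails, a real machine would issue the second instruction alone in the next cycle, which is exactly why dual pipelines rarely double throughput.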

Processor-Level Parallelism

Instruction-level parallelism (pipelining and superscalar operation) rarely wins more than a factor of five or ten in processor speed. To get gains of 50, 100, or more, the only way is to design computers with multiple CPUs. We will consider three alternative architectures: array computers, multiprocessors, and multicomputers.

Array Computers

An array processor consists of a large number of identical processors that perform the same sequence of instructions on different sets of data. A vector processor is efficient at executing a sequence of operations on pairs of data elements; all of the addition operations are performed in a single, heavily-pipelined adder.

Figure 2-7. An array processor of the ILLIAC IV type: a control unit broadcasts instructions to an 8 x 8 processor/memory grid.

The processing elements in an array processor are not independent CPUs, since there is only one control unit.

Multiprocessors

The first parallel system with multiple full-blown CPUs is the multiprocessor. This is a system with more than one CPU sharing a common memory, co-ordinated in software. The simplest design is to have a single bus with multiple CPUs and one memory all plugged into it.
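The lockstep behaviour of an array processor can be mimicked with a map over element pairs (a toy sketch; in the real ILLIAC IV, each pair is handled by a separate hardware processing element, not a loop iteration):

```python
# Sketch of array-processor (SIMD) behaviour: one broadcast instruction,
# applied by every "processing element" to its own data.
def simd_add(a, b):
    """Apply the same ADD operation across all element pairs, as the
    grid of processing elements would do simultaneously in lockstep."""
    return [x + y for x, y in zip(a, b)]

print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

The key point the sketch captures is that there is a single instruction stream: every element pair undergoes the same operation, only the data differs.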

Example: Multiprocessors

Figure 2-8. (a) A single-bus multiprocessor: several CPUs and a shared memory attached to one bus. (b) A multicomputer: CPUs with local memories attached to the bus.

Multicomputers

Although multiprocessors with a small number of processors (< 64) are relatively easy to build, large ones are surprisingly difficult to construct. The difficulty is in connecting all the processors to the memory. To get around these problems, many designers have simply abandoned the idea of having a shared memory and instead build systems consisting of large numbers of interconnected computers, each having its own private memory but no common memory. These systems are called multicomputers.
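The software-level contrast between the two designs can be sketched with threads standing in for CPUs (an analogy only, not real hardware): multiprocessor-style workers update one shared memory under a lock, while multicomputer-style nodes keep state private and communicate by message passing.

```python
# Toy contrast between shared memory and message passing.
import threading
import queue

# Multiprocessor style: "CPUs" (threads) share one memory (the dict),
# co-ordinated in software with a lock.
shared = {"count": 0}
lock = threading.Lock()

def cpu_worker():
    for _ in range(1000):
        with lock:                 # software co-ordination on shared memory
            shared["count"] += 1

cpus = [threading.Thread(target=cpu_worker) for _ in range(4)]
for t in cpus:
    t.start()
for t in cpus:
    t.join()
print(shared["count"])  # 4000

# Multicomputer style: each node has private data and communicates
# only by sending messages (here, through a queue; no shared state).
inbox = queue.Queue()

def node(private_value):
    inbox.put(private_value)       # send a message instead of writing shared memory

nodes = [threading.Thread(target=node, args=(i,)) for i in range(4)]
for t in nodes:
    t.start()
for t in nodes:
    t.join()
print(sum(inbox.get() for _ in range(4)))  # 6
```

The same trade-off shows up here as in the hardware: shared memory is convenient but needs co-ordination on every access, while message passing scales more easily because nothing is shared.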