Computer Architecture 2 / Advanced Computer Architecture




Annotation to the assignments and the solution sheet

This is a multiple-choice examination, which means:
- Solution approaches are not assessed.
- For each subpart of an assignment, one or more answers can be right.
- But: if you mark the box "None of them" in a subpart, the other marked answers of this subpart will be disregarded.
- It is not possible to get a negative score in any subpart of an assignment.

Note the following points:
- In addition to the assignment sheet there is a solution sheet. Mark the answers on the solution sheet as described! MARKED ANSWERS ON THE ASSIGNMENT SHEET WILL NOT BE CONSIDERED.
- You get the assignment sheet only once. In case of erroneous entries, ask the personnel for a new solution sheet.
- Only use the sheets enclosed in the envelope; do not use any other paper. If you need more paper, ask the supervisors.
- Return everything, i.e. the assignment sheet, the solution sheet and all other sheets, used and unused. Only exams that are returned completely will be assessed.

FILL IN YOUR NAME AND MATRICULATION NUMBER ON THE ASSIGNMENT SHEET AND THE SOLUTION SHEET!

Question 1 (14 Points): Parallelism within a Processor

1.1 Which of the following statements about the von Neumann architecture is/are true?
A: Programs and data are resident in different memories
B: The computer structure is independent of the problem to be processed
C: Programs consist of a sequence of instructions which are executed in parallel
D: The machine applies binary codes
E: None of the answers above is correct

1.2 Instruction pipelining: How long (in ns) is the gap (bubble) for the fourth task entering the pipeline below?

    IF: 4 ns   ID: 3 ns   EX: 4 ns   MEM: 8 ns   WB: 3 ns

F: 12 ns
G: 16 ns
H: 20 ns
I: None of the answers above is correct

1.3 Pipelining: What is the execution time per stage of a pipeline that has 5 equal stages and a mean overhead of 8 cycles?
J: 2 cycles
K: 3 cycles
L: 4 cycles
M: None of the answers above is correct

1.4 Itanium processor, ILP (EPIC): A vector operation c = a + b with 154 elements per vector shall be performed. How many cycles are required within the loop below for this vector operation (neglect the branch operation br.ctop) if the load (ldl) instructions take two cycles and the remaining operations take 1 cycle?

Intel's Itanium:

    ld   r2 = addr(a)
    ld   r3 = addr(b) ;;
    ld   r4 = addr(c)
    ld   lc = 4
    ld   ec = 5 ;;
loop:
    (p16) ldl  f32 = [r2], 8
    (p17) ldl  f36 = [r3], 8
    (p19) fadd f38 = f35 + f38
    (p20) stl  [r4] = f39, 8
          br.ctop loop ;;

N: 158
O: 159
P: 162
Q: None of the answers above is correct

1.5 Which feature of Itanium processors aims to increase parallelism by changing the instruction order?
R: Rotating Registers
S: Predication
T: Speculation
U: None of the answers above is correct
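For orientation on 1.4, here is a minimal sketch of the usual back-of-the-envelope model for a software-pipelined loop. The stage count used below is an illustrative assumption, not a value stated in the exam, and this is not the official solution.

```python
# Hedged sketch: a software-pipelined (modulo-scheduled) loop is commonly
# estimated as  cycles ~= (iterations + stages - 1) * II,  where II is the
# initiation interval between consecutive iterations and "stages" covers the
# prolog/epilog depth of the software pipeline.

def sw_pipelined_cycles(iterations: int, stages: int, ii: int = 1) -> int:
    """Estimated total cycles of a software-pipelined loop."""
    return (iterations + stages - 1) * ii

# Illustration with 154 vector elements and an assumed 5-stage pipeline, II = 1.
print(sw_pipelined_cycles(154, 5))  # -> 158
```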

Question 2 (12 Points): Classification & Performance of Parallel Architectures

2.1 Which kind of architecture is represented by the following figure?

[Figure: n control units CU 1 ... CU n, each with an I/O connection, feed instruction streams (IS) to processing units PU 1 ... PU n; each PU exchanges a data stream (DS) with a shared memory.]

A: SISD architecture
B: SIMD architecture
C: MIMD architecture
D: MISD architecture
E: None of the answers above is correct

2.2 Which statement(s) related to the system in the figure of 2.1 is/are true?
F: The system is very well scalable with respect to the number of processors
G: The system represents a vector processor
H: The processors can communicate with each other through shared variables
I: None of the answers above is correct

2.3 Parallel programs: What is the parallel execution time of a program with a mean parallel overhead of 4 s and a sequential execution time of 600 s on 150 processors?
J: 4 s
K: 8 s
L: 12 s
M:
N: None of the answers above is correct
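For 2.3, here is a minimal sketch of the simple runtime model these questions appear to use (work divided evenly over the processors, plus a constant parallel overhead); the function name is illustrative and this is not the official solution.

```python
# Hedged sketch: T_par = T_seq / p + overhead, assuming perfectly divisible work.

def parallel_time(t_seq: float, p: int, overhead: float = 0.0) -> float:
    """Parallel runtime under an even work split plus a fixed overhead."""
    return t_seq / p + overhead

# Numbers from 2.3: 600 s sequential time, 150 processors, 4 s mean overhead.
print(parallel_time(600, 150, overhead=4))  # -> 8.0
```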

2.4 Parallel programs: What is the execution time of a program on 100 processors if 93% of the program is ideally parallel, the remaining part is sequential, and the sequential execution time is 10000 s?
O: 100 s
P: 593 s
Q: 793 s
R: None of the answers above is correct

2.5 Workload-driven evaluation of parallel systems, memory-constrained scaling: A matrix factorization with complexity n³ takes 20 hours for a square matrix which requires 128*10^8 bytes on one processor (8 bytes per element). Which time would it need on 100 processors (assuming 50% parallel efficiency)?
S: 200 hours
T: 400 hours
U: 600 hours
V: None of the answers above is correct

2.6 Workload-driven evaluation of parallel systems, time-constrained scaling: What should be the number of rows for a matrix-matrix multiplication on 1 processor if it is 3000 on 30 processors (assuming 90% parallel efficiency)?
W: 1000
X: 1500
Y: 2000
Z: None of the answers above is correct
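For 2.4 to 2.6, here is a minimal sketch of the Amdahl-style runtime formula and of the speedup implied by a given parallel efficiency (S = E * p), as these questions appear to assume; not the official solutions.

```python
# Hedged sketch: runtime when a fraction f of the work is ideally parallel,
# plus the speedup implied by a given parallel efficiency.

def amdahl_time(t_seq: float, f_parallel: float, p: int) -> float:
    """T(p) = f * T_seq / p + (1 - f) * T_seq."""
    return f_parallel * t_seq / p + (1.0 - f_parallel) * t_seq

def speedup_from_efficiency(p: int, efficiency: float) -> float:
    """Speedup implied by parallel efficiency E on p processors: S = E * p."""
    return efficiency * p

# Numbers from 2.4: 10000 s sequential time, 93% parallel, 100 processors.
print(round(amdahl_time(10000, 0.93, 100), 2))  # -> 793.0
# Efficiency as used in 2.5-style scaling: 50% efficiency on 100 processors.
print(speedup_from_efficiency(100, 0.5))        # -> 50.0
```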

Question 3 (12 Points): Interconnection Networks

3.1 Topology: What is the difference between a 2-D torus and a hypercube with 16 nodes regarding the topology parameters node degree, diameter, bisection width, and average distance?
A: The hypercube has the higher bisection width
B: The node degree is different
C: The 2-D torus has the higher average distance
D: No difference
E: None of the answers above is correct

3.2 E-cube routing: Which is the path taken from 010 to 101?

[Figure: 3-dimensional hypercube with nodes labeled 000, 001, 010, 011, 100, 101, 110, 111]

F: 010 -> 011 -> 001 -> 101
G: 010 -> 110 -> 100 -> 101
H: 010 -> 000 -> 001 -> 101
I: 010 -> 110 -> 111 -> 101
J: None of the answers above is correct

3.3 Topology: What is the height of a binary tree with 128 nodes?
K: 8
L: 7
M: 6
N: None of the answers K-M is correct

3.4 Which routing strategies are deadlock-free?
O: E-cube routing on hypercubes
P: XY routing on tori
Q: XY routing on 2D meshes
R: None of the answers above is correct
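For 3.2, here is a minimal sketch of e-cube (dimension-order) routing on a hypercube: XOR the current node with the destination and correct the differing bits in a fixed dimension order. Whether the lowest or the highest dimension is corrected first is a convention of the lecture; lowest-first is assumed here.

```python
# Hedged sketch of e-cube routing: flip differing address bits in fixed order.

def e_cube_path(src: int, dst: int, dims: int):
    """Node sequence from src to dst, correcting dimension 0 first (assumption)."""
    node, path = src, [src]
    for d in range(dims):
        if (node ^ dst) & (1 << d):   # bit d still differs from the destination
            node ^= 1 << d            # traverse the link in dimension d
            path.append(node)
    return path

# Example from 3.2: route from 010 to 101 in a 3-dimensional hypercube.
print([format(n, "03b") for n in e_cube_path(0b010, 0b101, 3)])
# -> ['010', '011', '001', '101']
```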

3.5 Topology: What is the average distance in a butterfly network with 256 nodes?
S: 16
T: 4
U: 8
V: None of the answers S-U is correct

3.6 Routing in a butterfly network: Which statement is true?
W: Each stage corresponds to a bit in the destination address
X: The corresponding bit of the destination address selects the output of each stage (0 or 1)
Y: The corresponding bit of the destination address selects the input of each stage (0 or 1)
Z: None of the answers above is correct
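For 3.6, here is a minimal sketch of destination-tag routing in a log2(N)-stage butterfly: each stage consumes one bit of the destination address and uses it to select the switch output. Whether the most- or least-significant bit is consumed first depends on how the network is drawn; MSB-first is assumed here.

```python
# Hedged sketch of destination-tag (bit-controlled) routing in a butterfly.

def butterfly_output_ports(dest: int, stages: int):
    """Output port (0 or 1) chosen at each stage, consuming the MSB first."""
    return [(dest >> (stages - 1 - i)) & 1 for i in range(stages)]

# Example: destination 0b1011 in a 4-stage (16-input) butterfly.
print(butterfly_output_ports(0b1011, 4))  # -> [1, 0, 1, 1]
```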

Question 4 (9 Points): Caches

4.1 Simple cache model, one level only: What is the cache access time if the access time from the processor's view is 5 ns, the hit rate is 99%, and the cache access time is 1/400 of the memory access time?
A: 2 ns
B: 1 ns
C: 3 ns
D: None of the answers above is correct

4.2 Cache coherence: For which shared (virtual) memory systems is the snooping protocol not suited?
E: Systems with a butterfly network
F: Bus-based systems
G: Systems with a 3-D torus network
H: None of the answers above is correct

4.3 Snooping cache protocol: In which cases is the main memory up to date?
I: Write-back caches: cache data marked as exclusive
J: Write-back caches: cache data marked as modified
K: Write-through caches: after writing to shared data
L: None of the answers above is correct

4.4 Snooping cache protocol, write-back caches: What is not an immediate effect of writing to shared data in the cache of one processor?
M: Updating copies in the caches of other processors
N: Invalidating copies in the caches of other processors
O: Updating main memory
P: None of the answers above is correct
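For 4.1, here is a minimal sketch of the one-level cache model the question appears to assume: the access time seen by the processor is the hit-rate-weighted mix of the cache and memory access times; not the official solution.

```python
# Hedged sketch: t_eff = h * t_cache + (1 - h) * t_mem.

def effective_access_time(t_cache: float, t_mem: float, hit_rate: float) -> float:
    """Mean access time seen by the processor for a single cache level."""
    return hit_rate * t_cache + (1.0 - hit_rate) * t_mem

# Numbers from 4.1: t_mem = 400 * t_cache, hit rate 99%, processor view 5 ns.
# Solving 5 = (0.99 + 0.01 * 400) * t_cache gives t_cache of roughly 1 ns.
t_cache = 5 / (0.99 + 0.01 * 400)
print(round(t_cache, 3), round(effective_access_time(t_cache, 400 * t_cache, 0.99), 3))
# -> 1.002 5.0
```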

4.5 Directory-based cache coherence protocols for distributed memory systems: Which information is not necessary in the directory of each processor?
Q: Status information on data in the memory of other processors
R: Locations of copies of the processor's cache data
S: Status information on the processor's cache data
T: Status information on the processor's cache data + locations of copies
U: None of the answers above is correct
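For 4.5, here is a minimal sketch of what a full-map directory entry for a block of a node's local memory typically holds (a block state plus the set of nodes caching a copy); the field names are illustrative and not taken from the lecture.

```python
# Hedged sketch of a full-map directory entry in a distributed-memory system.

from dataclasses import dataclass, field

@dataclass
class DirectoryEntry:
    state: str = "uncached"                    # e.g. "uncached", "shared", "exclusive"
    sharers: set = field(default_factory=set)  # IDs of nodes holding a cached copy

directory = {0x1000: DirectoryEntry()}         # one entry per local memory block

# A read by remote node 3 marks the block shared and records the copy holder.
entry = directory[0x1000]
entry.state = "shared"
entry.sharers.add(3)
print(entry)  # DirectoryEntry(state='shared', sharers={3})
```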