Computer Architecture 2 / Advanced Computer Architecture

Annotation to the assignments and the solution sheet

This is a multiple choice examination, which means:
- Solution approaches are not assessed.
- For each subpart of an assignment, one or more answers can be right.
- But: if you mark the box "None of them" for a subpart, the other marked answers of this subpart will be disregarded.
- It is not possible to get a negative score in a subpart of any assignment.

Note the following points:
- In addition to the assignment sheet there is a solution sheet. Mark the answers on the solution sheet as described! MARKED ANSWERS ON THE ASSIGNMENT SHEET WILL NOT BE CONSIDERED.
- You get the assignment sheet only once. In case of erroneous entries, ask the personnel for a new solution sheet.
- Only use the sheets enclosed in the envelope. Don't use any other paper. If you need more paper, ask the supervisors.
- Return everything, i.e. assignment sheet, solution sheet and the sheets, used and unused. Only exams that are returned completely will be assessed.
- FILL IN YOUR NAME AND MATRICULATION NUMBER ON THE ASSIGNMENT SHEET AND THE SOLUTION SHEET!

Question 1 (14 Points) Parallelism within a Processor

1.1 Which of the following statements about the von Neumann architecture is/are true?
A: Programs and data are resident in different memories
B: The computer structure is independent of the problem to be processed
C: Programs consist of a sequence of instructions which are executed in parallel
D: The machine applies binary codes
E: None of the answers above is correct

1.2 Instruction Pipelining: How long (in ns) is the gap (bubble) within the fourth task entering the pipe below?

    Stage:  IF    ID    EX    MEM   WB
    Time:   4 ns  3 ns  4 ns  8 ns  3 ns

F: 12 ns
G: 16 ns
H: 20 ns
I: None of the answers above is correct

1.3 Pipelining: What is the execution time per stage of a pipeline that has 5 equal stages and a mean overhead of 8 cycles?
J: 2 cycles
K: 3 cycles
L: 4 cycles
M: None of the answers above is correct

1.4 Itanium processor, ILP (EPIC): A vector operation c = a + b with 154 elements per vector shall be performed. How many cycles are required within the loop below for the vector operation above (neglect the branch operation br.ctop) if the load (ldl) instructions take two cycles and the remaining operations take 1 cycle?

Intel's Itanium

            ld r2=addr(a)
            ld r3=addr(b) ;;
            ld r4=addr(c)
            ld lc=4
            ld ec=5 ;;
    loop:   (p16) ldl f32=[r2],8
            (p17) ldl f36=[r3],8
            (p19) fadd f38=f35+f38
            (p20) stl [r4]=f39,8
            br.ctop loop ;;

N: 158
O: 159
P: 162
Q: None of the answers above is correct

1.5 Which feature of Itanium processors aims to increase parallelism by changing the instruction order?
R: Rotating Registers
S: Predication
T: Speculation
U: None of the answers above is correct

Question 2 (12 Points) Classification & Performance of Parallel Architectures

2.1 Which kind of architecture is represented by the following figure?

[Figure: control units CU 1, CU 2, ..., CU n, each with I/O, each pass an instruction stream (IS) to a processing unit PU 1, PU 2, ..., PU n; every PU is connected by a data stream (DS) to one shared memory, which also supplies the instruction streams (IS) to the control units.]

A: SISD architecture
B: SIMD architecture
C: MIMD architecture
D: MISD architecture
E: None of the answers above is correct

2.2 Which statement(s) related to the system in the figure in 2.1 is/are true?
F: The system scales very well with respect to the number of processors
G: The system represents a vector processor
H: The processors can communicate with each other through shared variables
I: None of the answers above is correct

2.3 Parallel programs: What is the parallel execution time of a program with a mean parallel overhead of 4 s and a sequential execution time of 600 s on 150 processors?
J: 4 s
K: 8 s
L: 12 s
M:
N: None of the answers above is correct
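
The arithmetic behind question 2.3 can be sketched as follows. This is a minimal sketch, not the official solution: it assumes that the mean parallel overhead is the total overhead divided by the number of processors, so that the parallel time is the ideally divided sequential time plus that overhead. The function name `parallel_time` is hypothetical.

```python
def parallel_time(t_seq: float, processors: int, mean_overhead: float) -> float:
    """Parallel execution time under the assumed model T_p = T_s / p + mean overhead.

    Assumption (not stated in the exam): mean overhead = total overhead / p,
    where total overhead = p * T_p - T_s.
    """
    return t_seq / processors + mean_overhead


if __name__ == "__main__":
    # Values taken from question 2.3: T_s = 600 s, p = 150, mean overhead = 4 s.
    print(parallel_time(600.0, 150, 4.0))
```

With a different definition of overhead the result would change, so the model should be checked against the lecture notes.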

2.4 Parallel programs: What is the execution time of a program on 100 processors if 93% of the program is ideally parallel, the remaining part is sequential and the sequential execution time is 10000 s?
O: 100 s
P: 593 s
Q: 793 s
R: None of the answers above is correct

2.5 Workload driven evaluation of parallel systems, memory-constrained scaling: A matrix factorization with complexity n^3 takes 20 hours for a square matrix which requires 128*10^8 bytes on one processor (8 bytes per element). How much time would it need on 100 processors (assuming 50% parallel efficiency)?
S: 200 hours
T: 400 hours
U: 600 hours
V: None of the answers above is correct

2.6 Workload driven evaluation of parallel systems, time-constrained scaling: What should the number of rows be for a matrix-matrix multiplication on 1 processor if it is 3000 on 30 processors (assuming 90% parallel efficiency)?
W: 1000
X: 1500
Y: 2000
Z: None of the answers above is correct
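
The calculations behind questions 2.4 to 2.6 can be sketched as follows. This is a hedged sketch under stated assumptions, not the official solution: 2.4 is modeled with Amdahl's law; 2.5 assumes that total memory grows linearly with the processor count and that the n x n matrix fills it; 2.6 assumes n^3 work for the matrix multiplication and an unchanged execution time. All function names are hypothetical.

```python
import math


def amdahl_time(t_seq: float, parallel_fraction: float, processors: int) -> float:
    """Execution time when only `parallel_fraction` of the work is divided among processors."""
    serial_part = t_seq * (1.0 - parallel_fraction)
    parallel_part = t_seq * parallel_fraction / processors
    return serial_part + parallel_part


def memory_constrained_time(t_one: float, processors: int, efficiency: float) -> float:
    """Time on p processors when the matrix grows to fill p times the memory.

    Assumption: memory is proportional to n^2, so n grows by sqrt(p)
    and the n^3 work grows by sqrt(p)^3.
    """
    n_scale = math.sqrt(processors)
    work_scale = n_scale ** 3
    return t_one * work_scale / (processors * efficiency)


def time_constrained_rows_on_one(rows_on_p: float, processors: int, efficiency: float) -> float:
    """Rows on one processor when p processors handle `rows_on_p` rows in the same time.

    Assumption: work is proportional to n^3, so n grows by the cube root of the speedup.
    """
    speedup = processors * efficiency
    return rows_on_p / speedup ** (1.0 / 3.0)


if __name__ == "__main__":
    print(amdahl_time(10000.0, 0.93, 100))               # question 2.4
    print(memory_constrained_time(20.0, 100, 0.5))       # question 2.5
    print(time_constrained_rows_on_one(3000.0, 30, 0.9)) # question 2.6
```

The results are floating-point values, so they should be compared with the answer options using a small tolerance rather than exact equality.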

Question 3 (12 Points) Interconnection Networks

3.1 Topology: What is the difference between a 2-D torus and a hypercube with 16 nodes regarding the topology parameters node degree, diameter, bisection width, and average distance?
A: The hypercube has the higher bisection width
B: The node degree is different
C: The 2-D torus has the higher average distance
D: No difference
E: None of the answers above is correct

3.2 E-cube routing: Which path is taken from 010 to 101?

[Figure: 3-dimensional hypercube with the nodes 000, 001, 010, 011, 100, 101, 110, 111]

F: 010 -> 011 -> 001 -> 101
G: 010 -> 110 -> 100 -> 101
H: 010 -> 000 -> 001 -> 101
I: 010 -> 110 -> 111 -> 101
J: None of the answers above is correct

3.3 Topology: What is the height of a binary tree with 128 nodes?
K: 8
L: 7
M: 6
N: None of the answers K-M is correct

3.4 Which routing strategies are deadlock-free?
O: E-cube routing on hypercubes
P: XY routing on tori
Q: XY routing on 2D meshes
R: None of the answers above is correct
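
E-cube routing as used in question 3.2 can be sketched as follows. This is a minimal sketch: the route corrects the differing address bits one dimension at a time in a fixed order. Whether the lowest or the highest dimension is corrected first is a convention that the exam text does not state, so it is exposed here as the hypothetical parameter `lowest_first`; the function names are hypothetical as well.

```python
def ecube_path(src: int, dst: int, dims: int, lowest_first: bool = True) -> list[int]:
    """Return the node sequence of dimension-ordered (e-cube) routing on a hypercube.

    Assumption: `lowest_first` selects the dimension order; the order used
    in the lecture is not given in the exam text.
    """
    order = range(dims) if lowest_first else range(dims - 1, -1, -1)
    path = [src]
    current = src
    for bit in order:
        mask = 1 << bit
        if (current ^ dst) & mask:   # the addresses differ in this dimension
            current ^= mask          # move along this dimension
            path.append(current)
    return path


def format_path(path: list[int], dims: int) -> str:
    """Render a path as binary node labels joined by arrows."""
    return " -> ".join(format(node, f"0{dims}b") for node in path)


if __name__ == "__main__":
    print(format_path(ecube_path(0b010, 0b101, 3, lowest_first=True), 3))
    print(format_path(ecube_path(0b010, 0b101, 3, lowest_first=False), 3))
```

Because the dimension order is fixed for every message, no cyclic channel dependency can form, which is the usual argument for why this scheme avoids deadlock on hypercubes.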

3.5 Topology: What is the average distance in a butterfly network with 256 nodes?
S: 16
T: 4
U: 8
V: None of the answers S-U is correct

3.6 Routing in a butterfly network: Which statement is true?
W: Each stage corresponds to a bit in the destination address
X: The corresponding bit of the destination address selects the output of each stage (0 or 1)
Y: The corresponding bit of the destination address selects the input of each stage (0 or 1)
Z: None of the answers above is correct

Question 4 (9 Points) Caches

4.1 Simple cache model, 1 level only: What is the cache access time if the access time from the processor view is 5 ns, the hit rate is 99% and the cache access time is 1/400 of the memory access time?
A: 2 ns
B: 1 ns
C: 3 ns
D: None of the answers above is correct

4.2 Cache coherence: For which shared (virtual) memory systems is the snooping protocol not suited?
E: Systems with butterfly network
F: Bus based systems
G: Systems with 3-D torus network
H: None of the answers above is correct

4.3 Snooping cache protocol: In which cases is the main memory up-to-date?
I: Write-back caches: Cache data marked as exclusive
J: Write-back caches: Cache data marked as modified
K: Write-through caches: After writing to shared data
L: None of the answers above is correct

4.4 Snooping cache protocol, write-back caches: What is not an immediate effect of writing to shared data in the cache of one processor?
M: Updating copies in the caches of other processors
N: Invalidating copies in the caches of other processors
O: Updating main memory
P: None of the answers above is correct
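
The calculation behind question 4.1 can be sketched as follows. This is a minimal sketch under an assumed model, not the official solution: every access first probes the cache, and a miss additionally pays the memory access time, so t_effective = t_cache + (1 - hit_rate) * t_memory. The exam does not state which simple cache model is meant; the alternative model hit_rate * t_cache + (1 - hit_rate) * t_memory gives a slightly different value. The function name is hypothetical.

```python
def cache_access_time(t_effective: float, hit_rate: float, memory_to_cache_ratio: float) -> float:
    """Solve t_effective = t_cache + (1 - hit_rate) * ratio * t_cache for t_cache.

    Assumption: a miss costs the cache probe plus one memory access,
    with t_memory = memory_to_cache_ratio * t_cache.
    """
    miss_rate = 1.0 - hit_rate
    return t_effective / (1.0 + miss_rate * memory_to_cache_ratio)


if __name__ == "__main__":
    # Values taken from question 4.1: 5 ns effective, 99% hit rate, ratio 400.
    print(cache_access_time(5.0, 0.99, 400.0))
```

The printed value is a floating-point approximation, so it should be rounded before comparing it with the answer options.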

4.5 Directory-based cache coherence protocols for distributed memory systems: Which information is not necessary in the directory of each processor?
Q: Status information on data in memory of other processors
R: Locations of copies of the processor's cache data
S: Status information on the processor's cache data
T: Status information on the processor's cache data + locations of copies
U: None of the answers above is correct