High Performance Computer Architecture
Volker Lindenstruth
Chair of High Performance Computer Architecture
Ruth-Moufang Str. 1
Email: ti@compeng.de
URL: www.compeng.de
Phone: 798-44100
Volker Lindenstruth (www.compeng.de), 22 April 2010. Copyright Goethe Uni, all rights reserved.
Goals for the course
In-depth understanding of the architecture and design of modern high performance computers and their efficient programming:
  technology forces
  fundamental architectural issues
    » naming, replication, communication, synchronization
  basic design techniques
    » cache coherence, protocols, networks, pipelining, methods of evaluation
  underlying engineering trade-offs
Programming models and methods
From moderate to very large scale
Across the hardware/software boundary
Learn to use parallel computers (projects in class)
Learn to use the MP (message passing) and SAS (shared address space) programming models
Contents
Fundamentals and Introduction
  Why Parallel Architecture; Evolution of Parallel Machines; Parallel Software Basics; Programming for Performance
Scaling Parallel Programs for Multiprocessors
  Vectorization, Methodology and Examples; Working Sets, Cache Sizes, and Node Granularity Issues for Large-Scale Multiprocessors; Workload-Driven Architectural Evaluation; Scaling
Small-Scale Shared Memory
  Cache Coherence; Memory Consistency; Snooping Protocols; Synchronization; Design Tradeoffs; Implementation
Large-Scale Scalable Distributed-Memory Multiprocessors
  Realizing Programming Models on Large-Scale Distributed-Memory Multiprocessors; Design of Large-Scale Distributed-Memory Multiprocessors; Architecture of Intel Paragon; Design of Large-Scale Shared Physical Address Space; Architecture of T3D; Large-Scale Shared Address Space Multiprocessors; Memory Consistency Models; Large-Scale CC Designs; Case Studies: Large-Scale CC-NUMA Machines, COMA
Latency Tolerance
  In message passing and distributed shared memory; block data transfers; long latency events; precommunication in SAS; multithreading
Scalable Interconnection Networks
  Design Space of Interconnection Networks; Routing; Synchronization; Case Studies: Myrinet, SCI, Reflective Memories
Cluster Computing
  Applications; distributed mass storage; fault tolerance; autonomous computing
Literature
In preparation for this course the following books have been used:
  David Culler and J.P. Singh with Anoop Gupta: Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann Publishers, Inc., ISBN 1-55860-343-3
  G. Coulouris et al.: Distributed Systems, 3rd ed., Addison Wesley, 2001
  A. Tanenbaum, M. v. Steen: Distributed Systems, Prentice Hall, 2002
  N.A. Lynch: Distributed Algorithms, Morgan Kaufmann Publ., 1996
  R. Guerraoui, L. Rodrigues: Introduction to Reliable Distributed Programming, Springer, 2006
  G. Weikum, G. Vossen: Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control, Morgan Kaufmann Publ.
  John L. Hennessy, David Patterson: Computer Architecture: A Quantitative Approach, ISBN 1-55860-069-8
Acknowledgement
This lecture is based on the book and corresponding course by Prof. Dr. David Culler, UC Berkeley. The vast majority of the slides and course material have been borrowed from his course.
Additional material has been taken from Prof. Dr. Alexander Reinefeld, ZIB Berlin.
Further contributors are Mathias Bach, Mathias Kretz and other members of the chair.