Lecture 2: Advanced Processors
Part 1: The Road Map of Intel Microprocessors: Current and Future Trends
Kai Hwang, August 31, 2007

Lessons Learned So Far:
- Evolution of instruction set architectures: CISC to RISC and to hybrid
- Amdahl's Law: making the common case fast
- Moore's Law: scalable commodity computing
- Network-based computing is booming: cluster and grid computing

How to Improve Computer Performance with Architectural Innovations?
- Use a RISC instruction set architecture
- Use high-density chips with billions of transistors
- Use multi-GHz clocks with better cooling support
- Use a deeply pipelined processor datapath
- Use superscalar processors with multiple datapaths
- Use out-of-order (dynamic) execution

How to Improve Computer Performance with Architectural Innovations? (Cont'd)
- Use a hybrid architecture like the Pentium series
- Use low-power design and better power management (Core Duo)
- Use multi-level caches and a shared-memory architecture
- Use branch prediction techniques
- Use a multi-core architecture with multiple threads of control, like the Niagara processor
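Amdahl's Law, listed among the lessons learned, bounds what any of the innovations above can deliver: if only a fraction of the run time benefits from an enhancement, the untouched remainder limits the overall speedup. A minimal Python sketch follows. The function name and the 90% parallel fraction are illustrative assumptions, not figures from the lecture.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original run time
    is accelerated by a factor of `speedup_enhanced` (Amdahl's Law)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    unaffected = 1.0 - fraction_enhanced
    return 1.0 / (unaffected + fraction_enhanced / speedup_enhanced)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the run time parallelizes perfectly
    # across the cores of a multi-core chip; 10% stays serial.
    for cores in (2, 4, 8, 64):
        print(f"{cores:3d} cores -> speedup {amdahl_speedup(0.9, cores):.2f}")
```

Under that assumption the speedup can never exceed 10, however many cores are added, which is why the serial (common) case has to be made fast first.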
Top 500 List of Supercomputers
[Figure: number of Top 500 systems (0 to 500) by architecture class, 1993 to 2000. Classes: single instruction multiple data (SIMD), cluster (network of workstations), cluster (network of SMPs), massively parallel processors (MPPs), shared-memory multiprocessors (SMPs), uniprocessors.]

Supercomputer Industry? (Dinosaurs Vanished)
[Figure: timeline, 1985 to 2005, of vendors that left the market.]
- 1989: ETA
- 1990: Multiflow, ESCD
- 1991: Myrias
- 1992: Meiko Scientific
- 1994: Convex Computer, Thinking Machines
- 1995: Pyramid
- 1996: Kendall Square Research, MasPar, Cray Research
- 1997: BBN
- 1998: DEC
- 1999: Sequent
- 2005: nCUBE

Intel Microprocessors since the Pentium Series

Pentium 4 Architecture
- The processor executes simple microinstructions, 70 bits wide (hardwired control).
- 120 control lines for the integer datapath (400 control lines for floating point).
- If an instruction requires more than 4 microinstructions to implement, control comes from the microcode ROM (8000 microinstructions).
[Figure: Pentium 4 die layout. Labeled blocks: control (several regions), I/O interface, instruction cache, enhanced floating point and multimedia, data cache, integer datapath, secondary cache and memory interface, advanced pipelining and hyperthreading support.]
The Pentium 4 Architecture

Interesting Features in Pentium 4
- A special trace cache holds predecoded IA-32 instructions, which are translated into RISC-like micro-operations for out-of-order execution.
- Up to 126 micro-operations can be outstanding simultaneously in the deeply pipelined execution queues, including 48 loads, 24 stores, and integer and floating-point operations, over 7 functional units.
- The FP units also handle the MMX (multimedia extension) and SSE2 instructions.
- The functional pipelines execute dynamically with multiple issue.
- Other interesting features include ILP, speculation, register renaming, dynamic scheduling for out-of-order (OOO) execution, and in-order retirement.

Innovative Processor Features in Modern Processors:
- Use data or instruction prefetching techniques
- Use speculative load and speculative branch ahead of store instructions
- Use predicated instruction execution
- Use an EPIC (Explicitly Parallel Instruction Computing) architecture like the Itanium (IA-64 architecture)

Innovative Processor Features in Modern Processors (Cont'd):
- Compiler optimization from small basic blocks to bigger blocks
- Extending the compiler from local to global optimizations
- System In a Chip (SIC) architecture
- Chip MultiProcessor (CMP) architecture
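The Pentium 4 limits quoted above (126 outstanding micro-operations, of which at most 48 loads and 24 stores) can be pictured as three counters that gate allocation into the out-of-order window. The Python sketch below is a simplified bookkeeping model under that assumption. The class and method names are hypothetical, and real hardware tracks far more state (register renaming, the reorder buffer, scheduler queues).

```python
from dataclasses import dataclass

# Limits taken from the slide; everything else here is an illustrative model.
MAX_UOPS = 126    # total micro-operations in flight
MAX_LOADS = 48    # loads in flight
MAX_STORES = 24   # stores in flight


@dataclass
class InFlightWindow:
    uops: int = 0
    loads: int = 0
    stores: int = 0

    def can_allocate(self, kind: str) -> bool:
        """Return True if one more micro-op of this kind fits in the window."""
        if self.uops >= MAX_UOPS:
            return False
        if kind == "load":
            return self.loads < MAX_LOADS
        if kind == "store":
            return self.stores < MAX_STORES
        return True  # integer / floating-point ops only need a window slot

    def allocate(self, kind: str) -> bool:
        """Try to place a micro-op in flight; a False result means 'stall'."""
        if not self.can_allocate(kind):
            return False
        self.uops += 1
        if kind == "load":
            self.loads += 1
        elif kind == "store":
            self.stores += 1
        return True

    def retire(self, kind: str) -> None:
        """Free the resources of one micro-op of this kind at retirement."""
        if self.uops == 0:
            raise ValueError("nothing in flight to retire")
        self.uops -= 1
        if kind == "load":
            self.loads -= 1
        elif kind == "store":
            self.stores -= 1
```

In this model the front end stalls whenever `allocate` returns False, and in-order retirement is what frees slots again.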
Advances in IA-64 Architecture beyond the IA-32:
- Enlarged memory address space from 2^32 words to 2^64 double words
- Use 256 physical registers, compared with 64 registers
- Use speculative load beyond store and speculative branch
- Use predicated instruction execution

HP/Intel Itanium Series
- Use EPIC (Explicitly Parallel Instruction Computing) to merge superscalar and VLIW features
- Compiler optimization extending small basic blocks to much bigger blocks
- Extending the compiler from local to global optimizations
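Predicated instruction execution, listed above for IA-64, replaces a branch with a compare that sets a pair of complementary predicates. Instructions from both paths are then issued, and only those whose predicate is true write their result. The toy Python interpreter below sketches the idea. The register names, predicate names, and tuple encoding are assumptions made for illustration, not IA-64 syntax.

```python
import operator


def abs_diff_predicated(a: int, b: int) -> int:
    """Compute |a - b| with no branch in the 'program', only guarded writes."""
    regs = {"a": a, "b": b, "r": 0}

    # The compare writes a predicate and its complement in one step.
    preds = {}
    preds["p1"] = regs["a"] > regs["b"]
    preds["p2"] = not preds["p1"]

    # Each entry: (qualifying predicate, destination, operation, sources).
    program = [
        ("p1", "r", operator.sub, ("a", "b")),  # (p1) r = a - b
        ("p2", "r", operator.sub, ("b", "a")),  # (p2) r = b - a
    ]

    # Every instruction is visited; a false predicate nullifies the write.
    for qp, dest, op, srcs in program:
        if preds[qp]:
            regs[dest] = op(*(regs[s] for s in srcs))
    return regs["r"]


if __name__ == "__main__":
    print(abs_diff_predicated(7, 3), abs_diff_predicated(3, 7))
```

Because both guarded instructions sit in one straight-line block, a compiler can schedule them together, which is how predication helps turn small basic blocks into bigger ones.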
Intel Wide Dynamic Execution: Multi-Core

The Core Duo Processor Architecture

Intel Core Duo Processor Floor Plan:
- To provide the best performance under power and thermal constraints.
- To achieve a significant improvement in performance through precise control of the trade-offs among the performance, power, and thermal features of the Core Duo system.
Power States of Intel Core Duo Processor

Optimize the power utilization for sustained performance:
- Improved power process technology
- Low-power CPU design
- Dynamic power coordination
- Low-Vcc cache design with voltage regulation
- Smart cache sizing and enhanced deeper sleep
- Using a thermal monitor and digital sensor

Smart Cache in Intel Core Duo

A Multiprocessor Using Separate CPU Chips on the Same Circuit Board
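Several items in the power list above (low Vcc, voltage regulation, dynamic power coordination) rest on the standard first-order model of CMOS switching power, P = a * C * V^2 * f, where a is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency. The Python sketch below evaluates that model. The numeric values are hypothetical and are not Core Duo data.

```python
def dynamic_power(capacitance_farads: float,
                  vcc_volts: float,
                  freq_hz: float,
                  activity: float = 1.0) -> float:
    """First-order CMOS switching power in watts: a * C * V^2 * f."""
    return activity * capacitance_farads * vcc_volts ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical operating points, chosen only to show the scaling.
    full = dynamic_power(1e-9, 1.0, 2.0e9)      # nominal voltage and clock
    scaled = dynamic_power(1e-9, 0.8, 1.6e9)    # both reduced by 20%
    print(f"relative power after scaling: {scaled / full:.3f}")
```

Since voltage enters quadratically, lowering Vcc together with the clock cuts switching power much faster than it cuts performance, which is the motivation for the low-voltage and deeper-sleep states.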
A Chip Multiprocessor (CMP)

Intel's Core 2 Duo Architecture
Further integration on denser chips

Intel's Core Fetch/Decode Hardware

Required Paper Readings (download from the class web site, for Lectures 1-2):
- O. Wechsler, "Inside Intel Core Microarchitecture," Intel White Paper, 2006.
- H. Sharangpani, "Intel Itanium Processor Micro-architecture Overview," Intel Presentation Slides, 2001.
- S. Gochman, et al., "Introduction to Intel Core Duo Processor Architecture," Intel Technology Journal, Vol. 10, Issue 2, 2006.
Book Chapter Readings related to Lectures 1-8:
- I. Taylor, From P2P to Web Services and Grids, Springer-Verlag, London, 2005, ISBN 1-85233-869-5. Chapters 1, 2, and 4 on Introduction, P2P Systems, and Grid Computing.
- K. Hwang and Z. Xu, Scalable Parallel Computing, McGraw-Hill, 1998, ISBN 0-07-031798-4. Chapters 8, 9, and 10 on Scalable Multiprocessors and Clusters.

More Advanced Processors plus Network Technology to be studied next in Lecture 2:
- Intel's Montecito: a dual-core successor in the Itanium series
- Sun's Niagara series
- IBM POWER5 series
- HP Piranha: a CMP architecture
- LAN, SAN, and NAS technologies