CS 159 Two Lecture Introduction. Parallel Processing: A Hardware Solution & A Software Challenge

Size: px

Start display at page:

Download "CS 159 Two Lecture Introduction. Parallel Processing: A Hardware Solution & A Software Challenge"

Sheena McBride
10 years ago
Views:

1 CS 159 Two Lecture Introduction Parallel Processing: A Hardware Solution & A Software Challenge

2 We re on the Road to Parallel Processing

3 Outline Hardware Solution (Day 1) Software Challenge (Day 2) Opportunities

4 Outline Hardware Solution Technical Software Challenge Technical Opportunities Technical

5 Outline Hardware Solution Technical Business Software Challenge Technical Business Opportunities Technical Business

6 Outline Hardware Solution Technical Business Software Challenge Opportunities

7 Computer Hardware Evolution Highway Micro Macro Car & Driver Hardware & Software

8 Micro-Scopic View

9 Micro-Scopic View Intel 8086 Intel Core 2 Duo 1978 Year ,000 Transistors 291,000,000 5 MHz Clock Frequency 2.9 GHz

10 Micro-Scopic View In 28 Years from 1978 to 2006: Number of Transistors Increased 10,034X Clock Frequency Increased 586X Primary Driver / Facilitator was (and is) Moore s Law: Number of Transistors Doubles every Months Stated by Gordon Moore, Intel Co-Founder in 1965 Prediction has been proven valid over a long term Prediction has been the Law for over 40 years

Number of Transistors Doubles every 18-24 Months Stated by Gordon Moore, Intel Co-Founder

11 Micro-Scopic View

12 Micro-Scopic View

13 Historically, Huge Performance Gains came from Huge Clock Frequency Increases Unfortunately.

14 Micro-Scopic View Clock Rate Limits Have Been Reached Source: Patterson, Computer Organization and Design

15 Micro-Scopic View Blocked by Heat and Power Wall

16 Micro-Scopic View Power (and Heat) Grows as Frequency 3 Power Voltage 2 x Frequency Voltage Frequency Power Frequency 3 How can HW Performance Continue to Increase? Single Core Multi-Core

17 Single Core vs. Dual Core Single Core clocked at 2f Dual Core clocked at f 2f Throughput f + f 8f 3 Heat f 3 + f 3 Sequential Processing Parallel Processing

18 Micro-Scopic View Instruction-Level Parallelism (ILP) was also Heavily Used Implemented On-Chip via Hardware Transparent to Software (No impact on Programmers) We will Study Two Types: Pipelining (Intra-Instruction Parallelism) Multi-Function Units (Inter-Instruction Parallelism) ILP has provided reasonable speedups in the past, Unfortunately.

will Study Two Types: Pipelining (Intra-Instruction Parallelism) Multi-Function Units

19 Micro-Scopic View Instruction-Level Parallelism Limits have been Reached too Watts, IPC 100 Power grows exponentially 10 1 Diminishing returns w.r.t. larger instruction window, higher issue-width Single-Issue Pipelined Superscalar (Limited Look-Ahead) Superscalar (Aggressive Look-Ahead) Aggressiveness Of ILP

larger instruction window, higher issue-width Single-Issue Pipelined

20 Gain - to - Effort Ratio of ILP beyond Knee of Curve Diminishing Returns due to Increased Cost and Complexity of Extracting ILP Performance Made sense to go Superscalar and Multi-Function: Good ROI Very little gain for substantial effort Effort Scalar In-Order Moderate Pipelining, Look-Ahead Very Deep Pipe, Aggressive Look-Ahead

Superscalar and Multi-Function: Good ROI Very little gain for substantial effort

21 Micro-Scopic View Summary Clock Frequency Scaling Limits have been Reached Instruction Level Parallelism Limits have been Reached Era of Single Core Performance Increases has Ended No More Free Lunch for Software Programmers Multiple Cores Will Directly Expose Parallelism to SW All Future Micro-Processor Designs will be Multi-Core Evident in Chip Manufacturer s RoadMaps

22 Micro-Scopic View Summary

23 Micro-Scopic View Summary Moore s Law Continues to 2x Transistors / 24 mos, but It will be used to Increase Number of Cores Instead 1 2 4

24 Single Core Sequential Processing Multi-Core Parallel Processing

25 Computer Hardware Evolution Highway Micro Macro

26 Macro-Scopic View Personal Computer Nodes: 1 Location: Desktop

27 Macro-Scopic View Cluster Computer Nodes: 10 s 100 s Location: Local

28 Macro-Scopic View Example Cluster Computer

29 Hardware Solution - Technical Macro-Scopic View Grid Computer Nodes: 1,000 s Location: Distributed

30 Macro-Scopic View Example Grid Computer

31 Macro-Scopic View Cloud Computer Nodes: 10,000 s Location: Highly Distributed

32 Macro-Scopic View App Engine Azure EC2 Example Cloud Computer

33 Macro-Scopic View Summary Single Node Sequential Processing Many Nodes Parallel Processing

34 Hardware Solution - Business Micro Macro Sequential Processing Parallel Processing

35 Hardware Solution - Business Computer Processing Power: Has a Highly Elastic Supply and Demand Curve Increased Supply Generates Increased Demand Hardware Computer Power New Applications Software Software is Like a Gas It Expands to Fill any size Hardware Container Sequential Processing Parallel Processing

36 Hardware Solution - Business 640K should be enough for anybody Bill Gates, 1981 There is a world market for maybe five computers Thomas Watson, 1943 Even Visionaries Sometime Forget No Matter How Much Computer Power People Have, They Always Want More Goal of Computer Hardware and Software Designers Continually Increase Performance and Lower Cost Operate at Optimum Point on Technology Curve Sequential Processing Parallel Processing

37 Hardware Solution - Business Technology Curve Cost Performance Cost / Performance vs. Performance Under-Utilization Optimum Utilization Over- Utilization Performance

38 Hardware Solution - Business Technology Curve Cost Performance Cost / Performance vs. Performance Under-Utilization Optimum Utilization Over- Utilization Performance

39 Hardware Solution - Business Technology Curve Cost Performance $$$ Cost / Performance vs. Performance $ $$$$$$ Sequential Processing X 2X Performance

40 Hardware Solution - Business Technology Curve Cost Performance $$$ Cost / Performance vs. Performance $ $$$$$$ Sequential Processing X 2X Performance

41 Hardware Solution - Business Technology Curve Cost Performance Cost / Performance vs. Performance $ $$$$$$ Sequential Processing X Performance $$ Parallel Processing

42 Key Points Hardware Solution Parallel Processing is really an Evolution in Micro- and Macro-Architecture Hardware That provides a Solution to: The Heat and Power Wall The Limitations of ILP Cost-Effective Higher Performance Parallel Processing is also a Software Challenge

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide