Multi-Core Processors: New Way to Achieve High System Performance

Paweł Gepner
EMEA Regional Architecture Specialist, Intel Corporation

Michał F. Kowalik
Market Analyst, Intel Corporation

Abstract

Multi-core processors represent an evolutionary change in conventional computing and are setting the new trend for high performance computing (HPC), but parallelism is nothing new. Intel has a long history with the concept of parallelism and the development of hardware-enhanced threading capabilities, and has been delivering threading-capable products for more than a decade. The move toward chip-level multiprocessing architectures with a large number of cores continues to offer dramatically increased performance and power characteristics. Nonetheless, this move also presents significant challenges. This paper describes how far the industry has progressed and evaluates some of the challenges we face with multi-core processors and some of the solutions that have been developed.

1. Introduction

Since the birth of the microprocessor in 1971, the industry has continued to innovate and increase performance. These performance gains can be accomplished in several ways, including more sophisticated process technology and innovative architecture or micro-architecture.

The architecture of a processor refers to the instruction set, registers, and data structures that are public to the programmer and are maintained and enhanced from one generation to the next. The micro-architecture of a processor refers to an implementation of the processor's architecture in silicon; it typically changes from one processor generation to the next while implementing the same public processor architecture. Process technology refers to the semiconductor circuit design process in silicon and the manufacturing methodologies used to create transistors that are increasingly smaller, faster and more power efficient.
The output of this process is a more sophisticated and integrated chip.

Performance is measured by the amount of time it takes to execute a given task. It is not determined by clock frequency alone, or by the number of instructions executed per clock cycle alone, but by the combination of both. Performance can therefore be computed as the product of frequency and instructions per clock cycle:

Performance = Instructions executed Per Clock (IPC) × Frequency

Both factors need to be taken into consideration when we think about a high performance processor. Unfortunately, increasing frequency and IPC concurrently is not trivial. From an architectural point of view there is always a compromise between a micro-architecture optimized for high frequency and one more focused on IPC. Looking at today's trends in processor design, we observe more and more projects focused on parallelism-oriented design rather than on increasing clock speed. Recently this has driven developments in two main directions: utilizing instruction-level parallelism (ILP) more aggressively, and making use of parallelism at a higher-than-instruction level (i.e., thread level). The latter approach is manifested by simultaneous multithreading (SMT) and chip multiprocessing (CMP). Some new designs try to exploit both approaches, utilizing instruction-level parallelism while at the same time optimizing the design for multi-core.

Intel Corporation, with its long history of exploring instruction-level parallelism that started with the Intel Pentium processor in 1994, also pioneered thread-level parallelism in volume implementations with HT Technology. Currently, with its multi-core approach, Intel is setting a new standard for high performance processors. The move towards chip-level multiprocessing architectures with a large number of cores also creates a substantial challenge: the need to make multi-core processors uncomplicated to program.

2. Performance and performance-per-watt consideration

True performance is a combination of both clock frequency and IPC, so performance can be improved by increasing frequency, IPC, or both. The frequency is a function of both the manufacturing process and the micro-architecture. On the existing 65 nm CMOS process technology, with a micro-architecture optimized for frequency (e.g. a long-pipeline design) such as NetBurst, we can achieve a maximum of 3.8 GHz today. Unfortunately, a high clock rate has implications for power consumption. The fastest NetBurst-based processors available today run at 3.8 GHz with a thermal guideline of 115 W. Dealing with such thermal considerations is not an easy task. It would be wrong to assume that a new process technology, currently in the pre-production phase (45 nm CMOS), will change the situation dramatically. Leakage power limits frequency scaling (Figure 1) and is the most important constraint on increasing frequency.

Figure 1. Leakage Power (% of total) vs. process technology

If the frequency cannot easily be increased, and thermal considerations are already hard to deal with, then we need to focus on increasing instructions per clock while fitting within an acceptable thermal envelope.
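The trade-off between frequency and IPC can be made concrete with a short calculation. The sketch below applies Performance = IPC × Frequency to two hypothetical designs. Only the 3.8 GHz figure comes from the text; the IPC values and the 2.4 GHz clock are illustrative assumptions, not measurements of any real processor.

```python
def performance(ipc, frequency_hz):
    """Instructions per second = instructions per clock x clock frequency."""
    return ipc * frequency_hz


# Hypothetical frequency-optimized design (long pipeline, assumed IPC of 1.0).
deep_pipeline = performance(ipc=1.0, frequency_hz=3.8e9)

# Hypothetical IPC-optimized design (wider core, lower clock, assumed IPC of 2.0).
wide_core = performance(ipc=2.0, frequency_hz=2.4e9)

# Ratio above 1 means the lower-clocked design completes more work per second.
advantage = wide_core / deep_pipeline
print(f"deep pipeline: {deep_pipeline:.2e} instr/s")
print(f"wide core:     {wide_core:.2e} instr/s")
print(f"wide/deep ratio: {advantage:.2f}")
```

Under these assumed numbers the lower-clocked design delivers more instructions per second, which is the reasoning behind focusing on IPC once frequency scaling stalls.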
IPC represents an absolute potential performance, but it depends on two important operational parameters:

IPC = Uops per Cycle / Uops per Instruction

Uops per Cycle represents the average number of micro-operations executed per cycle and reflects the issue parallelism of the micro-architecture. Clearly, if the number of execution units is larger, then the number of Uops issued and executed per cycle will be higher as well. To reach the highest IPC, the Uops per Cycle ratio must be as high as possible.

Uops per Instruction represents the average number of micro-operations used to execute one full instruction; it reflects the temporal parallelism of an instruction. The idea is to minimize the number of Uops used to construct an instruction. In the ideal situation Uops per Instruction = 1.

In addition to the two methods of increasing performance described above, performance can be boosted by reducing the number of instructions required to complete a specific task. One such technique is single instruction multiple data (SIMD). Intel first implemented 64-bit integer SIMD in 1996 as part of the Intel Pentium processor with MMX technology. The next implementation was 128-bit SIMD single-precision floating-point instructions (SSE) in the Pentium III processor. SSE2 and SSE3 extensions were added to the Pentium 4 family in 2000 and 2004 respectively.

Intel has more recently introduced a new technique within its latest mobile micro-architecture, called micro-fusion. Micro-fusion fuses many common micro-operations into a single micro-op, reducing the total number of micro-ops necessary to execute a given task.

Pure performance is important, but we always need to consider the implications for power when measuring the performance of a system.
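To illustrate the IPC relation above, the following sketch evaluates IPC = Uops per Cycle / Uops per Instruction for three cases. All input values are hypothetical and chosen only to show the direction of the effect: lowering Uops per Instruction, as micro-fusion aims to do, raises IPC even when the number of micro-operations executed per cycle is unchanged.

```python
def ipc(uops_per_cycle, uops_per_instruction):
    """IPC = Uops per Cycle / Uops per Instruction."""
    if uops_per_instruction <= 0:
        raise ValueError("uops_per_instruction must be positive")
    return uops_per_cycle / uops_per_instruction


UOPS_PER_CYCLE = 3.0  # assumed sustained micro-op throughput of the core

baseline = ipc(UOPS_PER_CYCLE, 1.5)   # assumed average without fusion
fused = ipc(UOPS_PER_CYCLE, 1.25)     # assumed average with some uops fused
ideal = ipc(UOPS_PER_CYCLE, 1.0)      # ideal case: one uop per instruction

for name, value in [("baseline", baseline), ("fused", fused), ("ideal", ideal)]:
    print(f"{name:8s} IPC = {value:.2f}")
```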
More and more, we look for the best ratio of performance per watt. Power consumption is determined by the dynamic capacitance, the square of the voltage with which the transistors and I/O buffers are supplied, and the frequency at which the transistors and signals switch, so we can write:

Power = Dynamic Capacitance × Voltage² × Frequency

Taking the performance and power equations into account, CPU designers need to balance IPC efficiency on one side against voltage and frequency on the other, to offer a compromise between performance and power efficiency. The new metric of design success is no longer pure performance alone, but a micro-architecture that delivers leadership in both raw performance and performance per watt.

3. Multi-core: The new innovative design

In addition to all the methods described above, there is one more way to build a high performance system. Dual-core and multi-core processor systems are going to change the dynamics of the market and enable innovative designs that deliver high performance with an optimized power characteristic. They drive multithreading and parallelism at a higher-than-instruction level and bring it to mainstream computing on a massive scale. From the operating system (OS) level they look like a symmetric multiprocessor (SMP) system, but they bring many more advantages than the typical dual- or multi-processor systems known from classic server architecture.

Multi-core processing is a long-term strategy for Intel that began more than a decade ago. Intel has more than 15 multi-core processor projects underway and is on the fast track to deliver multi-core processors in high volume across all of its platform families. Intel's multi-core architecture will possibly feature dozens or even hundreds of processor cores on a single die. In addition to general-purpose cores, Intel multi-core processors will eventually include specialized cores for processing graphics, speech recognition algorithms, communication protocols, and more. Many new and significant innovations designed to optimize power, performance, and scalability are implemented in the new multi-core processors.
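The power equation suggests why two slower cores can be attractive. The sketch below is a simplified model with loudly stated assumptions: values are normalized to a single core at full speed, supply voltage is assumed to scale down in proportion to frequency, leakage power is ignored, and the workload is assumed to be perfectly parallel with unchanged IPC per core. None of these numbers describe a specific product.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Power = Dynamic Capacitance x Voltage^2 x Frequency."""
    return capacitance * voltage ** 2 * frequency


# Normalized single-core reference point.
C, V, F = 1.0, 1.0, 1.0
single_core_power = dynamic_power(C, V, F)

# Assumption: each core of the dual-core part runs at 80% of the reference
# frequency, and voltage can be lowered by the same factor.
scale = 0.8
dual_core_power = 2 * dynamic_power(C, V * scale, F * scale)

relative_power = dual_core_power / single_core_power
# Assumption: same IPC per core and a perfectly parallel workload.
relative_throughput = 2 * scale

print(f"relative power:      {relative_power:.3f}")
print(f"relative throughput: {relative_throughput:.2f}")
```

Under these assumptions the dual-core configuration draws roughly the same dynamic power as the single core while offering up to 1.6 times the throughput. Real gains depend on how well the software is threaded.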
From a system perspective, a dual-core processor is recognized as two separate CPUs. In such a configuration, counting the number of CPUs in the system becomes confusing from a software perspective, so many vendors count the number of sockets in the system instead. "Two-socket system" is a more accurate description of system capability than "two-CPU system". This nomenclature will become even more important in the near future: a two-socket system can be populated with dual-core or quad-core CPUs and represent exactly the same system capability but different performance.

Multi-core CPUs also differ from implementation to implementation: some of them are monolithic designs while others are Multi-Chip Processors (MCP). The choice of implementation is mainly driven by manufacturing cost efficiencies. The monolithic implementation does not provide the same time-to-market efficiencies as high-volume MCP packaging: it must be binned at the lowest common frequency of its cores, and if one core is bad then the whole die must be scrapped. The MCP enables better overall yield (good cores can be paired from anywhere on the wafer) and better binning, since higher-frequency cores can be frequency matched and paired from anywhere on the wafer (they do not need to be contiguous).

The monolithic design usually has a shared L2 cache, which increases the efficiency of cache-to-core data transfers as well as core-to-core communication. In the monolithic scenario the entire L2 cache can be allocated to one core when needed. Figure 2 illustrates the difference between dual-core CPUs with a shared L2 cache and dual-core CPUs with independent caches for each core, populating a dual-socket system.

Figure 2. Dual socket system with different dual-core CPU L2 cache organization
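The yield and binning argument for MCP packaging can be illustrated with a toy model. In the sketch below, a wafer is a list of core sites, each with a hypothetical maximum frequency in GHz or None for a defective core. The function names, the wafer data, and the pairing rules are assumptions made for illustration; real binning flows are considerably more involved.

```python
def monolithic_bins(core_freqs):
    """Pair physically adjacent cores (0,1), (2,3), ...

    A monolithic dual-core die is usable only if both of its cores work,
    and it is binned at the lower of the two core frequencies.
    """
    bins = []
    for i in range(0, len(core_freqs) - 1, 2):
        a, b = core_freqs[i], core_freqs[i + 1]
        if a is not None and b is not None:
            bins.append(min(a, b))
    return bins


def mcp_bins(core_freqs):
    """Pair any two good single-core dies, fastest with next fastest.

    Each package is binned at the lower frequency of its matched pair.
    """
    good = sorted((f for f in core_freqs if f is not None), reverse=True)
    return [good[i + 1] for i in range(0, len(good) - 1, 2)]


# Hypothetical wafer: eight core sites, two of them defective.
wafer = [3.4, 2.8, None, 3.2, 3.0, 3.4, 2.8, None]

print("monolithic:", monolithic_bins(wafer))
print("MCP:       ", mcp_bins(wafer))
```

With this sample wafer the monolithic rule yields two sellable parts, while free pairing yields three parts at equal or higher frequency bins.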

The number of transistors in a dual-core CPU is doubled compared with a single-core processor, but even with this transistor count the new designs enable the processor to operate within the same, or even a reduced, power envelope. All of these considerations and challenges were taken into account during the definition of the new Intel Core micro-architecture. This micro-architecture is a new foundation for desktop, mobile, and server multi-core processors. It extends the energy-efficient philosophy first delivered in Intel's mobile micro-architecture, found in the Intel Pentium M processor family, and optimizes it for the performance and scalability of multi-core processors. The most important innovations added to the Intel Core micro-architecture are:

- Intel Wide Dynamic Execution
- Intel Intelligent Power Capability
- Intel Advanced Smart Cache
- Intel Smart Memory Access
- Intel Advanced Digital Media Boost

3.1. Intel Wide Dynamic Execution

Intel Wide Dynamic Execution enables delivery of more instructions per clock cycle to improve execution time and energy efficiency. Every execution core is wider, allowing each core to fetch, dispatch, execute, and return up to four full instructions simultaneously, compared with three in previous-generation processors.

3.2. Intel Intelligent Power Capability

Intel Intelligent Power Capability is a set of capabilities designed to reduce power consumption. This feature manages the runtime power consumption of all the processor's execution cores.
It includes an advanced power-gating capability that allows ultra-fine-grained logic control, turning on individual processor logic subsystems only if and when they are needed.

3.3. Intel Advanced Smart Cache

Intel Advanced Smart Cache is a multi-core optimized cache that improves performance and efficiency by increasing the probability that each execution core of a dual-core processor can access data from a higher-performance, more efficient cache subsystem. To accomplish this, Intel shares the L2 cache between cores. With a shared L2 cache, data only has to be stored in one place that each core can access. Because the L2 cache is shared among the cores, one core can use up to 100% of the available L2 cache when needed. When one core has minimal cache requirements, the other cores can increase their share of the L2 cache, reducing cache misses and increasing performance. The multi-core optimized cache also enables data to be obtained from the cache at higher throughput rates.

3.4. Intel Smart Memory Access

Intel Smart Memory Access includes an important new capability called memory disambiguation, which increases the efficiency of out-of-order processing by giving the execution cores the built-in intelligence to speculatively load data for instructions that are about to execute before all previous store instructions have executed. Memory disambiguation improves execution throughput by maximizing the available system-bus bandwidth and hiding latency to the memory subsystem.

3.5. Intel Advanced Digital Media Boost

Intel Advanced Digital Media Boost significantly improves performance when executing SSE instructions. In previous-generation processors, 128-bit SSE, SSE2, and SSE3 instructions were executed at a rate of one complete instruction every two clock cycles. Intel Advanced Digital Media Boost enables 128-bit instructions to be executed at a rate of one per clock cycle, effectively doubling the speed of execution for these instructions and raising the IPC.
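The doubling claimed for 128-bit SSE execution follows directly from the issue rates quoted above. The sketch below is a back-of-the-envelope calculation, assuming a stream of independent 128-bit operations with no other bottlenecks; the operation count is an arbitrary illustrative value.

```python
def cycles_for_sse_ops(n_ops, cycles_per_op):
    """Cycles needed to execute n_ops independent 128-bit SSE operations."""
    return n_ops * cycles_per_op


N_OPS = 1_000_000          # arbitrary workload size for illustration
FLOATS_PER_OP = 128 // 32  # single-precision values in one 128-bit register

previous_gen_cycles = cycles_for_sse_ops(N_OPS, cycles_per_op=2)
core_cycles = cycles_for_sse_ops(N_OPS, cycles_per_op=1)

speedup = previous_gen_cycles / core_cycles
floats_per_cycle_previous = FLOATS_PER_OP / 2
floats_per_cycle_core = FLOATS_PER_OP / 1

print(f"speedup: {speedup:.1f}x")
print(f"single-precision results per cycle: "
      f"{floats_per_cycle_previous:.0f} -> {floats_per_cycle_core:.0f}")
```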
4. Software implication

Most of today's existing applications will see immediate benefits when running on dual-core processors, but there are many things to consider when optimizing software for multi-core systems. Software designers must consider the nature of the processing and the system configuration. When designing software to run on a multi-core or multiprocessor system, the designer's main consideration should be how to allocate the work across all available processors. The most common way to allocate this work is a threading model, in which the work is broken into separate execution units called threads that can run on different processors at the same time to achieve parallel execution.

5. Conclusion

With the release of the first dual-core processor we entered a new era in processor architecture. Dual-core and multi-core processors are becoming the standard for delivering greater performance, improved performance per watt, and new capabilities across desktop, mobile, and server platforms. Platforms built around dual-core processors are ideal both for enthusiasts who crave computing power for audio, video, digital design and gaming applications, and for multitasking scenarios in business. Multi-core capabilities can enhance the user experience in multitasking environments, where a number of foreground applications run concurrently with background applications such as virus protection and security, wireless, management, compression, encryption and synchronization. Multi-core chips do more work per clock cycle and can be designed to operate at lower frequencies than their single-core counterparts. All of this significantly improves the user experience in both home and business environments and at the same time extends Moore's Law well into the future.
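As a closing illustration of the threading model described in Section 4, the sketch below breaks a task into chunks and hands each chunk to a separate thread. The function names and the sum-of-squares workload are hypothetical. Note also that in the standard CPython interpreter the global interpreter lock prevents CPU-bound threads from running truly in parallel, so this example shows the structure of work allocation rather than a guaranteed speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work done by one thread: an independent partial result."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_threads=None):
    """Allocate the work to n_threads threads and combine the partial sums."""
    n_threads = n_threads or os.cpu_count() or 1
    chunks = split_into_chunks(items, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


data = list(range(1_000))
total = parallel_sum_of_squares(data, n_threads=4)
print(total)
```

The key design decision is that each chunk is independent, so the threads need no locking; only the final combination of partial results is sequential.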

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007

Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007 Multi-core architectures Jernej Barbic 15-213, Spring 2007 May 3, 2007 1 Single-core computer 2 Single-core CPU chip the single core 3 Multi-core architectures This lecture is about a new trend in computer

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Low Power AMD Athlon 64 and AMD Opteron Processors

Low Power AMD Athlon 64 and AMD Opteron Processors Low Power AMD Athlon 64 and AMD Opteron Processors Hot Chips 2004 Presenter: Marius Evers Block Diagram of AMD Athlon 64 and AMD Opteron Based on AMD s 8 th generation architecture AMD Athlon 64 and AMD

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

Generations of the computer. processors.

Generations of the computer. processors. . Piotr Gwizdała 1 Contents 1 st Generation 2 nd Generation 3 rd Generation 4 th Generation 5 th Generation 6 th Generation 7 th Generation 8 th Generation Dual Core generation Improves and actualizations

More information

Chapter 2 Parallel Computer Architecture

Chapter 2 Parallel Computer Architecture Chapter 2 Parallel Computer Architecture The possibility for a parallel execution of computations strongly depends on the architecture of the execution platform. This chapter gives an overview of the general

More information

Multi-Core Programming

Multi-Core Programming Multi-Core Programming Increasing Performance through Software Multi-threading Shameem Akhter Jason Roberts Intel PRESS Copyright 2006 Intel Corporation. All rights reserved. ISBN 0-9764832-4-6 No part

More information

Making the Move to Quad-Core and Beyond

Making the Move to Quad-Core and Beyond White Paper Intel Multi-Core Processors Intel Multi-Core Processors Making the Move to Quad-Core and Beyond R.M. Ramanathan Intel Corporation White Paper Intel Multi-Core Processors: Making the Move to

More information

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

SPARC64 VIIIfx: CPU for the K computer

SPARC64 VIIIfx: CPU for the K computer SPARC64 VIIIfx: CPU for the K computer Toshio Yoshida Mikio Hondo Ryuji Kan Go Sugizaki SPARC64 VIIIfx, which was developed as a processor for the K computer, uses Fujitsu Semiconductor Ltd. s 45-nm CMOS

More information

Multi-core and Linux* Kernel

Multi-core and Linux* Kernel Multi-core and Linux* Kernel Suresh Siddha Intel Open Source Technology Center Abstract Semiconductor technological advances in the recent years have led to the inclusion of multiple CPU execution cores

More information

An examination of the dual-core capability of the new HP xw4300 Workstation

An examination of the dual-core capability of the new HP xw4300 Workstation An examination of the dual-core capability of the new HP xw4300 Workstation By employing single- and dual-core Intel Pentium processor technology, users have a choice of processing power options in a compact,

More information

Chapter 1 Computer System Overview

Chapter 1 Computer System Overview Operating Systems: Internals and Design Principles Chapter 1 Computer System Overview Eighth Edition By William Stallings Operating System Exploits the hardware resources of one or more processors Provides

More information

More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction

More information

Design Cycle for Microprocessors

Design Cycle for Microprocessors Cycle for Microprocessors Raúl Martínez Intel Barcelona Research Center Cursos de Verano 2010 UCLM Intel Corporation, 2010 Agenda Introduction plan Architecture Microarchitecture Logic Silicon ramp Types

More information

Thread level parallelism

Thread level parallelism Thread level parallelism ILP is used in straight line code or loops Cache miss (off-chip cache and main memory) is unlikely to be hidden using ILP. Thread level parallelism is used instead. Thread: process

More information

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?

More information

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com CSCI-GA.3033-012 Graphics Processing Units (GPUs): Architecture and Programming Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Modern GPU

More information

Control 2004, University of Bath, UK, September 2004

Control 2004, University of Bath, UK, September 2004 Control, University of Bath, UK, September ID- IMPACT OF DEPENDENCY AND LOAD BALANCING IN MULTITHREADING REAL-TIME CONTROL ALGORITHMS M A Hossain and M O Tokhi Department of Computing, The University of

More information

Advanced Core Operating System (ACOS): Experience the Performance

Advanced Core Operating System (ACOS): Experience the Performance WHITE PAPER Advanced Core Operating System (ACOS): Experience the Performance Table of Contents Trends Affecting Application Networking...3 The Era of Multicore...3 Multicore System Design Challenges...3

More information

Itanium 2 Platform and Technologies. Alexander Grudinski Business Solution Specialist Intel Corporation

Itanium 2 Platform and Technologies. Alexander Grudinski Business Solution Specialist Intel Corporation Itanium 2 Platform and Technologies Alexander Grudinski Business Solution Specialist Intel Corporation Intel s s Itanium platform Top 500 lists: Intel leads with 84 Itanium 2-based systems Continued growth

More information

AMD PhenomII. Architecture for Multimedia System -2010. Prof. Cristina Silvano. Group Member: Nazanin Vahabi 750234 Kosar Tayebani 734923

AMD PhenomII. Architecture for Multimedia System -2010. Prof. Cristina Silvano. Group Member: Nazanin Vahabi 750234 Kosar Tayebani 734923 AMD PhenomII Architecture for Multimedia System -2010 Prof. Cristina Silvano Group Member: Nazanin Vahabi 750234 Kosar Tayebani 734923 Outline Introduction Features Key architectures References AMD Phenom

More information

Multicore Processor, Parallelism and Their Performance Analysis

Multicore Processor, Parallelism and Their Performance Analysis Multicore Processor, Parallelism and Their Performance Analysis I Rakhee Chhibber, II Dr. R.B.Garg I Research Scholar, MEWAR University, Chittorgarh II Former Professor, Delhi School of Professional Studies

More information

How To Build A Cloud Computer

How To Build A Cloud Computer Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology

More information

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit

Unit A451: Computer systems and programming. Section 2: Computing Hardware 1/5: Central Processing Unit Unit A451: Computer systems and programming Section 2: Computing Hardware 1/5: Central Processing Unit Section Objectives Candidates should be able to: (a) State the purpose of the CPU (b) Understand the

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

CS 159 Two Lecture Introduction. Parallel Processing: A Hardware Solution & A Software Challenge

CS 159 Two Lecture Introduction. Parallel Processing: A Hardware Solution & A Software Challenge CS 159 Two Lecture Introduction Parallel Processing: A Hardware Solution & A Software Challenge We re on the Road to Parallel Processing Outline Hardware Solution (Day 1) Software Challenge (Day 2) Opportunities

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

LOOKING FOR AN AMAZING PROCESSOR. Product Brief 6th Gen Intel Core Processors for Desktops: S-series

LOOKING FOR AN AMAZING PROCESSOR. Product Brief 6th Gen Intel Core Processors for Desktops: S-series Product Brief 6th Gen Intel Core Processors for Desktops: Sseries LOOKING FOR AN AMAZING PROCESSOR for your next desktop PC? Look no further than 6th Gen Intel Core processors. With amazing performance

More information

Energy-Efficient, High-Performance Heterogeneous Core Design

Energy-Efficient, High-Performance Heterogeneous Core Design Energy-Efficient, High-Performance Heterogeneous Core Design Raj Parihar Core Design Session, MICRO - 2012 Advanced Computer Architecture Lab, UofR, Rochester April 18, 2013 Raj Parihar Energy-Efficient,

More information

Intel Pentium 4 Processor on 90nm Technology

Intel Pentium 4 Processor on 90nm Technology Intel Pentium 4 Processor on 90nm Technology Ronak Singhal August 24, 2004 Hot Chips 16 1 1 Agenda Netburst Microarchitecture Review Microarchitecture Features Hyper-Threading Technology SSE3 Intel Extended

More information

INTEL HIGH-PERFORMANCE CONSUMER DESKTOP MICROPROCESSOR TIMELINE
