SPARC64 X: Fujitsu's New Generation 16 Core Processor for the next generation UNIX servers

1 SPARC64 X: Fujitsu's New Generation 16 Core Processor for the next generation UNIX servers
August 29, 2012
Takumi Maruyama
Processor Development Division, Enterprise Server Business Unit, Fujitsu Limited
All Rights Reserved, Copyright FUJITSU LIMITED 2012

2 Agenda
- Fujitsu Processor Development History
- SPARC64 X design concept
- SWoC (Software on Chip)
- Processor chip overview
- u-architecture
- Performance
- Summary

3 Fujitsu Processor Development
[Figure: timeline of Fujitsu processor development covering GS mainframe processors (GS8600 through GS21) and SPARC64 processors (through VIIIfx, IXfx, and X), over the periods ~1999, 2000~2003, 2004~, ~2011, and 2012~. Transistor counts and technology generations range from Tr=10M (CMOS Al, 350nm) to Tr=2.95B (28nm, SPARC64 X, 2012~).]
Technology labels in the figure, grouped under High Performance Technology and High Reliability Technology: Virtual Machine Architecture, Software On Chip, HPC-ACE, System On Chip, Hardware Barrier, Multi-core Multi-thread, L2$ on Die, Non-Blocking $, O-O-O Execution, Super-Scalar, Single-chip CPU, Store Ahead, Branch History, Prefetch, $ ECC, Register/ALU Parity, Instruction Retry, $ Dynamic Degradation, RC/RT/History.

4 SPARC64 X Design Concept
Combine UNIX and HPC FJ processor features to realize an extremely high throughput UNIX processor.
- SPARC64 VII/VII+ (UNIX processor) features
  - High CPU frequency (up to 3GHz)
  - Multicore/Multithread
  - Scalability: up to 64 sockets
- SPARC64 VIIIfx (HPC processor) features
  - HPC-ACE: innovative ISA extensions to SPARC-V9
  - High memory B/W: peak 64GB/s, embedded memory controller
Add new features vital to current and future UNIX servers.
- Virtual Machine Architecture
- Software On Chip
- Embedded IOC (PCI-GEN3 controller)
- Direct CPU-CPU interconnect

5 Software on Chip 1/2
HW for SW: accelerates specific software functions with HW.
- The targets
  - Decimal operation (IEEE754 decimal and NUMBER)
  - Cypher operation (AES/DES)
  - Database acceleration
- HW implementation
  - The HW engines for SWoC are implemented in the FPU, to fully utilize the 128 FP registers and software pipelining.
  - They are implemented as instructions rather than as a dedicated co-processor, to maximize flexibility for SW.
  - To avoid the complication of CISC type instructions, various RISC type instructions are newly defined instead: 18 instructions for Decimal and 10 instructions for Cypher operation.

6 Software on Chip 2/2
Decimal Instructions
- Supported data types
  - IEEE754 DPD (Densely Packed Decimal): 8B fixed length
  - NUMBER: variable length (max 21 bytes)
- Instructions
  - Both DPD and NUMBER instructions are defined as 8B operations (add/sub/mul/div/cmp) on FP registers, to maximize performance at reasonable HW cost.
  - When the data length is greater than 8 bytes, multiple such instructions are used.
  - An instruction for a special byte-shift on FP registers is newly added to support unaligned NUMBER data.
[Figure: byte-shift operand diagram with Fd[rs1], Fd[rs2], and Fd[rd].]
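
The slide above says that operands longer than 8 bytes are handled by issuing several 8-byte instructions. The following is a minimal Python sketch of that general idea only. It assumes a made-up chunk format of 16 decimal digits per 8-byte chunk; the real DPD and NUMBER encodings and the actual SPARC64 X instruction semantics are not given in the slides, and all function names are hypothetical.

```python
# Toy model: multi-chunk decimal addition with carry propagation.
# Assumption: each 8-byte chunk holds 16 decimal digits (as packed BCD would).
# This is NOT the DPD or NUMBER encoding used by the real instructions.
DIGITS_PER_CHUNK = 16
CHUNK_BASE = 10 ** DIGITS_PER_CHUNK


def split_chunks(value: int, n_chunks: int) -> list[int]:
    """Split a non-negative integer into little-endian 16-digit chunks."""
    if value < 0:
        raise ValueError("toy model handles non-negative values only")
    chunks = []
    for _ in range(n_chunks):
        value, low = divmod(value, CHUNK_BASE)
        chunks.append(low)
    if value:
        raise ValueError("value does not fit in the requested number of chunks")
    return chunks


def add_chunked(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """Add two equal-length chunk lists, one 8-byte chunk per step."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of chunks")
    carry = 0
    result = []
    for x, y in zip(a, b):  # lowest chunk first, like a multi-word add
        carry, low = divmod(x + y + carry, CHUNK_BASE)
        result.append(low)
    return result, carry


def join_chunks(chunks: list[int], carry: int = 0) -> int:
    """Rebuild the integer from little-endian chunks plus a final carry."""
    value = carry
    for chunk in reversed(chunks):
        value = value * CHUNK_BASE + chunk
    return value
```

Each loop iteration in `add_chunked` stands in for one fixed-width add, with the carry passed to the next chunk. That is the reason a fixed 8-byte operation can still cover variable-length data.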

7 SPARC64 X Chip Overview
- Architecture features
  - 16 cores x 2 threads
  - SWoC (Software on Chip)
  - Shared 24 MB L2$
  - Embedded Memory and IO Controller
- 28nm CMOS
  - 23.5mm x 25.0mm
  - 2,950M transistors
  - 1,500 signal pins
  - 3GHz
- Performance (peak)
  - 288GIPS / 382GFlops
  - 102GB/s memory throughput
[Figure: die photo with labeled blocks: DDR3 Interface (x2), MAC (x2), L2 Cache Data (x2), L2 Cache Control, SERDES PCI GEN3, SERDES Inter-CPU.]
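
As a rough cross-check of the peak figures, the arithmetic below combines the core count and clock from this slide with the "FMA x4" entry on the spec slide. Counting a fused multiply-add as two floating-point operations is an assumption of this sketch; the slides do not say how the quoted peaks were derived.

```python
CORES = 16               # slide: 16 cores
CLOCK_GHZ = 3.0          # slide: 3GHz
FMA_UNITS_PER_CORE = 4   # spec slide: FMA x4
FLOPS_PER_FMA = 2        # assumption: one FMA counts as multiply + add

# Peak floating-point rate in GFlops under the assumptions above.
peak_gflops = CORES * CLOCK_GHZ * FMA_UNITS_PER_CORE * FLOPS_PER_FMA

# Instructions per cycle per core implied by the quoted 288 GIPS.
implied_instr_per_cycle = 288 / (CORES * CLOCK_GHZ)

print(peak_gflops)              # 384.0
print(implied_instr_per_cycle)  # 6.0
```

The result is close to the quoted 382GFlops. The slides do not explain the small gap.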

8 SPARC64 X Core spec
[Figure: core floorplan with labeled blocks: L1I$, L1$ control, Instruction Control, L1D$, Execution Unit, Register File.]
Instruction Set Architecture: SPARC-V9/JPS, HPC-ACE, VM, SWoC
Branch Prediction: 4K BRHIS, 16K PHT
Integer Execution Units: 156 GPR x2 + 64 GUB; ALU/SHIFT x2, ALU/AGEN x2, MULT/DIVIDE x1
FP Execution Units: 128 FPR x2 + 64 FUB; FMA x4, FDIV x2, IMA/Logic x4, Decimal x1 / Cypher x2
L1$: L1I$ 64KB/4way, L1D$ 64KB/4way

9 u-architecture enhancements from SPARC64 VII+
CPU
- Deeper pipeline to increase frequency
- Better branch prediction scheme
- Increased sizes of various queues and a larger number of floating-point registers
- Richer execution units
  - 2EX + 2EAG -> 2EX + 2EX/EAG
  - 2FMA -> 4FMA to support 2way-SIMD
  - SWoC engine (Decimal and Cypher)
- More aggressive O-O-O execution of load and store
- Multi-banked 2port L1-cache
System On Chip
- #core and L2$ size (4core/12MB -> 16core/24MB)
- Memory Controller, IO Controller, and CPU-CPU I/F are all embedded to increase performance and reduce cost.

10 SPARC64 VII/VII+ Pipeline
[Figure: pipeline block diagram.]
Stages: Fetch (4 stages), Issue (2 stages), Dispatch / Reg.-Read (4 stages), Execute, Memory (L1$: 3 stages), Commit (2 stages)
- Fetch: L1 I$ 64KB 2Way; Branch Target Address 8K entries
- Issue: Decode & Issue; RSA 10 entries; RSE 8x2 entries; RSF 8x2 entries; RSBR 10 entries
- Registers: GPR 156 registers x2; GUB 32 registers; FPR 64 registers x2; FUB 48 registers
- Execution units: EAGA, EAGB, EXA, EXB, FLA, FLB
- Memory: Fetch Port 16 entries; Store Port 16 entries; Store Buffer 16 entries; L1 D$ 64KB 2Way
- Commit: CSE 64 entries; PC x2; Control Registers x2
- Chip level: L2$ 6MB/12MB 12Way; 4-core; System Bus Interface

11 SPARC64 X Pipeline
[Figure: pipeline block diagram.]
Stages: Fetch (4 stages), Issue (4 stages), Dispatch / Reg.-Read (5 stages), Execute, Memory (L1$: 3 stages), Commit (2 stages)
- Fetch: L1 I$ 64KB 4Way; Branch Target Address 4K entries; Pattern History Table 16K entries
- Issue: Decode & Issue; RSA 24 entries; RSE 24 entries; RSF 20 entries; RSBR 16 entries
- Registers: GPR 156 registers x2; GUB 64 registers; FPR 128 registers x2; FUB 64 registers
- Execution units: EXA, EXB, EAGA/EXC, EAGB/EXD; FLA, FLB, FLC, FLD; Decimal; Cypher (x2)
- Memory: Fetch Port 32 entries; Store Port 24 entries; Write Buffer 10 entries; L1 D$ 64KB 4Way
- Commit: CSE 96 entries; PC x2; Control Registers x2
- Chip level: L2$ 24MB 24Way; 16-core; Router; CPU-CPU I/F; Memory Controller (DIMM); IO Controller (PCI-GEN3)

12 Execution units enhancements (Ex.)
Integer Execution Unit
- 2EX + 2EAG -> 2EX + 2EX/EAG
- 2 -> 4W GPR
- 4 integer instructions can be executed per cycle (sustained)
[Figure: EXA, EXB, EXC, and EXD update the GUB; results are committed to the GPR.]
Load Store Unit
- Aggressive load/store O-O-O execution: a load executes without waiting for the address calculation of preceding stores.
- Multi-banked 2port L1-cache executes 2 loads, or 1 load + 1 store, in parallel
- Doubled L1$ bus size
- Doubled L1$ associativity (2 -> 4way)
- Result: increased L1-cache throughput and hit rate
[Figure: banked L1 cache with 2R/1W ports, 16B store and 16B load x2.]

13 SPARC64 X interconnects
SPARC64 VII/VII+ interconnects (SPARC Enterprise M8000)
- 4 CPUs require 8 additional LSIs to be connected with DIMMs
- DIMM i/f: 4.35GB/s (STREAM triad)
[Figure: each CPU connects through an SC and a MAC to DDR2 DIMMs.]
SPARC64 X interconnects
- No additional LSIs are needed to connect with DIMMs
- DIMM i/f: 65.6GB/s (STREAM triad)
- CPU i/f: 14.5GB/s x 5 ports (peak)
  - 3 ports: glueless 4way CPU interconnect
  - 2 ports: > 4way CPU
[Figure: four CPUs directly interconnected at 14.5GB/s, each attached to DDR3 DIMMs at 102GB/s.]

14 High Speed Transceivers (SerDes)
- CPU-CPU glue-less communication links
  - 14.5Gb/s x 8 lanes bi-directional serial interface, 5 ports
  - Embedded equalizer circuit enables long-distance signal transmission
  - Embedded adaptive control logic optimizes equalizer parameters automatically, depending on the system configuration
- PCI Express ports
  - 8Gb/s x 8 lanes (Gen 3), 2 ports
- Built-in SerDes provides peak 88.5GB/s x2 (up/down) total throughput
[Figure: SerDes block with RX, PLL, and TX, 14.5Gb/s x 8 lanes.]
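
The 88.5GB/s total can be reproduced from the lane rates and port counts on this slide. The sketch below uses raw line rates and ignores any encoding or protocol overhead, which is an assumption; the helper name is hypothetical.

```python
def port_gbytes_per_s(gbits_per_lane: float, lanes: int) -> float:
    """Raw one-direction bandwidth of one port in GB/s (8 bits per byte)."""
    return gbits_per_lane * lanes / 8


cpu_links = 5 * port_gbytes_per_s(14.5, 8)  # 5 CPU-CPU ports
pcie_ports = 2 * port_gbytes_per_s(8.0, 8)  # 2 PCI Express Gen 3 ports
total_per_direction = cpu_links + pcie_ports

print(cpu_links)            # 72.5
print(pcie_ports)           # 16.0
print(total_per_direction)  # 88.5
```

The per-port value of 14.5GB/s also matches the "CPU i/f: 14.5GB/s x 5 ports (peak)" figure on the interconnect slide.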

15 Reliability, Availability, Serviceability

Units                      Error detection and correction scheme
Cache (Tag)                ECC; Duplicate & Parity
Cache (Data)               ECC; Parity
Register                   ECC (INT/FP); Parity (Others)
ALU                        Parity/Residue
Cache dynamic degradation  Yes
HW Instruction Retry       Yes
History                    Yes

[Figure: SPARC64 X RAS diagram. Green: 1bit error correctable. Yellow: 1bit error detectable. Gray: 1bit error harmless.]
New RAS features from SPARC64 VII/VII+
- Floating-point registers are ECC protected
- #checkers increased to ~53,000 to identify a failure point more precisely
Guarantees data integrity
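
The difference between "detectable" (parity) and "correctable" (ECC) single-bit errors can be illustrated with textbook codes. The sketch below uses a single parity bit and a Hamming(7,4) code purely as examples. The slides do not say which ECC code SPARC64 X uses, so this is not a description of the actual hardware.

```python
def parity_bit(bits: list[int]) -> int:
    """Even-parity bit over a list of 0/1 values."""
    p = 0
    for b in bits:
        p ^= b
    return p


def parity_error_detected(bits: list[int], stored_parity: int) -> bool:
    """Parity can detect a single flipped bit but cannot locate it."""
    return parity_bit(bits) != stored_parity


def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits into 7 bits laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(code: list[int]) -> tuple[list[int], int]:
    """Return the corrected codeword and the 1-based error position (0 if none)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # the syndrome names the flipped bit
    if position:
        c[position - 1] ^= 1
    return c, position
```

With parity alone, the hardware learns that something is wrong and must recover another way, for example by re-executing the instruction. With ECC, the syndrome identifies the flipped bit, so the value can be repaired in place.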

16 Hardware Instruction Retry
When an error is detected, the hardware re-executes the instruction automatically to remove the transient error by itself.
1. Error
2. Flush
3. Single step execution
4. Update of SW visible resources
5. Back to normal execution after the re-executed instruction gets committed without an error.
[Figure: two Fetch / Execute / Commit pipeline diagrams, before and during retry, showing IBF, IWR, CSE, PC, ALU (EAGA/B, EXA/B, FLA/B), RSE, RSF, RSA, RSBR, GUB, FUB, and the SW visible resources (GPR, FPR, Memory).]
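
The retry steps above can be sketched as a toy software model. It is a simplification for illustration only: the program format, the fault-injection set, and the `max_retries` limit are all assumptions of this sketch and do not describe the real hardware mechanism.

```python
def run_with_retry(program, transient_faults, max_retries=3):
    """Run a toy program, retrying any instruction hit by a transient fault.

    program:          list of (dest_register, op) where op(regs) -> value
    transient_faults: set of (instruction_index, attempt) pairs that fail
    """
    regs = {}  # software-visible state, written only at error-free commit
    log = []
    for pc, (dest, op) in enumerate(program):
        attempt = 0
        while True:
            result = op(regs)  # execute; the result is not yet visible
            if (pc, attempt) not in transient_faults:
                regs[dest] = result  # step 4: update SW visible resources
                log.append(("commit", pc))
                break  # step 5: back to normal execution
            log.append(("flush", pc))  # steps 1-2: error detected, flush
            attempt += 1  # step 3: re-execute this one instruction
            if attempt > max_retries:
                raise RuntimeError(f"instruction {pc} keeps failing")
    return regs, log
```

The point the model tries to capture is that software-visible registers change only when an instruction commits without error, so a faulty attempt leaves no trace and can simply be repeated.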

17 SPARC64 X Performance
[Chart: SPARC64 X hardware measured results, shown as relative throughput. Bar labels include 7x, 8x, 15x, and 98x (SWoC).]
- SPARC64 X realizes 7x the INT/FP/JVM throughput and 15x the memory throughput of SPARC64 VII+.
- The INT/FP/JVM results are with an un-tuned compiler/JVM.
- SWoC on SPARC64 X results in up to 98x throughput.
- The NUMBER score is for scalar data. It is expected to be much better for vector data.

18 SPARC64 X CPI (Cycles Per Instruction) Example
[Chart: CPI of SPARC64 VII+ vs. SPARC64 X, INT (single thread), hardware measured results, on an axis from Lower Performance to Higher Performance. Labeled contributions: shorter memory latency; large L2$; improved branch prediction; 2 -> 4way L1$; increased throughput of L1$; 2EX+2EAG -> 2EX+2EX/EAG; 2 -> 4W GPR.]
- Four integer execution units and the increased number of GPR (integer register) write ports improve overall performance.
- Memory latency reduction, the large L2$, branch prediction, and L1$ improvements also contribute substantially to the higher performance.

19 Summary
- SPARC64 X is Fujitsu's 10th SPARC processor, designed for Fujitsu's next generation UNIX server.
- SPARC64 X integrates 16 cores and a 24MB L2 cache, with over 100GB/s (peak) memory B/W.
- SPARC64 X keeps strong RAS features.
- The SPARC64 X chip is up and running in the lab. It has shown 7 times the throughput of SPARC64 VII+ without compiler tuning.
- SWoC is effective at accelerating specific software functions.
- Fujitsu will continue to develop the SPARC64 series.

20 Abbreviations (SPARC64 X)
- IB: Instruction Buffer
- RSA: Reservation Station for Address generation
- RSE: Reservation Station for Execution
- RSF: Reservation Station for Floating-point
- RSBR: Reservation Station for Branch
- GUB: General Update Buffer
- FUB: Floating point Update Buffer
- GPR: General Purpose Register
- FPR: Floating Point Register
- CSE: Commit Stack Entry

SPARC64 VII Fujitsu s Next Generation Quad-Core Processor

SPARC64 VII Fujitsu s Next Generation Quad-Core Processor SPARC64 VII Fujitsu s Next Generation Quad-Core Processor August 26, 2008 Takumi Maruyama LSI Development Division Next Generation Technical Computing Unit Fujitsu Limited High Performance Technology High

More information

SPARC64 VIIIfx: CPU for the K computer

SPARC64 VIIIfx: CPU for the K computer SPARC64 VIIIfx: CPU for the K computer Toshio Yoshida Mikio Hondo Ryuji Kan Go Sugizaki SPARC64 VIIIfx, which was developed as a processor for the K computer, uses Fujitsu Semiconductor Ltd. s 45-nm CMOS

More information

Intel Itanium Quad-Core Architecture for the Enterprise. Lambert Schaelicke Eric DeLano

Intel Itanium Quad-Core Architecture for the Enterprise. Lambert Schaelicke Eric DeLano Intel Itanium Quad-Core Architecture for the Enterprise Lambert Schaelicke Eric DeLano Agenda Introduction Intel Itanium Roadmap Intel Itanium Processor 9300 Series Overview Key Features Pipeline Overview

More information

<Insert Picture Here> T4: A Highly Threaded Server-on-a-Chip with Native Support for Heterogeneous Computing

<Insert Picture Here> T4: A Highly Threaded Server-on-a-Chip with Native Support for Heterogeneous Computing T4: A Highly Threaded Server-on-a-Chip with Native Support for Heterogeneous Computing Robert Golla Senior Hardware Architect Paul Jordan Senior Principal Hardware Engineer Oracle

More information

OpenSPARC T1 Processor

OpenSPARC T1 Processor OpenSPARC T1 Processor The OpenSPARC T1 processor is the first chip multiprocessor that fully implements the Sun Throughput Computing Initiative. Each of the eight SPARC processor cores has full hardware

More information

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Copyright 2013, Oracle and/or its affiliates. All rights reserved. 1 Oracle SPARC Server for Enterprise Computing Dr. Heiner Bauch Senior Account Architect 19. April 2013 2 The following is intended to outline our general product direction. It is intended for information

More information

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA

BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA BEAGLEBONE BLACK ARCHITECTURE MADELEINE DAIGNEAU MICHELLE ADVENA AGENDA INTRO TO BEAGLEBONE BLACK HARDWARE & SPECS CORTEX-A8 ARMV7 PROCESSOR PROS & CONS VS RASPBERRY PI WHEN TO USE BEAGLEBONE BLACK Single

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture?

This Unit: Putting It All Together. CIS 501 Computer Architecture. Sources. What is Computer Architecture? This Unit: Putting It All Together CIS 501 Computer Architecture Unit 11: Putting It All Together: Anatomy of the XBox 360 Game Console Slides originally developed by Amir Roth with contributions by Milo

More information

OC By Arsene Fansi T. POLIMI 2008 1

OC By Arsene Fansi T. POLIMI 2008 1 IBM POWER 6 MICROPROCESSOR OC By Arsene Fansi T. POLIMI 2008 1 WHAT S IBM POWER 6 MICROPOCESSOR The IBM POWER6 microprocessor powers the new IBM i-series* and p-series* systems. It s based on IBM POWER5

More information

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC Driving industry innovation The goal of the OpenPOWER Foundation is to create an open ecosystem, using the POWER Architecture to share expertise,

More information

Putting it all together: Intel Nehalem. http://www.realworldtech.com/page.cfm?articleid=rwt040208182719

Putting it all together: Intel Nehalem. http://www.realworldtech.com/page.cfm?articleid=rwt040208182719 Putting it all together: Intel Nehalem http://www.realworldtech.com/page.cfm?articleid=rwt040208182719 Intel Nehalem Review entire term by looking at most recent microprocessor from Intel Nehalem is code

More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX

Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX Overview CISC Developments Over Twenty Years Classic CISC design: Digital VAX VAXÕs RISC successor: PRISM/Alpha IntelÕs ubiquitous 80x86 architecture Ð 8086 through the Pentium Pro (P6) RJS 2/3/97 Philosophy

More information

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit.

Logical Operations. Control Unit. Contents. Arithmetic Operations. Objectives. The Central Processing Unit: Arithmetic / Logic Unit. Objectives The Central Processing Unit: What Goes on Inside the Computer Chapter 4 Identify the components of the central processing unit and how they work together and interact with memory Describe how

More information

Microarchitecture and Performance Analysis of a SPARC-V9 Microprocessor for Enterprise Server Systems

Microarchitecture and Performance Analysis of a SPARC-V9 Microprocessor for Enterprise Server Systems Microarchitecture and Performance Analysis of a SPARC-V9 Microprocessor for Enterprise Server Systems Mariko Sakamoto, Akira Katsuno, Aiichiro Inoue, Takeo Asakawa, Haruhiko Ueno, Kuniki Morita, and Yasunori

More information

Architecture of Hitachi SR-8000

Architecture of Hitachi SR-8000 Architecture of Hitachi SR-8000 University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Slide 1 Most of the slides from Hitachi Slide 2 the problem modern computer are data

More information

Current Status of FEFS for the K computer

Current Status of FEFS for the K computer Current Status of FEFS for the K computer Shinji Sumimoto Fujitsu Limited Apr.24 2012 LUG2012@Austin Outline RIKEN and Fujitsu are jointly developing the K computer * Development continues with system

More information

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com CSCI-GA.3033-012 Graphics Processing Units (GPUs): Architecture and Programming Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Modern GPU

More information

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:

Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches: Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):

More information

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads

A Scalable VISC Processor Platform for Modern Client and Cloud Workloads A Scalable VISC Processor Platform for Modern Client and Cloud Workloads Mohammad Abdallah Founder, President and CTO Soft Machines Linley Processor Conference October 7, 2015 Agenda Soft Machines Background

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek

Instruction Set Architecture. or How to talk to computers if you aren t in Star Trek Instruction Set Architecture or How to talk to computers if you aren t in Star Trek The Instruction Set Architecture Application Compiler Instr. Set Proc. Operating System I/O system Instruction Set Architecture

More information

More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction

More information

WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX620 S6

WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX620 S6 WHITE PAPER PERFORMANCE REPORT PRIMERGY BX620 S6 WHITE PAPER FUJITSU PRIMERGY SERVERS PERFORMANCE REPORT PRIMERGY BX620 S6 This document contains a summary of the benchmarks executed for the PRIMERGY BX620

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

Itanium 2 Platform and Technologies. Alexander Grudinski Business Solution Specialist Intel Corporation

Itanium 2 Platform and Technologies. Alexander Grudinski Business Solution Specialist Intel Corporation Itanium 2 Platform and Technologies Alexander Grudinski Business Solution Specialist Intel Corporation Intel s s Itanium platform Top 500 lists: Intel leads with 84 Itanium 2-based systems Continued growth

More information

VLIW Processors. VLIW Processors

VLIW Processors. VLIW Processors 1 VLIW Processors VLIW ( very long instruction word ) processors instructions are scheduled by the compiler a fixed number of operations are formatted as one big instruction (called a bundle) usually LIW

More information

Intel Xeon Processor E5-2600

Intel Xeon Processor E5-2600 Intel Xeon Processor E5-2600 Best combination of performance, power efficiency, and cost. Platform Microarchitecture Processor Socket Chipset Intel Xeon E5 Series Processors and the Intel C600 Chipset

More information

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2 Lecture Handout Computer Architecture Lecture No. 2 Reading Material Vincent P. Heuring&Harry F. Jordan Chapter 2,Chapter3 Computer Systems Design and Architecture 2.1, 2.2, 3.2 Summary 1) A taxonomy of

More information

Intel Pentium 4 Processor on 90nm Technology

Intel Pentium 4 Processor on 90nm Technology Intel Pentium 4 Processor on 90nm Technology Ronak Singhal August 24, 2004 Hot Chips 16 1 1 Agenda Netburst Microarchitecture Review Microarchitecture Features Hyper-Threading Technology SSE3 Intel Extended

More information

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015

FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015 FLOATING-POINT ARITHMETIC IN AMD PROCESSORS MICHAEL SCHULTE AMD RESEARCH JUNE 2015 AGENDA The Kaveri Accelerated Processing Unit (APU) The Graphics Core Next Architecture and its Floating-Point Arithmetic

More information

Pentium vs. Power PC Computer Architecture and PCI Bus Interface

Pentium vs. Power PC Computer Architecture and PCI Bus Interface Pentium vs. Power PC Computer Architecture and PCI Bus Interface CSE 3322 1 Pentium vs. Power PC Computer Architecture and PCI Bus Interface Nowadays, there are two major types of microprocessors in the

More information

The K computer: Project overview

The K computer: Project overview The Next-Generation Supercomputer The K computer: Project overview SHOJI, Fumiyoshi Next-Generation Supercomputer R&D Center, RIKEN The K computer Outline Project Overview System Configuration of the K

More information

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 Introduction to GP-GPUs Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1 GPU Architectures: How do we reach here? NVIDIA Fermi, 512 Processing Elements (PEs) 2 What Can It Do?

More information

Central Processing Unit (CPU)

Central Processing Unit (CPU) Central Processing Unit (CPU) CPU is the heart and brain It interprets and executes machine level instructions Controls data transfer from/to Main Memory (MM) and CPU Detects any errors In the following

More information

PCI Express IO Virtualization Overview

PCI Express IO Virtualization Overview Ron Emerick, Oracle Corporation Author: Ron Emerick, Oracle Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and

More information

Introduction to RISC Processor. ni logic Pvt. Ltd., Pune

Introduction to RISC Processor. ni logic Pvt. Ltd., Pune Introduction to RISC Processor ni logic Pvt. Ltd., Pune AGENDA What is RISC & its History What is meant by RISC Architecture of MIPS-R4000 Processor Difference Between RISC and CISC Pros and Cons of RISC

More information

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey

A Survey on ARM Cortex A Processors. Wei Wang Tanima Dey A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:

More information

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit

More information

OPENSPARC T1 OVERVIEW

OPENSPARC T1 OVERVIEW Chapter Four OPENSPARC T1 OVERVIEW Denis Sheahan Distinguished Engineer Niagara Architecture Group Sun Microsystems Creative Commons 3.0United United States License Creative CommonsAttribution-Share Attribution-Share

More information

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER

INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Course on: Advanced Computer Architectures INSTRUCTION LEVEL PARALLELISM PART VII: REORDER BUFFER Prof. Cristina Silvano Politecnico di Milano cristina.silvano@polimi.it Prof. Silvano, Politecnico di Milano

More information

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications

A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications 1 A New, High-Performance, Low-Power, Floating-Point Embedded Processor for Scientific Computing and DSP Applications Simon McIntosh-Smith Director of Architecture 2 Multi-Threaded Array Processing Architecture

More information

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011 Oracle Database Reliability, Performance and scalability on Intel platforms Mitch Shults, Intel Corporation October 2011 1 Intel Processor E7-8800/4800/2800 Product Families Up to 10 s and 20 Threads 30MB

More information

AMD Opteron Quad-Core

AMD Opteron Quad-Core AMD Opteron Quad-Core a brief overview Daniele Magliozzi Politecnico di Milano Opteron Memory Architecture native quad-core design (four cores on a single die for more efficient data sharing) enhanced

More information

Intel Itanium Architecture

Intel Itanium Architecture Intel Itanium Architecture Roadmap and Technology Update Dr. Gernot Hoyler Technical Marketing EMEA Intel Itanium Architecture Growth MARKET Over 3x revenue growth Y/Y* More than 10x growth* in shipments

More information

www.opensparc.net Creative Commons Attribution-Share 3.0 United States License

www.opensparc.net Creative Commons Attribution-Share 3.0 United States License OpenSPARC Slide-Cast In 12 Chapters Presented by OpenSPARC designers, developers, and programmers to guide users as they develop their own OpenSPARC designs and to assist professors as they teach the nextavailable

More information

SOC architecture and design

SOC architecture and design SOC architecture and design system-on-chip (SOC) processors: become components in a system SOC covers many topics processor: pipelined, superscalar, VLIW, array, vector storage: cache, embedded and external

More information

Introduction to Microprocessors

Introduction to Microprocessors Introduction to Microprocessors Yuri Baida yuri.baida@gmail.com yuriy.v.baida@intel.com October 2, 2010 Moscow Institute of Physics and Technology Agenda Background and History What is a microprocessor?

More information

The Foundation for Better Business Intelligence

The Foundation for Better Business Intelligence Product Brief Intel Xeon Processor E7-8800/4800/2800 v2 Product Families Data Center The Foundation for Big data is changing the way organizations make business decisions. To transform petabytes of data

More information

picojava TM : A Hardware Implementation of the Java Virtual Machine

picojava TM : A Hardware Implementation of the Java Virtual Machine picojava TM : A Hardware Implementation of the Java Virtual Machine Marc Tremblay and Michael O Connor Sun Microelectronics Slide 1 The Java picojava Synergy Java s origins lie in improving the consumer

More information

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck

Sockets vs. RDMA Interface over 10-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Sockets vs. RDMA Interface over 1-Gigabit Networks: An In-depth Analysis of the Memory Traffic Bottleneck Pavan Balaji Hemal V. Shah D. K. Panda Network Based Computing Lab Computer Science and Engineering

More information

Chapter 6. Inside the System Unit. What You Will Learn... Computers Are Your Future. What You Will Learn... Describing Hardware Performance

Chapter 6. Inside the System Unit. What You Will Learn... Computers Are Your Future. What You Will Learn... Describing Hardware Performance What You Will Learn... Computers Are Your Future Chapter 6 Understand how computers represent data Understand the measurements used to describe data transfer rates and data storage capacity List the components

More information

Introducción. Diseño de sistemas digitales.1

Introducción. Diseño de sistemas digitales.1 Introducción Adapted from: Mary Jane Irwin ( www.cse.psu.edu/~mji ) www.cse.psu.edu/~cg431 [Original from Computer Organization and Design, Patterson & Hennessy, 2005, UCB] Diseño de sistemas digitales.1

More information

Exploring the Design of the Cortex-A15 Processor ARM s next generation mobile applications processor. Travis Lanier Senior Product Manager

Exploring the Design of the Cortex-A15 Processor ARM s next generation mobile applications processor. Travis Lanier Senior Product Manager Exploring the Design of the Cortex-A15 Processor ARM s next generation mobile applications processor Travis Lanier Senior Product Manager 1 Cortex-A15: Next Generation Leadership Cortex-A class multi-processor

More information

Instruction Set Architecture (ISA)

Instruction Set Architecture (ISA) Instruction Set Architecture (ISA) * Instruction set architecture of a machine fills the semantic gap between the user and the machine. * ISA serves as the starting point for the design of a new machine

More information

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 IBM CELL. Politecnico di Milano Como Campus Project made by: Origgi Alessandro matr. 682197 Teruzzi Roberto matr. 682552 CELL INTRODUCTION 2 1 CELL SYNERGY Cell is not a collection of different processors, but a synergistic whole Operation paradigms,

More information
