AMD Opteron Quad-Core



AMD Opteron Quad-Core: A Brief Overview
Daniele Magliozzi, Politecnico di Milano

Opteron Memory Architecture
- Native quad-core design: four cores on a single die for more efficient data sharing.
- Enhanced cache structure.
- Integrated memory controller.
- Sustains the multi-threaded application throughput that modern servers and workstations need.

Three Levels of Dedicated and Shared Cache
Four different caches accelerate instruction execution and data processing:
- L1 instruction cache: 64 Kbyte, 2-way set-associative, 64-byte line length, LRU replacement; used for instruction loads, instruction prefetching, instruction predecoding, and branch prediction.
- L1 data cache: 64 Kbyte, 2-way set-associative, write-allocate and write-back with LRU replacement, divided into eight banks (each 16 bytes wide), with a prefetcher and a 3-cycle load-to-use latency.
- L2 cache: contains only victim or copy-back blocks from L1.
- L3 cache: dynamically shared, non-inclusive victim cache, with blocks allocated on L2 victims/copy-backs. A hit in L3 can either leave the data there (for data accessed by multiple cores) or remove it from L3 and place it solely in L1 (for data accessed by a single core).
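The L1 figures above (64 Kbyte, 2-way, 64-byte lines) fix the number of sets and the way an address is split into tag, set index, and byte offset. The following is a minimal sketch of that arithmetic; the function name `split_address` is hypothetical, and the sketch ignores any distinction between virtual and physical addressing in the real hardware.

```python
# Geometry taken from the slide: 64-Kbyte, 2-way set-associative, 64-byte lines.
CACHE_BYTES = 64 * 1024
WAYS = 2
LINE_BYTES = 64

# Each set holds WAYS lines, so the set count is size / (ways * line size).
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 512 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1       # 6 bits select a byte in the line
INDEX_BITS = NUM_SETS.bit_length() - 1          # 9 bits select the set


def split_address(addr):
    """Split an address into (tag, set index, byte offset). Illustrative only."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    print(NUM_SETS, OFFSET_BITS, INDEX_BITS)
    print(split_address(0x12345))
```

Under this model, two addresses that are a multiple of 32 Kbyte apart (512 sets times 64 bytes) fall into the same set, and with only two ways a third such address forces an eviction.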

DDR2 SDRAM with Integrated Memory Controller
- SDRAM: stores data in memory cells that are activated by a clock signal, which synchronizes their operation with an external data bus.
- DDR2 SDRAM (double data rate synchronous dynamic random access memory): cells transfer data on both the rising and the falling edge of the clock, a technique called "double pumping".
- Improvement over DDR: the external data bus operates at twice the clock rate, which gives twice the bandwidth of its predecessor.
- Memory controller: integrated on-die; manages the flow of data to and from main memory, optimizing memory performance and bandwidth per CPU and reducing the latency inherent in front-side bus architectures.
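Double pumping translates directly into a peak-bandwidth figure: two transfers per bus clock, each as wide as the data bus. The sketch below works through that arithmetic. The 64-bit channel width and the 400 MHz bus clock (DDR2-800) are assumptions chosen for illustration; the slides do not say which memory speed grade is used.

```python
def peak_bandwidth_mb_s(bus_clock_mhz, bus_width_bits=64):
    """Theoretical peak bandwidth in MB/s for a double-pumped data bus.

    bus_clock_mhz  -- clock of the external data bus (assumed value)
    bus_width_bits -- data bus width; 64 bits is an assumption
    """
    # One transfer on the rising edge and one on the falling edge.
    mega_transfers_per_s = bus_clock_mhz * 2
    return mega_transfers_per_s * bus_width_bits // 8


if __name__ == "__main__":
    # Assumed example: a 400 MHz bus clock, i.e. 800 MT/s.
    print(peak_bandwidth_mb_s(400))
```

Halving the bus clock halves the result, which is the "twice the bandwidth" relationship between DDR2 and DDR at the same memory-cell clock.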

Direct Connect Architecture
The front-side bus is eliminated; each core is directly connected to:
- the memory controller
- the I/O subsystem
- other processors, through high-bandwidth HyperTransport links
This improves overall system performance and efficiency by eliminating the traditional bottlenecks inherent in legacy front-side bus architectures.

HyperTransport Technology
- High-speed, low-latency, point-to-point, unidirectional links between two devices, capable of extremely fast signaling (clock speeds of up to 800 MHz) and compatible with the PCI interface.
- Packetized bus: addresses, data, and commands are sent along the same wires, allowing narrower links that are easier to route.
- HT system: a processor with a HyperTransport port (called the HyperTransport host), the HyperTransport bus, and any I/O channels connected to it.
- Differential signaling (employed by the links): two wires carry each signal, and the received value is the difference between the two. It does not suffer from the problems associated with the single-ended signaling of high-speed parallel buses (bouncing signals, interference, cross-talk).
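The benefit of differential signaling can be seen in a toy model: if interference adds the same offset to both wires of a pair, taking the difference cancels it, whereas reading a single wire against a fixed threshold does not. The sketch below is only an illustration with made-up integer voltage levels and noise values; none of the function names come from the HyperTransport specification.

```python
def transmit(bits, swing=1):
    """Encode each bit as a wire pair: (+swing, -swing) for 1, (-swing, +swing) for 0."""
    return [(swing, -swing) if b else (-swing, swing) for b in bits]


def add_common_noise(pairs, noise):
    """Add the same (common-mode) disturbance to both wires of each pair."""
    return [(p + n, m + n) for (p, m), n in zip(pairs, noise)]


def receive_differential(pairs):
    """Recover bits from the difference between the two wires."""
    return [1 if (p - m) > 0 else 0 for p, m in pairs]


def receive_single_ended(pairs):
    """Recover bits from the first wire alone, against a fixed threshold of 0."""
    return [1 if p > 0 else 0 for p, _ in pairs]


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0]
    noisy = add_common_noise(transmit(bits), [-3, 2, 5, 0, 4])
    print(receive_differential(noisy))  # matches bits
    print(receive_single_ended(noisy))  # corrupted by the noise
```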

HyperTransport Technology (Switch Topology)
HyperTransport supports multiple connection topologies: daisy chain, switch, and star.
Switch topology:
- The host communicates directly with the switch chip, which in turn manages multiple independent slaves, including tunnels, bridges, and end-device chips (in effect, parallel daisy chains).
- Each port on the switch benefits from the full bandwidth of the HyperTransport I/O link, because the switch directs the flow of electrical signals between the slave devices connected to it.

AMD Virtualization
To allow multiple operating systems to run on the same physical platform, a software platform layer (the hypervisor) decouples the operating system from the underlying hardware. It also acts as a translation layer for guest virtual addresses. It can operate in two ways:
- Software: the hypervisor modifies the guest source code so that the guest cooperates with it, or controls the guest's privileged operations at run time.
- Hardware-assisted virtualization: the hypervisor uses a set of processor extensions (e.g., AMD-V) to intercept and emulate the guest's privileged operations.
With AMD-V, the hypervisor specifies how the processor should handle privileged operations within the guest itself, without transferring control to the hypervisor. This makes switching between virtual machines more efficient, which helps performance, and effectively isolates virtual machines for secure operation.

Rapid Virtualization Indexing (RVI)
- With paging enabled, the operating system defines a set of page tables that the page walker (implemented in processor hardware) uses to translate linear addresses to physical addresses.
- Guest page table (gPT): under virtualization, the guest's translation becomes one more level. The hypervisor can manage it in software (with a shadow page table) or in hardware.
- Nested page tables (nPT): the hardware option. The hypervisor sets up the nested page table for the page walker, which then handles the second level of translation itself. This reduces the overhead found in equivalent shadow paging implementations, and recent translations are stored in an internal translation look-aside buffer (TLB).
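The two levels of translation can be sketched with a pair of lookup tables: the guest page table maps guest virtual pages to guest physical pages, and the nested page table maps guest physical pages to system physical pages. This is a deliberately flattened model. Real page tables are multi-level, and the guest's own tables live in guest physical memory, so a hardware walk needs nested lookups at every level. All names and page numbers below are hypothetical.

```python
PAGE_SIZE = 4096

# Hypothetical single-level tables, mapping page number -> page number.
guest_page_table = {0: 5, 1: 7}      # guest virtual  -> guest physical (set by the guest OS)
nested_page_table = {5: 42, 7: 13}   # guest physical -> system physical (set by the hypervisor)


def translate(table, addr):
    """Translate an address through one table, keeping the page offset."""
    page, offset = divmod(addr, PAGE_SIZE)
    return table[page] * PAGE_SIZE + offset


def nested_translate(guest_virtual, tlb):
    """Guest virtual -> system physical, caching the combined result in a TLB dict."""
    page, offset = divmod(guest_virtual, PAGE_SIZE)
    if page in tlb:                      # TLB hit: skip both walks
        return tlb[page] * PAGE_SIZE + offset
    guest_physical = translate(guest_page_table, guest_virtual)
    system_physical = translate(nested_page_table, guest_physical)
    tlb[page] = system_physical // PAGE_SIZE
    return system_physical


if __name__ == "__main__":
    tlb = {}
    print(nested_translate(0x1010, tlb))  # walks both tables
    print(nested_translate(0x1020, tlb))  # served from the TLB
```

The second call touches neither table, which is the role the slide assigns to the TLB: it hides the cost of the extra translation level for recently used pages.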

Power Performance
- Enhanced AMD PowerNow! with Independent Dynamic Core: allows processors and cores to operate at various voltages and frequencies.
- AMD CoolCore Technology: reduces processor energy consumption by turning off unused parts of the processor.
- AMD Smart Fetch Technology: allows a core to enter a "halt" state and draw less power, reducing CPU power consumption.

Quad-Core Opteron: Third-Generation Optimizations
1. Load-execute instructions (for floating-point or integer operands)
2. Write-combining (multiple memory-write cycles merged in a 64-byte buffer)
3. Branches that depend on random data (avoid conditional branches on random data)
4. Loop unrolling
5. Pointer arithmetic in loops (using the loop counter as an index into memory arrays)
6. Explicit load instructions
7. Reuse of dead registers
8. ccNUMA (cache-coherent non-uniform memory access)
9. Prefetch and streaming instructions
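As an illustration of item 2, the sketch below models write-combining with a single 64-byte buffer: consecutive writes that fall into the same aligned 64-byte region are merged, and the buffer is flushed only when a write targets a different region. This is a simplified assumption for illustration. Real processors have several buffers and additional flush conditions, and the function name `combine_writes` is hypothetical.

```python
BUFFER_BYTES = 64  # buffer size named in the list above


def combine_writes(addresses):
    """Return the base addresses of the buffers flushed, in order.

    addresses -- start addresses of small writes; each write is assumed
                 not to straddle a 64-byte boundary.
    """
    flushed = []
    current = None  # base address of the region currently being combined
    for addr in addresses:
        base = addr - addr % BUFFER_BYTES
        if base != current:
            if current is not None:
                flushed.append(current)  # leaving the region forces a flush
            current = base
    if current is not None:
        flushed.append(current)          # final flush
    return flushed


if __name__ == "__main__":
    sequential = list(range(0, 128, 8))   # sixteen 8-byte writes
    print(len(sequential), combine_writes(sequential))
    interleaved = [0, 64, 8, 72]          # alternates between two regions
    print(combine_writes(interleaved))
```

In this model, sixteen sequential 8-byte writes collapse into two buffer flushes, while an access pattern that alternates between regions gets no benefit, which is why the guideline favors writing a buffer's worth of data contiguously.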

Some Technical Data
Core speed                 2800 MHz
System bus speed           2200 MHz
Integrated memory speed    2200 MHz
Wattage                    75 W
L1 cache size              64 Kbyte (x 4 caches)
L2 cache size              512 Kbyte (x 4 caches)
L3 cache size              6144 Kbyte (x 1 cache)