The Impact of Memory Subsystem Resource Sharing on Datacenter Applications. Lingjia Tang, Jason Mars, Neil Vachharajani, Robert Hundt, Mary Lou Soffa

1 The Impact of Memory Subsystem Resource Sharing on Datacenter Applications
Lingjia Tang, Jason Mars, Neil Vachharajani, Robert Hundt, Mary Lou Soffa

2 Introduction: Problem
- Recent studies of memory resource sharing among threads have concentrated on common multithreaded benchmark workloads (SPEC, etc.).
- These studies do not represent the workloads that have emerged in datacenter applications.
- Real-world datacenter applications

3 Introduction
- Demonstrates the impact of memory sharing in datacenter applications
  - Importance of a good Thread-to-Core (TTC) mapping
- Evaluates the impact of co-locating threads
  - Optimal TTC mappings change with co-located applications
- Identifies characteristics that impact performance in various TTC mappings
  - Amount of sharing between threads
  - Required memory bandwidth
  - Application's cache footprint

4 Background and Motivation: Memory Resource Sharing
- Sharing configurations can vary within the same set of cores

5 Background and Motivation: Datacenter Job Scheduling
- Global job scheduler selects an appropriate machine
- The machine's OS then selects TTC mappings
- Memory resource sharing is not considered in scheduling decisions
Job Priority and Co-location
- Latency-sensitive applications take a higher priority
- High-priority jobs are scheduled with lower-priority jobs for efficiency
- Managing QoS priorities on multicores remains a challenge

6 Intra-application Sharing: Experiment Methodology
- Platform: Intel Clovertown (Xeon E5345) dual-socket
- Linux kernel with a custom GCC
- Latency-sensitive applications run alone or with a batch application
- A load generator tests peak behavior with real-world query traces
- Experiments show performance variability among different TTC mappings
  - Three experimental runs of a fixed load executing across four cores
  - Measured each application's specified performance metric

7 Intra-application Sharing: Measurement and Findings
- The performance impact of resource sharing is significant, although each application has a different sharing preference
- bigtable actually performs better when sharing resources!

8 Intra-application Sharing: Performance Variability, Last Level Cache Misses
- Normalized to {X.X.X.X.}
- Fixed code section
- Results are consistent with the performance trend
- More LLC misses lead to performance degradation

9 Intra-application Sharing: Performance Variability, FSB Bandwidth Consumption
- Normalized to {X.X.X.X.}
- Fixed code section
- Results are consistent with the performance trend
- More LLC misses increase the number of bus requests

10 Intra-application Sharing: Performance Variability, Data Sharing
- Five states of L2 requests: Prefetch, Modified, Exclusive, Shared, and Invalid
- Purple shows the amount of sharing between threads
- Observations are consistent with the performance trend

11 Intra-application Sharing: Performance Variability, Summary
- LLC sharing has significant positive or negative performance effects
  - Can add up to 10% variability
- Bus contention also has significant effects
  - Can add another 10% of variability
- Applications with a high level of data sharing benefit significantly from sharing the LLC and FSB
- The results demonstrate the importance of a good TTC mapping that mimics the application's inherent data sharing pattern

12 Inter-application Sharing: Experiment Design
- Same setup as the previous experiment, except that this experiment considers the impact of co-located applications with varying TTC mappings
- Performance of contentanalyzer, bigtable, and websearch is measured while sharing resources with co-running batch applications
- Co-running applications are stitcher and protobuf
  - Each is a batch, low-priority application

13 Inter-application Sharing: Measurement and Findings
- Normalized to solo performance under the corresponding TTC mapping
- The TTC mappings have different effects when a co-runner (*) is present
  - {X*X*X*X*} shares LLC, FSB, and memory subsystem
  - {XX**XX**} shares the FSB and memory subsystem
  - {XXXX****} shares only the memory subsystem (unavoidable)

14 Inter-application Sharing: Measurement and Findings
- Normalized to solo performance under the TTC mapping {X.X.X.X.}
- The TTC mappings have different effects when a co-runner (*) is present
  - {X*X*X*X*} shares LLC, FSB, and memory subsystem
  - {XX**XX**} shares the FSB and memory subsystem
  - {XXXX****} shares only the memory subsystem (unavoidable)
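The mapping notation on these slides can be decoded mechanically. The sketch below is only an illustration: it assumes that each character is one core, that adjacent pairs of cores share an LLC, and that groups of four cores share an FSB, which is the layout implied by the three mappings listed above. The function name `shared_resources` and its parameters are hypothetical.

```python
def shared_resources(mapping, cores_per_llc=2, cores_per_fsb=4):
    """List the resources the key application ('X') shares with a co-runner ('*').

    Assumed topology (not stated explicitly on the slides): consecutive
    groups of `cores_per_llc` cores share an LLC, and consecutive groups
    of `cores_per_fsb` cores share an FSB.
    """
    slots = mapping.strip("{}")

    def groups(size):
        # Split the core string into consecutive groups of `size` cores.
        return [slots[i:i + size] for i in range(0, len(slots), size)]

    def mixed(group):
        # A resource is shared when both applications appear in its group.
        return "X" in group and "*" in group

    shared = ["memory"] if mixed(slots) else []
    if any(mixed(g) for g in groups(cores_per_fsb)):
        shared.append("FSB")
    if any(mixed(g) for g in groups(cores_per_llc)):
        shared.append("LLC")
    return shared


if __name__ == "__main__":
    for m in ("{X*X*X*X*}", "{XX**XX**}", "{XXXX****}"):
        print(m, shared_resources(m))
```

Under these assumptions the three co-located mappings decode to the same sharing levels the slide lists, and a solo mapping such as {X.X.X.X.} shares nothing with a co-runner.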

15 Inter-application Sharing: Measurement and Findings, Summary
- The impact of sharing the LLC and FSB is affected by the specific co-running application
- Performance swings between the best and worst TTC mappings can be very significant
- The optimal TTC mapping for an application may change with the specific co-running application chosen
- An intelligent TTC mapping system will account for underlying resources and sharing configurations

16 Varying Thread Count and Architecture: Varying the Number of Threads
- Latency-sensitive applications run with 2 threads
- Co-located batch applications run with 6 threads
- Normalized to applications running solo with TTC mapping {X X }
- Results are consistent with 4-thread behavior

17 Varying Thread Count and Architecture: Varying the Number of Threads
- Latency-sensitive applications run with 6 threads
- Co-located batch applications run with 2 threads
- Normalized to applications running solo with TTC mapping {X X }
- Results are consistent with 4-thread behavior

18 Varying Thread Count and Architecture: Varying Architecture
- Experimented using Intel's Westmere (Xeon X5660) dual-socket platform
  - Each socket has six cores with a 12 MB shared LLC
  - Shared memory controller with 3 channels of 8.5 GB/s bus to DIMM
- Key application running solo with 2 threads
- Normalized to the TTC mapping {X...X...}

19 Varying Thread Count and Architecture: Varying Architecture
- Experimented using Intel's Westmere (Xeon X5660) dual-socket platform
  - Each socket has six cores with a 12 MB shared LLC
  - Shared memory controller with 3 channels of 8.5 GB/s bus to DIMM
- Key application running solo with 6 threads
- Normalized to the TTC mapping {XXX...XXX...}

20 Varying Thread Count and Architecture: Varying Architecture
- Experimented using Intel's Westmere (Xeon X5660) dual-socket platform
  - Each socket has six cores with a 12 MB shared LLC
  - Shared memory controller with 3 channels of 8.5 GB/s bus to DIMM
- Key application running with 6 threads
- Batch application running with 6 threads
- Normalized to the TTC mapping {XXX...XXX...}

21 Thread-To-Core Mapping: Memory Bandwidth Usage
- Hypothesis: Is an application's bus bandwidth usage an indicator of its proper FSB sharing configuration?
- Yes, FSB demand correlates with the optimal TTC mapping found
- Applications with high FSB demand significantly impact performance

22 Thread-To-Core Mapping: Data Sharing
- The percentage of an application's cache lines in the shared state can indicate the application's level of data sharing
- Measured LLC misses, shared LLC references, and other LLC hits

23 Thread-To-Core Mapping: Cache Footprint
- Contention occurs when the total size of multiple threads' cache footprints is larger than the cache itself
- LLC miss rate is a good indicator of the footprint size
- Predicts potential performance degradation for co-runners
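The contention condition stated on this slide is a simple comparison, sketched below. The function name and the footprint values in the usage example are hypothetical; the 12 MB cache size is the Westmere LLC size mentioned earlier in the deck. In practice the footprints would have to be estimated, for example from LLC miss rate as the slide suggests.

```python
def footprint_contention(footprints_mb, cache_mb):
    """Check whether co-located threads are expected to contend for a cache.

    Returns (contends, overflow_mb): contention is predicted when the sum of
    the threads' cache footprints exceeds the capacity of the shared cache.
    """
    total = sum(footprints_mb)
    overflow = max(total - cache_mb, 0)
    return total > cache_mb, overflow


if __name__ == "__main__":
    # Hypothetical footprints (in MB) for three threads sharing a 12 MB LLC.
    print(footprint_contention([5, 4, 6], 12))
    # Two small threads fit comfortably.
    print(footprint_contention([2, 3], 12))
```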

24 Thread-To-Core Mapping: A Heuristic Approach to TTC Mapping
- An algorithm to predict optimal TTC mappings based on the resource usage characteristics of an application (bus usage, cache usage, and data sharing)
- Avoids co-locating threads with the same resource bottlenecks while maximizing potential benefits from sharing resources
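To make the idea concrete, here is a minimal sketch of a rule-based mapper in the spirit of this slide. The slides do not give the algorithm itself, so the rules, their order, the 0.5 threshold, and the `Profile` fields are all assumptions made for illustration, not the authors' heuristic.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical per-application characteristics, each normalized to 0..1."""
    data_sharing: float    # fraction of cache lines in the shared state
    bus_demand: float      # relative FSB bandwidth demand
    cache_pressure: float  # relative LLC miss rate (proxy for footprint)


HIGH = 0.5  # illustrative threshold, not taken from the slides


def choose_mapping(app, co_runner):
    """Pick one of the three co-located mappings from the slides.

    'X' marks the key application's threads and '*' the co-runner's threads.
    """
    # Threads that share a lot of data benefit from sharing LLC and FSB
    # among themselves, so keep them together on one socket.
    if app.data_sharing >= HIGH:
        return "{XXXX****}"
    # Both applications are bus bound: do not let them share an FSB.
    if app.bus_demand >= HIGH and co_runner.bus_demand >= HIGH:
        return "{XXXX****}"
    # Both applications are cache hungry: share the FSB but not the LLC.
    if app.cache_pressure >= HIGH and co_runner.cache_pressure >= HIGH:
        return "{XX**XX**}"
    # No common bottleneck: spread the key application over every LLC.
    return "{X*X*X*X*}"


if __name__ == "__main__":
    sharing_heavy = Profile(data_sharing=0.8, bus_demand=0.2, cache_pressure=0.2)
    quiet_batch = Profile(data_sharing=0.1, bus_demand=0.1, cache_pressure=0.1)
    print(choose_mapping(sharing_heavy, quiet_batch))
```

A mapper like this needs a profile for every application and a rule set tuned to one machine's topology.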

25 Thread-To-Core Mapping: A Heuristic Approach to TTC Mapping, Evaluating the Heuristics
- Limitations:
  - Algorithm is hardware specific
  - Characteristic profiles are necessary for each application
  - Characteristic metrics may not accurately represent the application
- Advantages:
  - Effective and requires only simple runtime support
- Evaluation:

26 Thread-To-Core Mapping: An Adaptive Approach to TTC Mapping
- AToM: Adaptive Thread-to-core Mapping
  - Adaptive learning implementation that uses a competition heuristic
  - Searches for the optimal TTC assignment for a given set of threads
- Learning phase
  - Various mappings compete for the best performance
- Execution phase
  - The winner of the learning phase is executed for some time period
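The two phases described here can be sketched as a small competition loop. This is a hedged illustration rather than the AToM implementation: `measure` and `run_with` are hypothetical callbacks supplied by the caller (one returns a performance score for a mapping, higher being better; the other runs the workload under a mapping), and the mapping strings and scores in the usage example are made up.

```python
def learning_phase(mappings, measure, runs=3):
    """Let candidate mappings compete; return the winner and all mean scores.

    `measure(mapping)` is a caller-supplied callback (hypothetical) that runs
    the workload briefly under `mapping` and returns a score, higher is better.
    """
    scores = {}
    for mapping in mappings:
        samples = [measure(mapping) for _ in range(runs)]
        scores[mapping] = sum(samples) / len(samples)
    winner = max(scores, key=scores.get)
    return winner, scores


def adaptive_loop(mappings, measure, run_with, rounds=1):
    """Alternate learning and execution phases for a number of rounds.

    `run_with(mapping)` is a caller-supplied callback (hypothetical) that
    executes the workload under the winning mapping for the execution period.
    """
    history = []
    for _ in range(rounds):
        winner, _ = learning_phase(mappings, measure)
        run_with(winner)
        history.append(winner)
    return history


if __name__ == "__main__":
    # Made-up scores standing in for real measurements.
    fake_scores = {"{XXXX....}": 1.00, "{XX..XX..}": 1.20, "{X.X.X.X.}": 0.90}
    chosen = adaptive_loop(list(fake_scores), fake_scores.get, run_with=print)
    print(chosen)
```

Repeating the learning phase in every round lets the choice follow changes in load or co-runners, at the cost of some time spent running losing mappings.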

27 Thread-To-Core Mapping: An Adaptive Approach to TTC Mapping, Evaluating AToM
- Applications configured to run on 4 cores
- Learning phase for 3 runs of 10 minutes each
- Execution phase for 1 run of 2 hours
- Performance normalized to performance under TTC mapping {X...X...}

28 Conclusion
- Outlined the importance of intelligent TTC mapping decisions
  - The optimal TTC mapping varies with the co-running application
- Presented key characteristics that impact TTC mappings
- Presented heuristics for TTC mapping using these characteristics
- Presented a more attractive adaptive approach
  - Simple to implement and very effective for long-running applications
- Discovered a performance swing of up to 40%
  - A 1% improvement could save a datacenter millions of dollars
- Using the adaptive approach
  - Datacenter workloads could improve by 22%
  - Performs within 3% of the optimal mapping on average

29 Discussion
- Any questions?

30 Related Work
- Similar datacenter application studies
  - All study the effects of application characteristics
    - On datacenter system design
    - In combination with the underlying architecture
    - On database workloads
- Architecture
  - Studies investigate memory resource sharing and contention
  - New architecture supports cache and memory bus management
- Software (operating systems)
  - Techniques focus on co-scheduling to avoid resource contention
  - Control execution rate using hardware features or a runtime system
  - Cache- and contention-aware schedulers and compilers

ReSense: Mapping Dynamic Workloads of Colocated Multithreaded Applications Using Resource Sensitivity

ReSense: Mapping Dynamic Workloads of Colocated Multithreaded Applications Using Resource Sensitivity ReSense: Mapping Dynamic Workloads of Colocated Multithreaded Applications Using Resource Sensitivity TANIMA DEY, WEI WANG, JACK W. DAVIDSON, and MARY LOU SOFFA, University of Virginia To utilize the full

More information

Performance Analysis of Thread Mappings with a Holistic View of the Hardware Resources

Performance Analysis of Thread Mappings with a Holistic View of the Hardware Resources Performance Analysis of Thread Mappings with a Holistic View of the Hardware Resources Wei Wang, Tanima Dey, Jason Mars, Lingjia Tang, Jack Davidson, Mary Lou Soffa Department of Computer Science University

More information

Optimizing Shared Resource Contention in HPC Clusters

Optimizing Shared Resource Contention in HPC Clusters Optimizing Shared Resource Contention in HPC Clusters Sergey Blagodurov Simon Fraser University Alexandra Fedorova Simon Fraser University Abstract Contention for shared resources in HPC clusters occurs

More information

IT@Intel. Comparing Multi-Core Processors for Server Virtualization

IT@Intel. Comparing Multi-Core Processors for Server Virtualization White Paper Intel Information Technology Computer Manufacturing Server Virtualization Comparing Multi-Core Processors for Server Virtualization Intel IT tested servers based on select Intel multi-core

More information

Addressing Shared Resource Contention in Multicore Processors via Scheduling

Addressing Shared Resource Contention in Multicore Processors via Scheduling Addressing Shared Resource Contention in Multicore Processors via Scheduling Sergey Zhuravlev Sergey Blagodurov Alexandra Fedorova School of Computing Science, Simon Fraser University, Vancouver, Canada

More information

Energy-aware Memory Management through Database Buffer Control

Energy-aware Memory Management through Database Buffer Control Energy-aware Memory Management through Database Buffer Control Chang S. Bae, Tayeb Jamel Northwestern Univ. Intel Corporation Presented by Chang S. Bae Goal and motivation Energy-aware memory management

More information

Bubble-Up: Increasing Utilization in Modern Warehouse Scale Computers via Sensible Co-locations

Bubble-Up: Increasing Utilization in Modern Warehouse Scale Computers via Sensible Co-locations Bubble-Up: Increasing Utilization in Modern Warehouse Scale Computers via Sensible Co-locations Robert Hundt Google rhundt@google.com Jason Mars University of Virginia jom5x@cs.virginia.edu Kevin Skadron

More information

Measuring Interference Between Live Datacenter Applications

Measuring Interference Between Live Datacenter Applications Measuring Interference Between Live Datacenter Applications Melanie Kambadur Columbia University melanie@cs.columbia.edu Tipp Moseley Google, Inc. tipp@google.com Rick Hank Google, Inc. rhank@google.com

More information

Online Adaptation for Application Performance and Efficiency

Online Adaptation for Application Performance and Efficiency Online Adaptation for Application Performance and Efficiency A Dissertation Proposal by Jason Mars 20 November 2009 Submitted to the graduate faculty of the Department of Computer Science at the University

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

7 Real Benefits of a Virtual Infrastructure

7 Real Benefits of a Virtual Infrastructure 7 Real Benefits of a Virtual Infrastructure Dell September 2007 Even the best run IT shops face challenges. Many IT organizations find themselves with under-utilized servers and storage, yet they need

More information

On the Importance of Thread Placement on Multicore Architectures

On the Importance of Thread Placement on Multicore Architectures On the Importance of Thread Placement on Multicore Architectures HPCLatAm 2011 Keynote Cordoba, Argentina August 31, 2011 Tobias Klug Motivation: Many possibilities can lead to non-deterministic runtimes...

More information

Performance Analysis of Web based Applications on Single and Multi Core Servers

Performance Analysis of Web based Applications on Single and Multi Core Servers Performance Analysis of Web based Applications on Single and Multi Core Servers Gitika Khare, Diptikant Pathy, Alpana Rajan, Alok Jain, Anil Rawat Raja Ramanna Centre for Advanced Technology Department

More information

The Mainframe Virtualization Advantage: How to Save Over Million Dollars Using an IBM System z as a Linux Cloud Server

The Mainframe Virtualization Advantage: How to Save Over Million Dollars Using an IBM System z as a Linux Cloud Server Research Report The Mainframe Virtualization Advantage: How to Save Over Million Dollars Using an IBM System z as a Linux Cloud Server Executive Summary Information technology (IT) executives should be

More information

Measuring Cache and Memory Latency and CPU to Memory Bandwidth

Measuring Cache and Memory Latency and CPU to Memory Bandwidth White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary

More information

Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality

Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality Heechul Yun +, Gang Yao +, Rodolfo Pellizzoni *, Marco Caccamo +, Lui Sha + University of Illinois at Urbana and Champaign

More information

Impact of Java Application Server Evolution on Computer System Performance

Impact of Java Application Server Evolution on Computer System Performance Impact of Java Application Server Evolution on Computer System Performance Peng-fei Chuang, Celal Ozturk, Khun Ban, Huijun Yan, Kingsum Chow, Resit Sendag Intel Corporation; {peng-fei.chuang, khun.ban,

More information

Accelerating Business Intelligence with Large-Scale System Memory

Accelerating Business Intelligence with Large-Scale System Memory Accelerating Business Intelligence with Large-Scale System Memory A Proof of Concept by Intel, Samsung, and SAP Executive Summary Real-time business intelligence (BI) plays a vital role in driving competitiveness

More information

TRACE PERFORMANCE TESTING APPROACH. Overview. Approach. Flow. Attributes

TRACE PERFORMANCE TESTING APPROACH. Overview. Approach. Flow. Attributes TRACE PERFORMANCE TESTING APPROACH Overview Approach Flow Attributes INTRODUCTION Software Testing Testing is not just finding out the defects. Testing is not just seeing the requirements are satisfied.

More information

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011

Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011 Oracle Database Reliability, Performance and scalability on Intel platforms Mitch Shults, Intel Corporation October 2011 1 Intel Processor E7-8800/4800/2800 Product Families Up to 10 s and 20 Threads 30MB

More information

Delivering Quality in Software Performance and Scalability Testing

Delivering Quality in Software Performance and Scalability Testing Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,

More information

Audit & Tune Deliverables

Audit & Tune Deliverables Audit & Tune Deliverables The Initial Audit is a way for CMD to become familiar with a Client's environment. It provides a thorough overview of the environment and documents best practices for the PostgreSQL

More information

Bubble-Flux: Precise Online QoS Management for Increased Utilization in Warehouse Scale Computers

Bubble-Flux: Precise Online QoS Management for Increased Utilization in Warehouse Scale Computers Bubble-Flux: Precise Online QoS Management for Increased Utilization in Warehouse Scale Computers Hailong Yang Alex Breslow Jason Mars Lingjia Tang University of California, San Diego {h5yang, abreslow,

More information

Performance Characteristics of Large SMP Machines

Performance Characteristics of Large SMP Machines Performance Characteristics of Large SMP Machines Dirk Schmidl, Dieter an Mey, Matthias S. Müller schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) Agenda Investigated Hardware Kernel Benchmark

More information

MCC-DB: Minimizing Cache Conflicts in Multicore Processors for Databases

MCC-DB: Minimizing Cache Conflicts in Multicore Processors for Databases MCC-DB: Minimizing Cache Conflicts in Multicore Processors for Databases Rubao Lee 1,2 Xiaoning Ding 2 Feng Chen 2 Qingda Lu 3 Xiaodong Zhang 2 1 Inst. of Computing Tech., Chinese Academy of Sciences 2

More information

FACT: a Framework for Adaptive Contention-aware Thread migrations

FACT: a Framework for Adaptive Contention-aware Thread migrations FACT: a Framework for Adaptive Contention-aware Thread migrations Kishore Kumar Pusukuri Department of Computer Science and Engineering University of California, Riverside, CA 92507. kishore@cs.ucr.edu

More information

Multi-core and Linux* Kernel

Multi-core and Linux* Kernel Multi-core and Linux* Kernel Suresh Siddha Intel Open Source Technology Center Abstract Semiconductor technological advances in the recent years have led to the inclusion of multiple CPU execution cores

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads

HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads Gen9 Servers give more performance per dollar for your investment. Executive Summary Information Technology (IT) organizations face increasing

More information

Improved Virtualization Performance with 9th Generation Servers

Improved Virtualization Performance with 9th Generation Servers Improved Virtualization Performance with 9th Generation Servers David J. Morse Dell, Inc. August 2006 Contents Introduction... 4 VMware ESX Server 3.0... 4 SPECjbb2005... 4 BEA JRockit... 4 Hardware/Software

More information

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Oracle Database Scalability in VMware ESX VMware ESX 3.5 Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises

More information

Rackspace Cloud Databases and Container-based Virtualization

Rackspace Cloud Databases and Container-based Virtualization Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many

More information

FPGA-based Multithreading for In-Memory Hash Joins

FPGA-based Multithreading for In-Memory Hash Joins FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded

More information

Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers

Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers WHITE PAPER FUJITSU PRIMERGY AND PRIMEPOWER SERVERS Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers CHALLENGE Replace a Fujitsu PRIMEPOWER 2500 partition with a lower cost solution that

More information

Performance Tuning and Optimizing SQL Databases 2016

Performance Tuning and Optimizing SQL Databases 2016 Performance Tuning and Optimizing SQL Databases 2016 http://www.homnick.com marketing@homnick.com +1.561.988.0567 Boca Raton, Fl USA About this course This four-day instructor-led course provides students

More information

Exploring Multi-Threaded Java Application Performance on Multicore Hardware

Exploring Multi-Threaded Java Application Performance on Multicore Hardware Exploring Multi-Threaded Java Application Performance on Multicore Hardware Jennifer B. Sartor and Lieven Eeckhout Ghent University, Belgium Abstract While there have been many studies of how to schedule

More information

Violin Memory 7300 Flash Storage Platform Supports Multiple Primary Storage Workloads

Violin Memory 7300 Flash Storage Platform Supports Multiple Primary Storage Workloads Violin Memory 7300 Flash Storage Platform Supports Multiple Primary Storage Workloads Web server, SQL Server OLTP, Exchange Jetstress, and SharePoint Workloads Can Run Simultaneously on One Violin Memory

More information

Innovativste XEON Prozessortechnik für Cisco UCS

Innovativste XEON Prozessortechnik für Cisco UCS Innovativste XEON Prozessortechnik für Cisco UCS Stefanie Döhler Wien, 17. November 2010 1 Tick-Tock Development Model Sustained Microprocessor Leadership Tick Tock Tick 65nm Tock Tick 45nm Tock Tick 32nm

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

VMware vsphere 6 and Oracle Database Scalability Study

VMware vsphere 6 and Oracle Database Scalability Study VMware vsphere 6 and Oracle Database Scalability Study Scaling Monster Virtual Machines TECHNICAL WHITE PAPER Table of Contents Executive Summary... 3 Introduction... 3 Test Environment... 3 Virtual Machine

More information

Oracle Developer Studio Performance Analyzer

Oracle Developer Studio Performance Analyzer Oracle Developer Studio Performance Analyzer The Oracle Developer Studio Performance Analyzer provides unparalleled insight into the behavior of your application, allowing you to identify bottlenecks and

More information

Tableau Server 7.0 scalability

Tableau Server 7.0 scalability Tableau Server 7.0 scalability February 2012 p2 Executive summary In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different

More information

A Quantum Leap in Enterprise Computing

A Quantum Leap in Enterprise Computing A Quantum Leap in Enterprise Computing Unprecedented Reliability and Scalability in a Multi-Processor Server Product Brief Intel Xeon Processor 7500 Series Whether you ve got data-demanding applications,

More information

A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems

A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems TADaaM Team - Nicolas Denoyelle - Brice Goglin - Emmanuel Jeannot August 24, 2015 1. Context/Motivations

More information

Scalability Results. Select the right hardware configuration for your organization to optimize performance

Scalability Results. Select the right hardware configuration for your organization to optimize performance Scalability Results Select the right hardware configuration for your organization to optimize performance Table of Contents Introduction... 1 Scalability... 2 Definition... 2 CPU and Memory Usage... 2

More information

Vendor and Hardware Platform: Fujitsu BX924 S2 Virtualization Platform: VMware ESX 4.0 Update 2 (build 261974)

Vendor and Hardware Platform: Fujitsu BX924 S2 Virtualization Platform: VMware ESX 4.0 Update 2 (build 261974) Vendor and Hardware Platform: Fujitsu BX924 S2 Virtualization Platform: VMware ESX 4.0 Update 2 (build 261974) Performance Section Performance Tested By: Fujitsu Test Date: 10-05-2010 Configuration Section

More information

SIDN Server Measurements

SIDN Server Measurements SIDN Server Measurements Yuri Schaeffer 1, NLnet Labs NLnet Labs document 2010-003 July 19, 2010 1 Introduction For future capacity planning SIDN would like to have an insight on the required resources

More information

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION A DIABLO WHITE PAPER AUGUST 2014 Ricky Trigalo Director of Business Development Virtualization, Diablo Technologies

More information

RED HAT ENTERPRISE VIRTUALIZATION PERFORMANCE: SPECVIRT BENCHMARK

RED HAT ENTERPRISE VIRTUALIZATION PERFORMANCE: SPECVIRT BENCHMARK RED HAT ENTERPRISE VIRTUALIZATION PERFORMANCE: SPECVIRT BENCHMARK AT A GLANCE The performance of Red Hat Enterprise Virtualization can be compared to other virtualization platforms using the SPECvirt_sc2010

More information

Sun 8Gb/s Fibre Channel HBA Performance Advantages for Oracle Database

Sun 8Gb/s Fibre Channel HBA Performance Advantages for Oracle Database Performance Advantages for Oracle Database At a Glance This Technical Brief illustrates that even for smaller online transaction processing (OLTP) databases, the Sun 8Gb/s Fibre Channel Host Bus Adapter

More information

Achieving QoS in Server Virtualization

Achieving QoS in Server Virtualization Achieving QoS in Server Virtualization Intel Platform Shared Resource Monitoring/Control in Xen Chao Peng (chao.p.peng@intel.com) 1 Increasing QoS demand in Server Virtualization Data center & Cloud infrastructure

More information

Intelligent Heuristic Construction with Active Learning

Intelligent Heuristic Construction with Active Learning Intelligent Heuristic Construction with Active Learning William F. Ogilvie, Pavlos Petoumenos, Zheng Wang, Hugh Leather E H U N I V E R S I T Y T O H F G R E D I N B U Space is BIG! Hubble Ultra-Deep Field

More information

Summary. Key results at a glance:

Summary. Key results at a glance: An evaluation of blade server power efficiency for the, Dell PowerEdge M600, and IBM BladeCenter HS21 using the SPECjbb2005 Benchmark The HP Difference The ProLiant BL260c G5 is a new class of server blade

More information

System Requirements Table of contents

System Requirements Table of contents Table of contents 1 Introduction... 2 2 Knoa Agent... 2 2.1 System Requirements...2 2.2 Environment Requirements...4 3 Knoa Server Architecture...4 3.1 Knoa Server Components... 4 3.2 Server Hardware Setup...5

More information

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays Red Hat Performance Engineering Version 1.0 August 2013 1801 Varsity Drive Raleigh NC

More information

Quad-Core Intel Xeon Processor

Quad-Core Intel Xeon Processor Product Brief Intel Xeon Processor 7300 Series Quad-Core Intel Xeon Processor 7300 Series Maximize Performance and Scalability in Multi-Processor Platforms Built for Virtualization and Data Demanding Applications

More information
