Performance Analysis of Thread Mappings with a Holistic View of the Hardware Resources
Transcription
1 Performance Analysis of Thread Mappings with a Holistic View of the Hardware Resources
Wei Wang, Tanima Dey, Jason Mars, Lingjia Tang, Jack Davidson, Mary Lou Soffa
Department of Computer Science, University of Virginia
ISPASS 2012
This research is supported in part by NSF grant number CCF
2 Motivation
- Chip multiprocessors offer a large number of cores and ample resources
- The number of simultaneously executing applications is increasing
- Careful resource management is critical
- Thread mapping is a powerful technique for resource management
3 Challenges for Thread Mapping
- Multiple resources are affected
- Threads demonstrate various run-time characteristics
- Multi-threaded workloads are emerging
4 Goal of this Research
Analyze why a particular thread mapping is better than another mapping:
- What are the resources that cause the performance differences?
- What are the thread characteristics that cause the resource utilization differences?
- What is the relative importance of various resources?
5 Contributions
- In-depth performance analyses of various thread mappings using multi-threaded applications on real hardware
- Identify the key hardware resources
- Determine the impact on key resource utilization
- Introduce a new metric, L2MP, to analyze the performance of the combined memory resources
- Provide a ranking of the resources
6 Outline
- Motivation
- Challenges
- Contributions
- Overview: resources, metrics, mappings
- Analysis: prefetchers, processor cores
- Key findings for thread mapping
- Conclusion
7 Overview
A comprehensive analysis considering various factors:
- Application's performance
- Application's characteristics
- Hardware resources shared by applications
- Utilization of the resources
8 Resources and Metrics
Resources
- Memory resources: L1 I/D, I/D TLB, L2, prefetchers, memory interconnect
- Processor resources: memory disambiguation units, branch predictors, processor core
Metrics
- Cache misses, mis-predictions, memory latency (measured with hardware performance counters, HPCs)
- Processor utilization (from the OS)
- Execution cycles and execution time
9 Thread Characteristics of Multithreaded Applications
Single thread characteristics
- Cache demand
- Memory bandwidth demand
- I/O frequency
- Prefetcher effectiveness
- Prefetcher excessiveness
Multiple thread characteristics (sibling threads)
- Data and instruction sharing
- Frequency of synchronization
10 Four Thread Mappings
Mapping  Core 0      Core 1      Core 2      Core 3
         (LLC0)      (LLC0)      (LLC1)      (LLC1)
OSMap    Any thread  Any thread  Any thread  Any thread
IsoMap   a1, a1      a1, a1      a2, a2      a2, a2
IntMap   a1, a1      a2, a2      a1, a1      a2, a2
SprMap   a1, a2      a1, a2      a1, a2      a1, a2
[Figure: App 1 and App 2 mapped onto Cores 0-3, with per-core L1 caches and TLBs, two L2 caches with hardware prefetchers, and the off-chip memory interconnect]
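The three fixed mappings in the table can be written down as plain data to make their differences explicit. The following is a minimal sketch, not the authors' tooling: the names `LLC_OF_CORE`, `MAPPINGS`, `apps_per_llc`, and `cores_per_app` are hypothetical, and the core-to-LLC grouping (Cores 0-1 under LLC0, Cores 2-3 under LLC1) is an assumption read off the table header. OSMap is omitted because the OS may place any thread on any core.

```python
# Assumed grouping: cores 0-1 share LLC0, cores 2-3 share LLC1.
LLC_OF_CORE = {0: "LLC0", 1: "LLC0", 2: "LLC1", 3: "LLC1"}

# Per mapping: for each core (index 0..3), the applications whose
# threads run on that core. a1 and a2 each have four threads.
MAPPINGS = {
    "IsoMap": [("a1", "a1"), ("a1", "a1"), ("a2", "a2"), ("a2", "a2")],
    "IntMap": [("a1", "a1"), ("a2", "a2"), ("a1", "a1"), ("a2", "a2")],
    "SprMap": [("a1", "a2"), ("a1", "a2"), ("a1", "a2"), ("a1", "a2")],
}


def apps_per_llc(mapping):
    """Return {llc: set of applications with a thread under that LLC}."""
    result = {}
    for core, threads in enumerate(mapping):
        result.setdefault(LLC_OF_CORE[core], set()).update(threads)
    return result


def cores_per_app(mapping):
    """Return {application: set of cores that run its threads}."""
    result = {}
    for core, threads in enumerate(mapping):
        for app in threads:
            result.setdefault(app, set()).add(core)
    return result


if __name__ == "__main__":
    for name, mapping in MAPPINGS.items():
        print(name, apps_per_llc(mapping), cores_per_app(mapping))
```

Under these assumptions, IsoMap gives each application its own LLC, IntMap makes the two applications share both LLCs while keeping sibling threads together on a core, and SprMap spreads each application over all four cores. On Linux, a mapping like this could be enforced with `os.sched_setaffinity`; the slides do not say which mechanism the authors used.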
11 Experimental Setup
Platform & Workloads
- Intel Core 2 Q9550 processor
- Benchmarks from the PARSEC benchmark suite
- All possible pairs (36) using the 9 benchmarks
- 4 worker threads per benchmark
[Figure: Cores 0-1 and Cores 2-3, each core with its own L1 cache and TLB, each pair with an L2 cache and hardware prefetchers, connected to the memory controller and memory]
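The count of 36 workloads matches the number of unordered pairs of distinct benchmarks drawn from 9. A short sketch of that enumeration, assuming the nine benchmarks are the ones named on the "Findings for Multiple Resources" slide and that a benchmark is not paired with itself:

```python
from itertools import combinations

# The nine PARSEC benchmarks named elsewhere in the slides.
BENCHMARKS = [
    "streamcluster", "canneal", "facesim", "fluidanimate",
    "swaptions", "blackscholes", "vips", "x264", "bodytrack",
]

# Unordered pairs of distinct benchmarks: C(9, 2) = 36.
PAIRS = list(combinations(BENCHMARKS, 2))

if __name__ == "__main__":
    print(len(PAIRS))
    for first, second in PAIRS[:3]:
        print(first, "+", second)
```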
12 Key Resources
A resource is identified as key when:
- Utilization of the resource varies considerably
- The utilization variation results in a difference in the application's performance
Identification technique
- Direct approach: use HPCs
- Indirect approach: use the application's performance in different mappings
13 Key Resources
More important resources
- Memory resources: L1D-cache, L2-cache, hardware prefetchers, memory interconnect
- Processor resources: branch predictor, processor core
Less important resources
- L1I-cache
- I/D TLB
- Memory disambiguation unit
14 Analysis: Hardware Prefetchers
[Figure: Experimental results for streamcluster (with blackscholes)]
15 Key Findings for Hardware Prefetchers
Case 1: Threads that share a high amount of data
- Sharing the same cache improves performance
16 Key Findings for Hardware Prefetchers
Case 2: Threads that have low or no data sharing but high prefetcher excessiveness
- Sharing the same prefetchers improves performance
17 Key Findings for Hardware Prefetchers
Case 3: Threads that have low data sharing and low prefetcher excessiveness
- Fewer cache misses and prefetch operations improve performance
18 Analysis: Processor Cores
[Figure: Processor utilization]
19 Analysis: Processor Cores
[Figure: Performance impact]
20 Key Findings for Processor Cores
Case 1: Sibling threads have frequent synchronization
21 Key Findings for Processor Cores
Case 2: Sibling threads have frequent I/O operations
22 Managing Multiple Resources
Example
- L2 caches, prefetchers, and memory bandwidth are closely related resources
- A single metric is used to evaluate their aggregated performance impact
- L2MP: L2-cache-misses-memory-latency product
- L2MP = L2_cache_misses × Memory_latency
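The formula can be evaluated directly from two measured quantities. The sketch below only illustrates the arithmetic: the function name `l2mp`, the choice of cycles as the latency unit, and every number in `SAMPLES` are hypothetical placeholders, not measurements from the study.

```python
def l2mp(l2_cache_misses, memory_latency):
    """L2MP = L2_cache_misses x Memory_latency, as defined on the slide."""
    return l2_cache_misses * memory_latency


# Hypothetical (L2 misses, average memory latency in cycles) per mapping.
# These values are invented purely to show the calculation.
SAMPLES = {
    "IsoMap": (1_200_000, 300),
    "IntMap": (900_000, 350),
    "SprMap": (1_500_000, 280),
}

L2MP_BY_MAPPING = {
    name: l2mp(misses, latency) for name, (misses, latency) in SAMPLES.items()
}

if __name__ == "__main__":
    for name, value in L2MP_BY_MAPPING.items():
        print(f"{name}: L2MP = {value}")
```

Because both factors come from hardware performance counters, a single product like this lets mappings be compared on the combined behavior of the L2 caches, prefetchers, and memory bandwidth rather than on each counter separately.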
23 L2MP
- L2MP is a good indicator of performance
24 Managing Multiple Resources
Thread mapping algorithms
- Consider all the key resources together
- Improve the utilization of the resources that provide the maximum benefit
- Consider the co-running applications' characteristics
25 Findings for Multiple Resources
For memory-intensive applications (streamcluster, canneal, facesim, fluidanimate)
- Maximize the L2MP metric
For I/O- or CPU-intensive applications (swaptions, blackscholes, vips, x264, bodytrack)
- Maximize processor utilization
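The classification on this slide amounts to a lookup from benchmark to the metric that should guide its mapping. A minimal sketch follows; the function name `guiding_metric` and the set names are hypothetical, and how the guidance for two co-running applications is combined is not stated in the slides, so it is left out.

```python
MEMORY_INTENSIVE = {"streamcluster", "canneal", "facesim", "fluidanimate"}
IO_OR_CPU_INTENSIVE = {"swaptions", "blackscholes", "vips", "x264", "bodytrack"}


def guiding_metric(benchmark):
    """Return the metric the slide associates with the benchmark's class."""
    if benchmark in MEMORY_INTENSIVE:
        return "L2MP"
    if benchmark in IO_OR_CPU_INTENSIVE:
        return "processor utilization"
    raise ValueError(f"benchmark not classified in the slides: {benchmark}")


if __name__ == "__main__":
    for name in ("canneal", "x264"):
        print(name, "->", guiding_metric(name))
```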
26 Conclusion
- Identified six key resources
- Analyzed how to map threads with particular characteristics to improve resource utilization
- Introduced a new metric, L2MP, for managing key memory resources
- Determined the relative importance of the key resources
27 Related Work
Shared-cache-aware thread mapping
- Jiang et al., PACT 2008
- Chandra et al., HPCA 2005
- Xie et al., CMP-MSI 2008
- Knauerhase et al., IEEE Micro 2008
Cache-prefetcher-FSB-aware thread mapping
- Zhuravlev et al., ASPLOS 2010
28 Thank you & Questions?
ReSense: Mapping Dynamic Workloads of Colocated Multithreaded Applications Using Resource Sensitivity
ReSense: Mapping Dynamic Workloads of Colocated Multithreaded Applications Using Resource Sensitivity TANIMA DEY, WEI WANG, JACK W. DAVIDSON, and MARY LOU SOFFA, University of Virginia To utilize the full
More informationThe Impact of Memory Subsystem Resource Sharing on Datacenter Applications. Lingia Tang Jason Mars Neil Vachharajani Robert Hundt Mary Lou Soffa
The Impact of Memory Subsystem Resource Sharing on Datacenter Applications Lingia Tang Jason Mars Neil Vachharajani Robert Hundt Mary Lou Soffa Introduction Problem Recent studies into the effects of memory
More informationThread Reinforcer: Dynamically Determining Number of Threads via OS Level Monitoring
Thread Reinforcer: Dynamically Determining via OS Level Monitoring Kishore Kumar Pusukuri, Rajiv Gupta, Laxmi N. Bhuyan Department of Computer Science and Engineering University of California, Riverside
More informationOptimizing Shared Resource Contention in HPC Clusters
Optimizing Shared Resource Contention in HPC Clusters Sergey Blagodurov Simon Fraser University Alexandra Fedorova Simon Fraser University Abstract Contention for shared resources in HPC clusters occurs
More informationThe Impact of Memory Subsystem Resource Sharing on Datacenter Applications
The Impact of Memory Subsystem Resource Sharing on Datacenter Applications Neil Vachharajani Pure Storage neil@purestorage.com Lingjia Tang University of Virginia lt8f@cs.virginia.edu Robert Hundt Google
More informationAutonomous Resource Sharing for Multi-Threaded Workloads in Virtualized Servers
Autonomous Resource Sharing for Multi-Threaded Workloads in Virtualized Servers Can Hankendi* hankendi@bu.edu Ayse K. Coskun* acoskun@bu.edu Electrical and Computer Engineering Department Boston University
More informationThe Advantages of an Autopilot Resource Allocation Strategy - A Case Study
Energy-Efficient Server Consolidation for Multi-threaded Applications in the Cloud Can Hankendi ECE Department Boston University, Boston, MA Email: hankendi@bu.edu Ayse K. Coskun ECE Department Boston
More informationAn Approach to Resource-Aware Co-Scheduling for CMPs
An Approach to Resource-Aware Co-Scheduling for CMPs Major Bhadauria Computer Systems Laboratory Cornell University Ithaca, NY, USA major@csl.cornell.edu Sally A. McKee Dept. of Computer Science and Engineering
More informationModeling the Effects on Power and Performance from Memory Interference of Co-located Applications in Multicore Systems
Modeling the Effects on Power and Performance from Memory Interference of Co-located Applications in Multicore Systems Daniel Dauwe 1, Ryan Friese 1, Sudeep Pasricha 1,2, Anthony A. Maciejewski 1, Gregory
More informationPARSEC vs. SPLASH 2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip Multiprocessors
PARSEC vs. SPLASH 2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip Multiprocessors ChristianBienia,SanjeevKumar andkaili DepartmentofComputerScience,PrincetonUniversity MicroprocessorTechnologyLabs,Intel
More informationAddressing Shared Resource Contention in Multicore Processors via Scheduling
Addressing Shared Resource Contention in Multicore Processors via Scheduling Sergey Zhuravlev Sergey Blagodurov Alexandra Fedorova School of Computing Science, Simon Fraser University, Vancouver, Canada
More informationThe Data Center as a Grid Load Stabilizer
The Data Center as a Grid Load Stabilizer Hao Chen *, Michael C. Caramanis ** and Ayse K. Coskun * * Department of Electrical and Computer Engineering ** Division of Systems Engineering Boston University
More informationModeling Communication Costs in Blade Servers
odeling Communication Costs in Blade Servers Qiuyun Wang, Benjamin C. Lee Duke University Department of Electrical and Computer Engineering {qiuyun.wang, benjamin.c.lee}@duke.edu ABSTRACT Datacenters demand
More informationOperating System Impact on SMT Architecture
Operating System Impact on SMT Architecture The work published in An Analysis of Operating System Behavior on a Simultaneous Multithreaded Architecture, Josh Redstone et al., in Proceedings of the 9th
More informationApplication Heartbeats for Software Performance and Health Henry Hoffmann, Jonathan Eastep, Marco Santambrogio, Jason Miller, and Anant Agarwal
Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-29-35 August 7, 29 Application Heartbeats for Software Performance and Health Henry Hoffmann, Jonathan Eastep, Marco
More informationCSHARP: Coherence and SHaring Aware Replacement Policies for Parallel Applications
CSHARP: Coherence and SHaring Aware Replacement Policies for Parallel Applications Biswabandan Panda Department of CSE, IIT Madras, India Email: biswa@cse.iitm.ac.in Shankar Balachandran Department of
More informationOperating System Scheduling for Efficient Online Self-Test in Robust Systems. Yanjing Li. Onur Mutlu. Subhasish Mitra
Operating System Scheduling for Efficient Online Self-Test in Robust Systems Yanjing Li Stanford University Onur Mutlu Carnegie Mellon University Subhasish Mitra Stanford University 1 Why Online Self-Test
More informationArchitecture Support for Big Data Analytics
Architecture Support for Big Data Analytics Ahsan Javed Awan EMJD-DC (KTH-UPC) (http://uk.linkedin.com/in/ahsanjavedawan/) Supervisors: Mats Brorsson(KTH), Eduard Ayguade(UPC), Vladimir Vlassov(KTH) 1
More informationA Tumbler: An Effective Load Balancing Technique for MultiCPU Multicore Systems
A Tumbler: An Effective Load Balancing Technique for MultiCPU Multicore Systems KISHORE KUMAR PUSUKURI, University of California, Riverside RAJIV GUPTA, University of California, Riverside LAXMI N. BHUYAN,
More informationPerformance monitoring with Intel Architecture
Performance monitoring with Intel Architecture CSCE 351: Operating System Kernels Lecture 5.2 Why performance monitoring? Fine-tune software Book-keeping Locating bottlenecks Explore potential problems
More informationFPGA-based Multithreading for In-Memory Hash Joins
FPGA-based Multithreading for In-Memory Hash Joins Robert J. Halstead, Ildar Absalyamov, Walid A. Najjar, Vassilis J. Tsotras University of California, Riverside Outline Background What are FPGAs Multithreaded
More informationMemory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality
Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality Heechul Yun +, Gang Yao +, Rodolfo Pellizzoni *, Marco Caccamo +, Lui Sha + University of Illinois at Urbana and Champaign
More informationInterval Simulation: Raising the Level of Abstraction in Architectural Simulation
Interval Simulation: Raising the Level of Abstraction in Architectural Simulation Davy Genbrugge Stijn Eyerman Lieven Eeckhout Ghent University, Belgium Abstract Detailed architectural simulators suffer
More informationRackspace Cloud Databases and Container-based Virtualization
Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many
More informationEnsuring Quality of Service in High Performance Servers
Ensuring Quality of Service in High Performance Servers YAN SOLIHIN Fei Guo, Seongbeom Kim, Fang Liu Center of Efficient, Secure, and Reliable Computing (CESR) North Carolina State University solihin@ece.ncsu.edu
More informationMulti-core and Linux* Kernel
Multi-core and Linux* Kernel Suresh Siddha Intel Open Source Technology Center Abstract Semiconductor technological advances in the recent years have led to the inclusion of multiple CPU execution cores
More informationArchitectural Support for Enhanced SMT Job Scheduling
Architectural Support for Enhanced SMT Job Scheduling Alex Settle Joshua Kihm Andrew Janiszewski Dan Connors University of Colorado at Boulder Department of Electrical and Computer Engineering 5 UCB, Boulder,
More informationHP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads
HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads Gen9 Servers give more performance per dollar for your investment. Executive Summary Information Technology (IT) organizations face increasing
More informationImpact of Java Application Server Evolution on Computer System Performance
Impact of Java Application Server Evolution on Computer System Performance Peng-fei Chuang, Celal Ozturk, Khun Ban, Huijun Yan, Kingsum Chow, Resit Sendag Intel Corporation; {peng-fei.chuang, khun.ban,
More informationIn-network Monitoring and Control Policy for DVFS of CMP Networkson-Chip and Last Level Caches
In-network Monitoring and Control Policy for DVFS of CMP Networkson-Chip and Last Level Caches Xi Chen 1, Zheng Xu 1, Hyungjun Kim 1, Paul V. Gratz 1, Jiang Hu 1, Michael Kishinevsky 2 and Umit Ogras 2
More informationParallel Processing and Software Performance. Lukáš Marek
Parallel Processing and Software Performance Lukáš Marek DISTRIBUTED SYSTEMS RESEARCH GROUP http://dsrg.mff.cuni.cz CHARLES UNIVERSITY PRAGUE Faculty of Mathematics and Physics Benchmarking in parallel
More informationEnterprise Applications
Enterprise Applications Chi Ho Yue Sorav Bansal Shivnath Babu Amin Firoozshahian EE392C Emerging Applications Study Spring 2003 Functionality Online Transaction Processing (OLTP) Users/apps interacting
More informationMaximizing Hardware Prefetch Effectiveness with Machine Learning
Maximizing Hardware Prefetch Effectiveness with Machine Learning Saami Rahman, Martin Burtscher, Ziliang Zong, and Apan Qasem Department of Computer Science Texas State University San Marcos, TX 78666
More informationVirtualizing Performance Asymmetric Multi-core Systems
Virtualizing Performance Asymmetric Multi- Systems Youngjin Kwon, Changdae Kim, Seungryoul Maeng, and Jaehyuk Huh Computer Science Department, KAIST {yjkwon and cdkim}@calab.kaist.ac.kr, {maeng and jhhuh}@kaist.ac.kr
More informationDYNAMIC CACHE-USAGE PROFILER FOR THE XEN HYPERVISOR WIRA DAMIS MULIA. Bachelor of Science in Electrical and Computer. Engineering
DYNAMIC CACHE-USAGE PROFILER FOR THE XEN HYPERVISOR By WIRA DAMIS MULIA Bachelor of Science in Electrical and Computer Engineering Oklahoma State University Stillwater, Oklahoma 2009 Submitted to the Faculty
More informationEnergy-Efficient, High-Performance Heterogeneous Core Design
Energy-Efficient, High-Performance Heterogeneous Core Design Raj Parihar Core Design Session, MICRO - 2012 Advanced Computer Architecture Lab, UofR, Rochester April 18, 2013 Raj Parihar Energy-Efficient,
More informationCapstone Overview Architecture for Big Data & Machine Learning. Debbie Marr ICRI-CI 2015 Retreat, May 5, 2015
Capstone Overview Architecture for Big Data & Machine Learning Debbie Marr ICRI-CI 2015 Retreat, May 5, 2015 Accelerators Memory Traffic Reduction Memory Intensive Arch. Context-based Prefetching Deep
More informationCharacterizing Multi-threaded Applications for Designing Sharing-aware Last-level Cache Replacement Policies
Characterizing Multi-threaded Applications for Designing Sharing-aware Last-level Cache Replacement Policies Ragavendra Natarajan Department of Computer Science and Engineering University of Minnesota
More informationAllocation Policy Analysis for Cache Coherence Protocols for STT-MRAM-based caches
Allocation Policy Analysis for Cache Coherence Protocols for STT-MRAM-based caches A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Pushkar Shridhar Nandkar IN
More informationProcess-level Power Estimation in VM-based Systems
Process-level Power Estimation in VM-based Systems Maxime Colmant, Mascha Kurpicz, Pascal Felber, Loïc Huertas, Romain Rouvoy, Anita Sobe To cite this version: Maxime Colmant, Mascha Kurpicz, Pascal Felber,
More informationMicro-Pages: Increasing DRAM Efficiency with Locality-Aware Data Placement
Micro-Pages: Increasing DRAM Efficiency with Locality-Aware Data Placement Kshitij Sudan Niladrish Chatterjee David Nellans Manu Awasthi Rajeev Balasubramonian Al Davis School of Computing University of
More informationScheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors
Scheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors Robert L. McGregor Christos D. Antonopoulos Department of Computer Science The College of William & Mary Williamsburg, VA 23187-8795
More informationMulti-Threading Performance on Commodity Multi-Core Processors
Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction
More informationHow To Improve Performance On A Multicore Processor With An Asymmetric Hypervisor
AASH: An Asymmetry-Aware Scheduler for Hypervisors Vahid Kazempour Ali Kamali Alexandra Fedorova Simon Fraser University, Vancouver, Canada {vahid kazempour, ali kamali, fedorova}@sfu.ca Abstract Asymmetric
More informationMeasuring the Performance of Prefetching Proxy Caches
Measuring the Performance of Prefetching Proxy Caches Brian D. Davison davison@cs.rutgers.edu Department of Computer Science Rutgers, The State University of New Jersey The Problem Traffic Growth User
More informationFACT: a Framework for Adaptive Contention-aware Thread migrations
FACT: a Framework for Adaptive Contention-aware Thread migrations Kishore Kumar Pusukuri Department of Computer Science and Engineering University of California, Riverside, CA 92507. kishore@cs.ucr.edu
More informationPower Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis
White Paper Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and IBM FlexSystem Enterprise Chassis White Paper March 2014 2014 Cisco and/or its affiliates. All rights reserved. This document
More information<Insert Picture Here> T4: A Highly Threaded Server-on-a-Chip with Native Support for Heterogeneous Computing
T4: A Highly Threaded Server-on-a-Chip with Native Support for Heterogeneous Computing Robert Golla Senior Hardware Architect Paul Jordan Senior Principal Hardware Engineer Oracle
More informationDATA centers often comprise thousands of enterprise
LeakageAware Cooling Management for Improving Server Energy Efficiency Marina Zapater, Ozan Tuncer, José L. Ayala, José M. Moya, Kalyan Vaidyanathan, Kenny Gross and Ayse K. Coskun Abstract The computational
More informationA Survey on ARM Cortex A Processors. Wei Wang Tanima Dey
A Survey on ARM Cortex A Processors Wei Wang Tanima Dey 1 Overview of ARM Processors Focusing on Cortex A9 & Cortex A15 ARM ships no processors but only IP cores For SoC integration Targeting markets:
More informationManaging Performance vs. Accuracy Trade-offs With Loop Perforation
Managing Performance vs. Accuracy Trade-offs With Loop Perforation Stelios Sidiroglou Sasa Misailovic Henry Hoffmann Martin Rinard Computer Science and Artificial Intelligence Laboratory Massachusetts
More informationVirtual Machine Scheduling for Parallel Soft Real-Time Applications
Virtual Machine Scheduling for Parallel Soft Real-Time Applications Like Zhou, Song Wu, Huahua Sun, Hai Jin, Xuanhua Shi Services Computing Technology and System Lab Cluster and Grid Computing Lab School
More informationDeciding which process to run. (Deciding which thread to run) Deciding how long the chosen process can run
SFWR ENG 3BB4 Software Design 3 Concurrent System Design 2 SFWR ENG 3BB4 Software Design 3 Concurrent System Design 11.8 10 CPU Scheduling Chapter 11 CPU Scheduling Policies Deciding which process to run
More informationADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM
ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION Lesson-12: ARM 1 The ARM architecture processors popular in Mobile phone systems 2 ARM Features ARM has 32-bit architecture but supports 16 bit
More informationRUNAHEAD EXECUTION: AN EFFECTIVE ALTERNATIVE TO LARGE INSTRUCTION WINDOWS
RUNAHEAD EXECUTION: AN EFFECTIVE ALTERNATIVE TO LARGE INSTRUCTION WINDOWS AN INSTRUCTION WINDOW THAT CAN TOLERATE LATENCIES TO DRAM MEMORY IS PROHIBITIVELY COMPLEX AND POWER HUNGRY. TO AVOID HAVING TO
More informationAchieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.
More informationA-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters
A-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters Hui Wang, Canturk Isci, Lavanya Subramanian, Jongmoo Choi, Depei Qian, Onur Mutlu Beihang University, IBM Thomas J. Watson
More informationPower Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure
White Paper Power Efficiency Comparison: Cisco UCS 5108 Blade Server Chassis and Dell PowerEdge M1000e Blade Enclosure White Paper March 2014 2014 Cisco and/or its affiliates. All rights reserved. This
More informationA Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures
11 th International LS-DYNA Users Conference Computing Technology A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures Yih-Yih Lin Hewlett-Packard Company Abstract In this paper, the
More informationPOWER8 Performance Analysis
POWER8 Performance Analysis Satish Kumar Sadasivam Senior Performance Engineer, Master Inventor IBM Systems and Technology Labs satsadas@in.ibm.com #OpenPOWERSummit Join the conversation at #OpenPOWERSummit
More informationDelivering Quality in Software Performance and Scalability Testing
Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,
More informationEvaluation Report: Accelerating SQL Server Database Performance with the Lenovo Storage S3200 SAN Array
Evaluation Report: Accelerating SQL Server Database Performance with the Lenovo Storage S3200 SAN Array Evaluation report prepared under contract with Lenovo Executive Summary Even with the price of flash
More informationHardware performance monitoring. Zoltán Majó
Hardware performance monitoring Zoltán Majó 1 Question Did you take any of these lectures: Computer Architecture and System Programming How to Write Fast Numerical Code Design of Parallel and High Performance
More informationOn Performance Debugging of Unnecessary Lock Contentions on Multicore Processors: A Replay-based Approach
On Performance Debugging of Unnecessary ock Contentions on Multicore Processors: A Replay-based Approach ong Zheng Xiaofei iao Services Computing Technology and System ab, Cluster and Grid Computing ab,
More informationOracle Database Scalability in VMware ESX VMware ESX 3.5
Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises
More informationDisk Storage Shortfall
Understanding the root cause of the I/O bottleneck November 2010 2 Introduction Many data centers have performance bottlenecks that impact application performance and service delivery to users. These bottlenecks
More informationPrecise and Accurate Processor Simulation
Precise and Accurate Processor Simulation Harold Cain, Kevin Lepak, Brandon Schwartz, and Mikko H. Lipasti University of Wisconsin Madison http://www.ece.wisc.edu/~pharm Performance Modeling Analytical
More informationIntroducing EEMBC Cloud and Big Data Server Benchmarks
Introducing EEMBC Cloud and Big Data Server Benchmarks Quick Background: Industry-Standard Benchmarks for the Embedded Industry EEMBC formed in 1997 as non-profit consortium Defining and developing application-specific
More informationPART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design
PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions Slide 1 Outline Principles for performance oriented design Performance testing Performance tuning General
More informationOutline. Introduction. State-of-the-art Forensic Methods. Hardware-based Workload Forensics. Experimental Results. Summary. OS level Hypervisor level
Outline Introduction State-of-the-art Forensic Methods OS level Hypervisor level Hardware-based Workload Forensics Process Reconstruction Experimental Results Setup Result & Overhead Summary 1 Introduction
More informationApplication Performance Analysis of the Cortex-A9 MPCore
This project in ARM is in part funded by ICT-eMuCo, a European project supported under the Seventh Framework Programme (7FP) for research and technological development Application Performance Analysis
More informationMAGENTO HOSTING Progressive Server Performance Improvements
MAGENTO HOSTING Progressive Server Performance Improvements Simple Helix, LLC 4092 Memorial Parkway Ste 202 Huntsville, AL 35802 sales@simplehelix.com 1.866.963.0424 www.simplehelix.com 2 Table of Contents
More informationMore on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction
More informationHost Power Management in VMware vsphere 5
in VMware vsphere 5 Performance Study TECHNICAL WHITE PAPER Table of Contents Introduction.... 3 Power Management BIOS Settings.... 3 Host Power Management in ESXi 5.... 4 HPM Power Policy Options in ESXi
More informationIntel Pentium 4 Processor on 90nm Technology
Intel Pentium 4 Processor on 90nm Technology Ronak Singhal August 24, 2004 Hot Chips 16 1 1 Agenda Netburst Microarchitecture Review Microarchitecture Features Hyper-Threading Technology SSE3 Intel Extended
More informationAn OS-oriented performance monitoring tool for multicore systems
An OS-oriented performance monitoring tool for multicore systems J.C. Sáez, J. Casas, A. Serrano, R. Rodríguez-Rodríguez, F. Castro, D. Chaver, M. Prieto-Matias Department of Computer Architecture Complutense
More informationThis Unit: Multithreading (MT) CIS 501 Computer Architecture. Performance And Utilization. Readings
This Unit: Multithreading (MT) CIS 501 Computer Architecture Unit 10: Hardware Multithreading Application OS Compiler Firmware CU I/O Memory Digital Circuits Gates & Transistors Why multithreading (MT)?
More informationLecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.
Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide
More informationArchitecture of Hitachi SR-8000
Architecture of Hitachi SR-8000 University of Stuttgart High-Performance Computing-Center Stuttgart (HLRS) www.hlrs.de Slide 1 Most of the slides from Hitachi Slide 2 the problem modern computer are data
More informationTRACE PERFORMANCE TESTING APPROACH. Overview. Approach. Flow. Attributes
TRACE PERFORMANCE TESTING APPROACH Overview Approach Flow Attributes INTRODUCTION Software Testing Testing is not just finding out the defects. Testing is not just seeing the requirements are satisfied.
More informationAnalyse de performances pour les systèmes intégrés multi-cœurs
Ben Salma SANA Laboratoire TIMA SLS Analyse de performances pour les systèmes intégrés multi-cœurs Encadrants: Frederic Petrot (Frederic.Petrot@imag.fr) Nicolas Fournel (Nicolas.Fournel@imag.fr) Page 1
More informationData Sharing or Resource Contention: Toward Performance Transparency on Multicore Systems
Data Sharing or Resource Contention: Toward Performance Transparency on Multicore Systems Sharanyan Srikanthan, Sandhya Dwarkadas, and Kai Shen, University of Rochester https://www.usenix.org/conference/atc5/technical-session/presentation/srikanthan
More informationIdentifying the Optimal Energy-Efficient Operating Points of Parallel Workloads
Identifying the Optimal Energy-Efficient Operating Points of Parallel Workloads Ryan Cochran School of Engineering Brown University Providence, RI 2912 ryan_cochran@brown.edu Can Hankendi ECE Department
More informationComputer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.
Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University cwliu@twins.ee.nctu.edu.tw Review Computers in mid 50 s Hardware was expensive
Parallel Computing 37 (2011) 26-41. Contents lists available at ScienceDirect. Parallel Computing. journal homepage: www.elsevier.
Parallel Computing 37 (2011) 26-41 Contents lists available at ScienceDirect Parallel Computing journal homepage: www.elsevier.com/locate/parco Architectural support for thread communications in multi-core
Accurate Characterization of the Variability in Power Consumption in Modern Mobile Processors
Accurate Characterization of the Variability in Power Consumption in Modern Mobile Processors Bharathan Balaji, John McCullough, Rajesh K. Gupta, Yuvraj Agarwal University of California, San Diego {bbalaji,
MONITORING power consumption of a microprocessor
IEEE TRANSACTIONS ON CIRCUIT AND SYSTEMS-II, VOL. X, NO. Y, JANUARY XXXX 1 A Study on the use of Performance Counters to Estimate Power in Microprocessors Rance Rodrigues, Member, IEEE, Arunachalam Annamalai,
Post-compiler Software Optimization for Reducing Energy
Post-compiler Software Optimization for Reducing Energy Eric Schulte Jonathan Dorn Stephen Harding Stephanie Forrest Westley Weimer Department of Computer Science Department of Computer Science University
Run-time Resource Management in SOA Virtualized Environments. Danilo Ardagna, Raffaela Mirandola, Marco Trubian, Li Zhang
Run-time Resource Management in SOA Virtualized Environments Danilo Ardagna, Raffaela Mirandola, Marco Trubian, Li Zhang Amsterdam, August 25 2009 SOI Run-time Management 2 SOI=SOA + virtualization Goal:
Lecture 3: Evaluating Computer Architectures. Software & Hardware: The Virtuous Cycle?
Lecture 3: Evaluating Computer Architectures Announcements - Reminder: Homework 1 due Thursday 2/2 Last Time technology background Computer elements Circuits and timing Virtuous cycle of the past and
Probabilistic Modeling for Job Symbiosis Scheduling on SMT Processors
7 Probabilistic Modeling for Job Symbiosis Scheduling on SMT Processors STIJN EYERMAN and LIEVEN EECKHOUT, Ghent University, Belgium Symbiotic job scheduling improves simultaneous multithreading (SMT)
Achieving QoS in Server Virtualization
Achieving QoS in Server Virtualization Intel Platform Shared Resource Monitoring/Control in Xen Chao Peng (chao.p.peng@intel.com) 1 Increasing QoS demand in Server Virtualization Data center & Cloud infrastructure
Exploring the Design of the Cortex-A15 Processor ARM's next-generation mobile applications processor. Travis Lanier Senior Product Manager
Exploring the Design of the Cortex-A15 Processor ARM's next-generation mobile applications processor Travis Lanier Senior Product Manager 1 Cortex-A15: Next Generation Leadership Cortex-A class multi-processor
LOOKING FOR AN AMAZING PROCESSOR. Product Brief 6th Gen Intel Core Processors for Desktops: S-series
Product Brief 6th Gen Intel Core Processors for Desktops: S-series LOOKING FOR AN AMAZING PROCESSOR for your next desktop PC? Look no further than 6th Gen Intel Core processors. With amazing performance
Virtualization Performance Insights from TPC-VMS
Virtualization Performance Insights from TPC-VMS Wayne D. Smith, Shiny Sebastian Intel Corporation wayne.smith@intel.com shiny.sebastian@intel.com Abstract. This paper describes the TPC-VMS (Virtual Measurement
CloudCache: Expanding and Shrinking Private Caches
Credits CloudCache: Expanding and Shrinking Private Caches Parts of the work presented in this talk are from the results obtained in collaboration with students and faculty at the : Mohammad Hammoud Lei
Measuring Cache and Memory Latency and CPU to Memory Bandwidth
White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary
Symmetric Multiprocessing
Multicore Computing A multi-core processor is a processing system composed of two or more independent cores. One can describe it as an integrated circuit to which two or more individual processors (called
Understanding the Impact of Inter-Thread Cache Interference on ILP in Modern SMT Processors
Journal of Instruction-Level Parallelism 7 (25) 1-28 Submitted 2/25; published 6/25 Understanding the Impact of Inter-Thread Cache Interference on ILP in Modern SMT Processors Joshua Kihm Alex Settle Andrew
FAST, ACCURATE, AND VALIDATED FULL-SYSTEM SOFTWARE SIMULATION
FAST, ACCURATE, AND VALIDATED FULL-SYSTEM SOFTWARE SIMULATION OF X86 HARDWARE. THIS ARTICLE PRESENTS A FAST AND ACCURATE INTERVAL-BASED CPU TIMING MODEL THAT IS EASILY IMPLEMENTED AND INTEGRATED