How To Monitor Performance On A Microsoft Powerbook (Powerbook) On A Network (Powerbus) On An Uniden (Powergen) With A Microsatellite) On The Microsonde (Powerstation) On Your Computer (Power
|
|
|
- Nigel Melton
- 5 years ago
- Views:
Transcription
1 A Topology-Aware Performance Monitoring Tool for Shared Resource Management in Multicore Systems TADaaM Team - Nicolas Denoyelle - Brice Goglin - Emmanuel Jeannot August 24, 2015
2 1. Context/Motivations 2. Fast presentation of the tool 3. Demonstration 4. How does it works? 5. How is it made? 6. Features & Future Works TADaaM Team - Nicolas Denoyelle - Brice Goglin - Emmanuel Jeannot August 24, 2015
3 MOTIVATIONS Memory hierarchy is growing deeper and larger. No performance without a fair usage of the system topology Batch schedulers, runtimes, applications themeselves... are getting topology aware. Processing Unit NUMA Shared Memory Shared Cache IO Network Machine NUMA Shared Memory Shared Cache Topology Aware Performance Monitoring August 24,
4 MOTIVATIONS Memory hierarchy is growing deeper and larger. IO Network Hence, data management gives opportunities for performance improvements. NUMA Shared Memory Shared Cache Machine NUMA Shared Memory Shared Cache Processing Unit Topology Aware Performance Monitoring August 24,
5 MOTIVATIONS Memory hierarchy is growing deeper and larger. IO Network Hence, data management gives opportunities for performance improvements. NUMA Shared Memory Shared Cache Machine NUMA Shared Memory Shared Cache Processing Unit Topology Aware Performance Monitoring August 24,
6 MOTIVATIONS Memory hierarchy is growing deeper and larger. IO Network Hence, data management gives opportunities for performance improvements. NUMA Shared Memory Shared Cache Machine NUMA Shared Memory Shared Cache It is a multi-level and a multi-criteria problem. Processing Unit Topology Aware Performance Monitoring August 24,
7 MOTIVATIONS Need to match use cases, and relevant performance metrics for each level. Need to match performance and topology. Requires topology modeling skills. Requires adaptable performance monitoring. Topology Aware Performance Monitoring August 24,
8 Yet Another Tool to Monitor Applications Performance Focus on data presentation to link the results with topology informations. Relies on two cornerstones of topology modeling (hwloc) and performance counter abstraction (PAPI) to map the latter on the former Minimal configuration and software requirements. Can help finding and caracterizing localized bottlenecks. Topology Aware Performance Monitoring August 24,
9 Hardware Locality (hwloc) Portable abstraction of hierarchical architectures for high-performance computing Performs topology discovery and extracts hardware component information. Provides tools for memory and process binding. Many operating systems supported... lstopo utility to display the topology: Developped at Inria Bordeaux. Topology Aware Performance Monitoring August 24,
10 Hardware Locality (hwloc) Machine (31GB total) NUMANode P#0 (31GB) Package P#0 L3 (20MB) L2 (256KB) L2 (256KB) L2 (256KB) L2 (256KB) L2 (256KB) L2 (256KB) L2 (256KB) L2 (256KB) L1d L1d L1d L1d L1d L1d L1d L1d L1i L1i L1i L1i L1i L1i L1i L1i P#0 P#1 P#2 P#3 P#4 P#5 P#6 P#7 P#0 P#2 P#4 P#6 P#8 P#10 P#12 P#14 P#16 P#18 P#20 P#22 P#24 P#26 P#28 P#30 Topology Aware Performance Monitoring August 24,
11 Performance Application Programming Interface (PAPI) Consistent interface and methodology for use of the performance counter hardware. Real time relation between software performance and processor events. Many operating systems supported too. Reliable and actively supported. Used in a wide range of performance analysis applications. An abstraction layer to plug some other performance library is under development. Topology Aware Performance Monitoring August 24,
12 Dynamic Lstopo (example) Machine (16GB total) NUMANode P#0 (16GB) Package P#0 L3 9, e+01 (20MB) L2 2, e+02 (256KB) L2 2, e+02 (256KB) L2 2, e+02 (256KB) L2 2, e+02 (256KB) L2 2, e+02 (256KB) L2 5, e+02 (256KB) L2 2, e+02 (256KB) L2 3, e+02 (256KB) L1d 6, e+02 L1d 6, e+02 L1d 6, e+02 L1d 6, e+02 L1d 7, e+02 L1d 1, e+03 L1d 6, e+02 L1d 6, e+02 L1i 1, e+03 L1i 1, e+03 L1i 1, e+03 L1i 1, e+03 L1i 1, e+03 L1i 2, e+03 L1i 1, e+03 L1i 1, e+03 P#0 P#1 P#2 P#3 P#4 P#5 P#6 P#7 2,82176e+00 P#0 2,45916e+00 P#2 2,68772e+00 P#4 1,51689e+00 P#6 1,63417e+00 P#8 3,47249e+00 P#10 1,47808e+00 P#12 1,51514e+00 P#14 1,40868e+00 P#16 1,47441e+00 P#18 1,52142e+00 P#20 2,64271e+00 P#22 2,82472e+00 P#24 1,47165e+00 P#26 2,58947e+00 P#28 2,49344e+00 P#30 Sample of hardware performance counters mapped on a single socket of an Intel Xeon E C. Topology Aware Performance Monitoring August 24,
13 A Demonstration Worth Thousand Words L1 L2 L Accesses to a linked list of variable size. Topology Aware Performance Monitoring August 24,
14 A Demonstration Worth Thousand Words L1 L2 L Accesses to a linked list of variable size. Topology Aware Performance Monitoring August 24,
15 A Demonstration Worth Thousand Words L1 L2 L Accesses to a linked list of variable size. Topology Aware Performance Monitoring August 24,
16 A Demonstration Worth Thousand Words L3_MISS{ OBJ = L3; CTR = PAPI_L3_TCM; LOGSCALE = 1; } L2_MISS{ OBJ = L2; CTR = PAPI_L2_TCM; LOGSCALE = 1; } L1_MISS{ OBJ = L1d; CTR = PAPI_L1_DCM; LOGSCALE = 1; } SINGLE_L3_MISS{ OBJ = ; CTR = PAPI_L3_TCM; LOGSCALE = 1; } Topology Aware Performance Monitoring August 24,
17 Dynamic Lstopo (Usage) Machine (16GB total) Counters input: NUMANode P#0 (16GB) Package P#0 L3 2, e+03 (4096KB) L2 (256KB) L1d L1i L2 (256KB) L1d L1i SINGLE L3 MISS{ OBJ = L3 ; CTR = PAPI L2 TCM/PAPI L2 TCA ; LOGSCALE = 1 ; MAX= ; MIN=0; } P#0 P#1 P#0 P#1 P#2 P#3 Command line: lstopo perf-input counters input Topology Aware Performance Monitoring August 24,
18 Dynamic Lstopo (Theory) 1. Spawn one pthread per hardware thread (#0,..., #3). L2 L1 + L1 + + Topology Aware Performance Monitoring August 24,
19 Dynamic Lstopo (Theory) 1. Spawn one pthread per hardware thread (#0,..., #3). L2 2. For each timestamp, each thread collects a local set of performance counters. L1 + L1 + + Topology Aware Performance Monitoring August 24,
20 Dynamic Lstopo (Theory) 1. Spawn one pthread per hardware thread (#0,..., #3). L2 2. For each timestamp, each thread collects a local set of performance counters. L1 + L1 3. Counters are accumulated in each upper level. + + Topology Aware Performance Monitoring August 24,
21 Dynamic Lstopo (Theory) 1. Spawn one pthread per hardware thread (#0,..., #3). L2 % 2. For each timestamp, each thread collects a local set of performance counters. % L1 + L1 % 3. Counters are accumulated in each upper level. % + % % + % 4. For each level, a leaf computes an arithmetic expression of the performance counters in the set. Topology Aware Performance Monitoring August 24,
22 Dynamic Lstopo Software Architecture in Brief lstopo utility Monitors output Application monitors library hwloc library PAPI library Machine static topology Machine dynamic performance counters Topology Aware Performance Monitoring August 24,
23 Dynamic Lstopo Software Architecture in Brief lstopo utility Monitors output Application monitors library hwloc library PAPI library Machine static topology Machine dynamic performance counters Topology Aware Performance Monitoring August 24,
24 API m o n i t o r s = l o a d M o n i t o r s f r o m c o n f i g (NULL, m y p e r f f i l e, m y o u t p u t f i l e, 0) ; M o n i t o r s w a t c h p i d ( monitors, g e t p i d ( ) ) ; M o n i t o r s s t a r t ( m o n i t o r s ) ; /... / M o n i t o r s u p d a t e c o u n t e r s ( m o n i t o r s ) ; d e l e t e M o n i t o r s ( m o n i t o r s ) ; Topology Aware Performance Monitoring August 24,
25 Dynamic Lstopo (Output paje trace) % EventDef val 0 % Id int % Phase int % Time_us date % Value double % EndEventDef % EventDef container 1 % Id int % Level string % Sibling int % Name string % Logscale int % EndEventDef SINGLE_ L3_ MISS SINGLE_ L3_ MISS SINGLE_ L3_ MISS SINGLE_ L3_ MISS , , , , , , , ,00 Topology Aware Performance Monitoring August 24,
26 Features Record and/or Display live machine performance counters and match them with topology. Several settings: counters accumulation, sampling rate, attach to a process... Replay any trace with a topology file (for external display).. Sample specific parts of an application with the monitor library. Support legacy lstopo options (restrict topology, change display format... ). Topology Aware Performance Monitoring August 24,
27 Future works Match code and performance informations Accept user defined aggregation operator. Provide performance abstraction layer. Be able to delimit phases during execution. Find and give explicit hints on bottlenecks. Topology Aware Performance Monitoring August 24,
28 Conclusion Data locality becomes a main criterion for high performance. We built a tool based on a topology model and a performance library to help taking up the challenge. It maps performance values to machine objects. It is a visual tool, fast and easy to use. It is lightweight and causes less than 1% C overhead. Let you build topology aware performance models. Dynamic lstopo is into the process of beeing merged with hwloc project. Topology Aware Performance Monitoring August 24,
29 Now available from lstopo Thank you Topology Aware Performance Monitoring August 24,
Distributed communication-aware load balancing with TreeMatch in Charm++
Distributed communication-aware load balancing with TreeMatch in Charm++ The 9th Scheduling for Large Scale Systems Workshop, Lyon, France Emmanuel Jeannot Guillaume Mercier Francois Tessier In collaboration
Multi-Threading Performance on Commodity Multi-Core Processors
Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction
MAQAO Performance Analysis and Optimization Tool
MAQAO Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL [email protected] Performance Evaluation Team, University of Versailles S-Q-Y http://www.maqao.org VI-HPS 18 th Grenoble 18/22
ELEC 377. Operating Systems. Week 1 Class 3
Operating Systems Week 1 Class 3 Last Class! Computer System Structure, Controllers! Interrupts & Traps! I/O structure and device queues.! Storage Structure & Caching! Hardware Protection! Dual Mode Operation
Performance Analysis and Optimization Tool
Performance Analysis and Optimization Tool Andres S. CHARIF-RUBIAL [email protected] Performance Analysis Team, University of Versailles http://www.maqao.org Introduction Performance Analysis Develop
Performance Characteristics of Large SMP Machines
Performance Characteristics of Large SMP Machines Dirk Schmidl, Dieter an Mey, Matthias S. Müller [email protected] Rechen- und Kommunikationszentrum (RZ) Agenda Investigated Hardware Kernel Benchmark
Performance Monitoring of Parallel Scientific Applications
Performance Monitoring of Parallel Scientific Applications Abstract. David Skinner National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory This paper introduces an infrastructure
NVIDIA Tools For Profiling And Monitoring. David Goodwin
NVIDIA Tools For Profiling And Monitoring David Goodwin Outline CUDA Profiling and Monitoring Libraries Tools Technologies Directions CScADS Summer 2012 Workshop on Performance Tools for Extreme Scale
Performance Application Programming Interface
/************************************************************************************ ** Notes on Performance Application Programming Interface ** ** Intended audience: Those who would like to learn more
Measuring Cache and Memory Latency and CPU to Memory Bandwidth
White Paper Joshua Ruggiero Computer Systems Engineer Intel Corporation Measuring Cache and Memory Latency and CPU to Memory Bandwidth For use with Intel Architecture December 2008 1 321074 Executive Summary
End-user Tools for Application Performance Analysis Using Hardware Counters
1 End-user Tools for Application Performance Analysis Using Hardware Counters K. London, J. Dongarra, S. Moore, P. Mucci, K. Seymour, T. Spencer Abstract One purpose of the end-user tools described in
Optimization tools. 1) Improving Overall I/O
Optimization tools After your code is compiled, debugged, and capable of running to completion or planned termination, you can begin looking for ways in which to improve execution speed. In general, the
Performance Counter. Non-Uniform Memory Access Seminar Karsten Tausche 2014-12-10
Performance Counter Non-Uniform Memory Access Seminar Karsten Tausche 2014-12-10 Performance Counter Hardware Unit for event measurements Performance Monitoring Unit (PMU) Originally for CPU-Debugging
Multi-core Programming System Overview
Multi-core Programming System Overview Based on slides from Intel Software College and Multi-Core Programming increasing performance through software multi-threading by Shameem Akhter and Jason Roberts,
Multi-core and Linux* Kernel
Multi-core and Linux* Kernel Suresh Siddha Intel Open Source Technology Center Abstract Semiconductor technological advances in the recent years have led to the inclusion of multiple CPU execution cores
The Lagopus SDN Software Switch. 3.1 SDN and OpenFlow. 3. Cloud Computing Technology
3. The Lagopus SDN Software Switch Here we explain the capabilities of the new Lagopus software switch in detail, starting with the basics of SDN and OpenFlow. 3.1 SDN and OpenFlow Those engaged in network-related
OpenMP Programming on ScaleMP
OpenMP Programming on ScaleMP Dirk Schmidl [email protected] Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign
VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5
Performance Study VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 VMware VirtualCenter uses a database to store metadata on the state of a VMware Infrastructure environment.
System Requirements Table of contents
Table of contents 1 Introduction... 2 2 Knoa Agent... 2 2.1 System Requirements...2 2.2 Environment Requirements...4 3 Knoa Server Architecture...4 3.1 Knoa Server Components... 4 3.2 Server Hardware Setup...5
The Impact of Memory Subsystem Resource Sharing on Datacenter Applications. Lingia Tang Jason Mars Neil Vachharajani Robert Hundt Mary Lou Soffa
The Impact of Memory Subsystem Resource Sharing on Datacenter Applications Lingia Tang Jason Mars Neil Vachharajani Robert Hundt Mary Lou Soffa Introduction Problem Recent studies into the effects of memory
Delivering Quality in Software Performance and Scalability Testing
Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,
Resource Aware Scheduler for Storm. Software Design Document. <[email protected]> Date: 09/18/2015
Resource Aware Scheduler for Storm Software Design Document Author: Boyang Jerry Peng Date: 09/18/2015 Table of Contents 1. INTRODUCTION 3 1.1. USING
PAPI - PERFORMANCE API. ANDRÉ PEREIRA [email protected]
1 PAPI - PERFORMANCE API ANDRÉ PEREIRA [email protected] 2 Motivation Application and functions execution time is easy to measure time gprof valgrind (callgrind) It is enough to identify bottlenecks,
Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1
Performance Study Performance Characteristics of and RDM VMware ESX Server 3.0.1 VMware ESX Server offers three choices for managing disk access in a virtual machine VMware Virtual Machine File System
Understanding the Benefits of IBM SPSS Statistics Server
IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster
Enabling Technologies for Distributed Computing
Enabling Technologies for Distributed Computing Dr. Sanjay P. Ahuja, Ph.D. Fidelity National Financial Distinguished Professor of CIS School of Computing, UNF Multi-core CPUs and Multithreading Technologies
Binary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
Exascale Challenges and General Purpose Processors. Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation
Exascale Challenges and General Purpose Processors Avinash Sodani, Ph.D. Chief Architect, Knights Landing Processor Intel Corporation Jun-93 Aug-94 Oct-95 Dec-96 Feb-98 Apr-99 Jun-00 Aug-01 Oct-02 Dec-03
Scalability evaluation of barrier algorithms for OpenMP
Scalability evaluation of barrier algorithms for OpenMP Ramachandra Nanjegowda, Oscar Hernandez, Barbara Chapman and Haoqiang H. Jin High Performance Computing and Tools Group (HPCTools) Computer Science
Enabling Technologies for Distributed and Cloud Computing
Enabling Technologies for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Multi-core CPUs and Multithreading
Very Large Enterprise Network, Deployment, 25000+ Users
Very Large Enterprise Network, Deployment, 25000+ Users Websense software can be deployed in different configurations, depending on the size and characteristics of the network, and the organization s filtering
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...
Release Notes for Open Grid Scheduler/Grid Engine. Version: Grid Engine 2011.11
Release Notes for Open Grid Scheduler/Grid Engine Version: Grid Engine 2011.11 New Features Berkeley DB Spooling Directory Can Be Located on NFS The Berkeley DB spooling framework has been enhanced such
<Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization
An Experimental Model to Analyze OpenMP Applications for System Utilization Mark Woodyard Principal Software Engineer 1 The following is an overview of a research project. It is intended
OpenMP and Performance
Dirk Schmidl IT Center, RWTH Aachen University Member of the HPC Group [email protected] IT Center der RWTH Aachen University Tuning Cycle Performance Tuning aims to improve the runtime of an
CSCI E 98: Managed Environments for the Execution of Programs
CSCI E 98: Managed Environments for the Execution of Programs Draft Syllabus Instructor Phil McGachey, PhD Class Time: Mondays beginning Sept. 8, 5:30-7:30 pm Location: 1 Story Street, Room 304. Office
ScaAnalyzer: A Tool to Identify Memory Scalability Bottlenecks in Parallel Programs
ScaAnalyzer: A Tool to Identify Memory Scalability Bottlenecks in Parallel Programs ABSTRACT Xu Liu Department of Computer Science College of William and Mary Williamsburg, VA 23185 [email protected] It is
JUROPA Linux Cluster An Overview. 19 May 2014 Ulrich Detert
Mitglied der Helmholtz-Gemeinschaft JUROPA Linux Cluster An Overview 19 May 2014 Ulrich Detert JuRoPA JuRoPA Jülich Research on Petaflop Architectures Bull, Sun, ParTec, Intel, Mellanox, Novell, FZJ JUROPA
LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance
11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu
SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011
SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications Jürgen Primsch, SAP AG July 2011 Why In-Memory? Information at the Speed of Thought Imagine access to business data,
Practical Performance Understanding the Performance of Your Application
Neil Masson IBM Java Service Technical Lead 25 th September 2012 Practical Performance Understanding the Performance of Your Application 1 WebSphere User Group: Practical Performance Understand the Performance
The team that wrote this redbook Comments welcome Introduction p. 1 Three phases p. 1 Netfinity Performance Lab p. 2 IBM Center for Microsoft
Foreword p. xv Preface p. xvii The team that wrote this redbook p. xviii Comments welcome p. xx Introduction p. 1 Three phases p. 1 Netfinity Performance Lab p. 2 IBM Center for Microsoft Technologies
Microsoft Windows Server 2003 with Internet Information Services (IIS) 6.0 vs. Linux Competitive Web Server Performance Comparison
April 23 11 Aviation Parkway, Suite 4 Morrisville, NC 2756 919-38-28 Fax 919-38-2899 32 B Lakeside Drive Foster City, CA 9444 65-513-8 Fax 65-513-899 www.veritest.com [email protected] Microsoft Windows
D5.6 Prototype demonstration of performance monitoring tools on a system with multiple ARM boards Version 1.0
D5.6 Prototype demonstration of performance monitoring tools on a system with multiple ARM boards Document Information Contract Number 288777 Project Website www.montblanc-project.eu Contractual Deadline
Understand Performance Monitoring
Understand Performance Monitoring Lesson Overview In this lesson, you will learn: Performance monitoring methods Monitor specific system activities Create a Data Collector Set View diagnosis reports Task
SIDN Server Measurements
SIDN Server Measurements Yuri Schaeffer 1, NLnet Labs NLnet Labs document 2010-003 July 19, 2010 1 Introduction For future capacity planning SIDN would like to have an insight on the required resources
Database Application Developer Tools Using Static Analysis and Dynamic Profiling
Database Application Developer Tools Using Static Analysis and Dynamic Profiling Surajit Chaudhuri, Vivek Narasayya, Manoj Syamala Microsoft Research {surajitc,viveknar,manojsy}@microsoft.com Abstract
Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand
Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. W. Jin D. K. Panda Network Based
Optimizing the Performance of Your Longview Application
Optimizing the Performance of Your Longview Application François Lalonde, Director Application Support May 15, 2013 Disclaimer This presentation is provided to you solely for information purposes, is not
Parallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
Oracle Database Scalability in VMware ESX VMware ESX 3.5
Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises
IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version 6.3.1 Fix Pack 2.
IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version 6.3.1 Fix Pack 2 Reference IBM Tivoli Composite Application Manager for Microsoft Applications:
HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief
Technical white paper HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief Scale-up your Microsoft SQL Server environment to new heights Table of contents Executive summary... 2 Introduction...
Implementing Probes for J2EE Cluster Monitoring
Implementing s for J2EE Cluster Monitoring Emmanuel Cecchet, Hazem Elmeleegy, Oussama Layaida, Vivien Quéma LSR-IMAG Laboratory (CNRS, INPG, UJF) - INRIA INRIA Rhône-Alpes, 655 av. de l Europe, 38334 Saint-Ismier
Data Structure Oriented Monitoring for OpenMP Programs
A Data Structure Oriented Monitoring Environment for Fortran OpenMP Programs Edmond Kereku, Tianchao Li, Michael Gerndt, and Josef Weidendorfer Institut für Informatik, Technische Universität München,
MAGENTO HOSTING Progressive Server Performance Improvements
MAGENTO HOSTING Progressive Server Performance Improvements Simple Helix, LLC 4092 Memorial Parkway Ste 202 Huntsville, AL 35802 [email protected] 1.866.963.0424 www.simplehelix.com 2 Table of Contents
How To Build A Supermicro Computer With A 32 Core Power Core (Powerpc) And A 32-Core (Powerpc) (Powerpowerpter) (I386) (Amd) (Microcore) (Supermicro) (
TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx
L20: GPU Architecture and Models
L20: GPU Architecture and Models scribe(s): Abdul Khalifa 20.1 Overview GPUs (Graphics Processing Units) are large parallel structure of processing cores capable of rendering graphics efficiently on displays.
Scheduling Task Parallelism" on Multi-Socket Multicore Systems"
Scheduling Task Parallelism" on Multi-Socket Multicore Systems" Stephen Olivier, UNC Chapel Hill Allan Porterfield, RENCI Kyle Wheeler, Sandia National Labs Jan Prins, UNC Chapel Hill Outline" Introduction
Operating Systems. 05. Threads. Paul Krzyzanowski. Rutgers University. Spring 2015
Operating Systems 05. Threads Paul Krzyzanowski Rutgers University Spring 2015 February 9, 2015 2014-2015 Paul Krzyzanowski 1 Thread of execution Single sequence of instructions Pointed to by the program
11.1 inspectit. 11.1. inspectit
11.1. inspectit Figure 11.1. Overview on the inspectit components [Siegl and Bouillet 2011] 11.1 inspectit The inspectit monitoring tool (website: http://www.inspectit.eu/) has been developed by NovaTec.
x64 Servers: Do you want 64 or 32 bit apps with that server?
TMurgent Technologies x64 Servers: Do you want 64 or 32 bit apps with that server? White Paper by Tim Mangan TMurgent Technologies February, 2006 Introduction New servers based on what is generally called
Chapter 14 Virtual Machines
Operating Systems: Internals and Design Principles Chapter 14 Virtual Machines Eighth Edition By William Stallings Virtual Machines (VM) Virtualization technology enables a single PC or server to simultaneously
Full and Para Virtualization
Full and Para Virtualization Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF x86 Hardware Virtualization The x86 architecture offers four levels
Tableau Server 7.0 scalability
Tableau Server 7.0 scalability February 2012 p2 Executive summary In January 2012, we performed scalability tests on Tableau Server to help our customers plan for large deployments. We tested three different
Enterprise Performance Tuning: Best Practices with SQL Server 2008 Analysis Services. By Ajay Goyal Consultant Scalability Experts, Inc.
Enterprise Performance Tuning: Best Practices with SQL Server 2008 Analysis Services By Ajay Goyal Consultant Scalability Experts, Inc. June 2009 Recommendations presented in this document should be thoroughly
Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003
Interconnect Efficiency of Tyan PSC T-630 with Microsoft Compute Cluster Server 2003 Josef Pelikán Charles University in Prague, KSVI Department, [email protected] Abstract 1 Interconnect quality
Building an energy dashboard. Energy measurement and visualization in current HPC systems
Building an energy dashboard Energy measurement and visualization in current HPC systems Thomas Geenen 1/58 [email protected] SURFsara The Dutch national HPC center 2H 2014 > 1PFlop GPGPU accelerators
Analysis of VDI Storage Performance During Bootstorm
Analysis of VDI Storage Performance During Bootstorm Introduction Virtual desktops are gaining popularity as a more cost effective and more easily serviceable solution. The most resource-dependent process
A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures
11 th International LS-DYNA Users Conference Computing Technology A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures Yih-Yih Lin Hewlett-Packard Company Abstract In this paper, the
Oracle Database Reliability, Performance and scalability on Intel Xeon platforms Mitch Shults, Intel Corporation October 2011
Oracle Database Reliability, Performance and scalability on Intel platforms Mitch Shults, Intel Corporation October 2011 1 Intel Processor E7-8800/4800/2800 Product Families Up to 10 s and 20 Threads 30MB
Cloud Computing Simulation Using CloudSim
Cloud Computing Simulation Using CloudSim Ranjan Kumar #1, G.Sahoo *2 # Assistant Professor, Computer Science & Engineering, Ranchi University, India Professor & Head, Information Technology, Birla Institute
Scalability Factors of JMeter In Performance Testing Projects
Scalability Factors of JMeter In Performance Testing Projects Title Scalability Factors for JMeter In Performance Testing Projects Conference STEP-IN Conference Performance Testing 2008, PUNE Author(s)
Oracle Enterprise Manager 12c New Capabilities for the DBA. Charlie Garry, Director, Product Management Oracle Server Technologies
Oracle Enterprise Manager 12c New Capabilities for the DBA Charlie Garry, Director, Product Management Oracle Server Technologies of DBAs admit doing nothing to address performance issues CHANGE AVOID
SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri
SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable
Using Library Dependencies for Clustering
Using Library Dependencies for Clustering Jochen Quante Software Engineering Group, FB03 Informatik, Universität Bremen [email protected] Abstract: Software clustering is an established approach
COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING
COMPUTER ORGANIZATION ARCHITECTURES FOR EMBEDDED COMPUTING 2013/2014 1 st Semester Sample Exam January 2014 Duration: 2h00 - No extra material allowed. This includes notes, scratch paper, calculator, etc.
Xeon+FPGA Platform for the Data Center
Xeon+FPGA Platform for the Data Center ISCA/CARL 2015 PK Gupta, Director of Cloud Platform Technology, DCG/CPG Overview Data Center and Workloads Xeon+FPGA Accelerator Platform Applications and Eco-system
Informatica Ultra Messaging SMX Shared-Memory Transport
White Paper Informatica Ultra Messaging SMX Shared-Memory Transport Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance This document contains Confidential, Proprietary and Trade
Very Large Enterprise Network Deployment, 25,000+ Users
Very Large Enterprise Network Deployment, 25,000+ Users Websense software can be deployed in different configurations, depending on the size and characteristics of the network, and the organization s filtering
Lab Validation Report
Lab Validation Report Workload Performance Analysis: Microsoft Windows Server 2012 with Hyper-V and SQL Server 2012 Virtualizing Tier-1 Application Workloads with Confidence By Mike Leone, ESG Lab Engineer
HPSA Agent Characterization
HPSA Agent Characterization Product HP Server Automation (SA) Functional Area Managed Server Agent Release 9.0 Page 1 HPSA Agent Characterization Quick Links High-Level Agent Characterization Summary...
Microsoft SQL Server OLTP Best Practice
Microsoft SQL Server OLTP Best Practice The document Introduction to Transactional (OLTP) Load Testing for all Databases provides a general overview on the HammerDB OLTP workload and the document Microsoft
Overlapping Data Transfer With Application Execution on Clusters
Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm [email protected] [email protected] Department of Computer Science Department of Electrical and Computer
Chapter 5 Linux Load Balancing Mechanisms
Chapter 5 Linux Load Balancing Mechanisms Load balancing mechanisms in multiprocessor systems have two compatible objectives. One is to prevent processors from being idle while others processors still
Resource Utilization of Middleware Components in Embedded Systems
Resource Utilization of Middleware Components in Embedded Systems 3 Introduction System memory, CPU, and network resources are critical to the operation and performance of any software system. These system
HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads
HP ProLiant Gen8 vs Gen9 Server Blades on Data Warehouse Workloads Gen9 Servers give more performance per dollar for your investment. Executive Summary Information Technology (IT) organizations face increasing
vsphere Resource Management Guide
ESX 4.0 ESXi 4.0 vcenter Server 4.0 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent
SQL Server 2008 Performance and Scale
SQL Server 2008 Performance and Scale White Paper Published: February 2008 Updated: July 2008 Summary: Microsoft SQL Server 2008 incorporates the tools and technologies that are necessary to implement
vsphere Resource Management
ESXi 5.1 vcenter Server 5.1 This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent editions
Replication on Virtual Machines
Replication on Virtual Machines Siggi Cherem CS 717 November 23rd, 2004 Outline 1 Introduction The Java Virtual Machine 2 Napper, Alvisi, Vin - DSN 2003 Introduction JVM as state machine Addressing non-determinism
