<Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization
|
|
|
- Allison Hodges
- 10 years ago
- Views:
Transcription
1 <Insert Picture Here> An Experimental Model to Analyze OpenMP Applications for System Utilization Mark Woodyard Principal Software Engineer 1
2 The following is an overview of a research project. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle s products remains at the sole discretion of Oracle. The opinions expressed here are my own and do not express the views of Oracle Corporation. 2
3 The Importance of System Utilization No program is 100% parallel Utilization unimportant for one small, cheap server... Critically important for a large data center No OS can provide 100% cycles to applications Focus on single application performance Lean on processor sets, processor binding Result: reduced overall utilization Core counts per socket are increasing Systems are getting larger: more sockets & cores More than many applications can efficiently use 3
4 OpenMP Performance in a Throughput Context Operating System Scheduler: Software Threads Hardware Threads Scheduling & Work Distribution Interactions Impact Utilization process* (17 threads) process process (5 threads) ~ ~~~~ ~ ~~~~ ~ ~ ~ OS Scheduler context switch Thread Thread Thread Thread Socket Socket Thread Thread Thread Thread Core Core Core Core *Process = Executing Program ~ = Executing Thread Process Memory 4
5 Need: Enable High Parallel Utilization Philosophy: Make efficient use of any available cycles If my program doesn't, how do I find the cause? Key scalability questions: How well does the entire program scale? Which parallel region is a bottleneck? Do I have enough parallel work vs. overhead? Is the serial time significant vs. parallel work? Are the work chunks large enough vs. the time quantum? 5
6 Comparison of Scaling Analysis Models Dedicated Analysis Time to Solution Run at 1, 2, N threads One thread per core Compare speedup at each thread count Scaling Reference: traditional 1 vs. N cores Scaling Bottleneck: Ambiguous: OpenMP or hardware Non-Dedicated* Analysis Usage of Available Cycles Run only at N threads Up to N cores Model speedup at <= N cores Scaling Reference: loaded system Scaling Bottleneck: Unambiguous: OpenMP only *Non-Dedicated is called Throughput 6
7 c <Insert Picture Here> Tool Overview 7
8 Experimental Tool Experimental tool implements the model Runs on Solaris (x64 & SPARC) Model applies to any UNIX system Tool generates shell/dtrace script to run target program Trace data processed to generate performance model: Reconstruct the OpenMP program structure as run Simulate execution on a range of core counts (round-robin) Generate a text report, and optionally sequence of graphs Speedup analysis aggregated to: Each worksharing construct Each parallel region Entire program Complimentary to existing tools 8
9 Terms Used: What is Speedup? Speedup: ratio of work time to elapsed time Work Time: Any time not OpenMP overhead Measured as run with N threads Ideal Speedup: without OpenMP overhead Elapsed Time: Measured (as run), presuming all cores available Estimated via simulation (1 thru N cores using times as run) Preceding Serial: single-thread time before a parallel region Stand-Alone: loop speedup without considering preceding serial time 9
10 Example Tool Speedup Analysis Entire Program Serial Region: 1.29% of Program Time Only measured value on 4 cores Serial Region time of 1.29% agrees with Amdahl's Law prediction of 3.8 speedup 10
11 About the Charts Maximum value measured or estimated X Average value measured or estimated Minimum value measured or estimated 11
12 Example Tool Detail for a Work Share Loop with Load Imbalance Only measured value on 4 cores Max Est. Min Est. Avg Est. Load Imbalance; no significant overhead compared to compute 12
13 How To Measure Utilization on Solaris Use cputimes script from DTrace Toolkit Community Group DTrace Measures all cycles for duration of script: User threads Daemons Kernel Idle time Utilization: Percent of non-idle cycles during run Sanity check: daemons & kernel are small fraction Measurement system: Sun Ultra 27 Intel quad-core Xeon W3570 processor (3.2 GHz) Processor threading turned off (one thread per core) Solaris 10 13
14 c <Insert Picture Here> Example Program: Time Share Scheduling Class 14
15 Example Program: w/parallelized Fill Loop, n= !$omp parallel shared(a,b,c,n,p) private(i,j,k,z)... 38!$omp do schedule(runtime) 39 do k = 1, n 40 do i = 1, n 41 do j = 1, n 42 c(j,k) = c(j,k) + a(j,i) * b(i,k) * z + p 43 end do 44 end do 45 end do 46!$omp end do... Both regions should scale very well 71!$omp end parallel 91!$omp parallel do shared(a,pi,n) private(j,i) schedule(runtime) 92 do j = 1, n 93 do i = 1, n 94 a(i,j) = sqrt(2.0d0/(n+1))*sin(i*j*pi/(n+1)) 95 end do 96 end do 97!$omp end parallel do 15
16 Estimated Speedups: Static Work Distribution Parallel Loops of Lines 38 and 91 Why the poor scalability on the second region? 16
17 Four Thread & Single Thread Processes Time-Share Scheduler View Solaris Scheduling Priority P1 T2 Core run queue snapshot P1 T1 P1 T0 P0 T0 P1 T3 Process 0: single threaded (infinite loop) Process 1: 4-threads static schedule Two processes share one core Why the poor scalability on the second loop? 17
18 OpenMP Static Work Distribution & TS Scheduler Four Threads Sharing Four Cores With One Other Process, Static Solaris time slices: 20 millisec for high priority 200 millisec for low priority Loop 38 work: >16 sec Loop 91 work: ~57 msec Time Share forces compute-intensive threads to low priority Result: load imbalance for work chunks < time slice Loop 38 Dedicated Loop 38 Shared Loop 91 Dedicated Loop 91 Shared Thread 0 Thread 2 Other Process Thread 1 Thread 3 4x Speedup 3.2x 4x Speedup 2x Speedup; Gap Poor Utilization Does this really happen? Time Slice 18
19 TS Experiment: Fill Loop of Line 91 Isolated Loop Run with 4 Threads along with Single-Threaded Process Expect 3.2x Speedup (4/5ths of 4 cores) Dynamic & Guided Measurements 20 ms slice 200 ms slice Static Schedule Measurement 19
20 First Insight Any statically scheduled work-sharing construct can suffer from load imbalance if the time to execute each thread's work is small compared to the time quantum. How would a programmer identify this situation without measurement? What other problematic scenarios occur? 20
21 c <Insert Picture Here> Time Share System Utilization Results 21
22 TS System Utilization: Isolated Loop 91 Utilization measurement over 60 seconds Dynamic & Guided utilization: 99.9% Static allocation utilization measurement: 86% That's a 14% loss in utilization! Measured overhead: Dtrace command: % processor cycles Kernel & daemons: 0.04% 22
23 c <Insert Picture Here> Example Program: Fixed Priority Scheduling Class 23
24 Solaris Fixed-Priority (FX) Scheduling Class Scheduler Never Adjusts Thread Priority Solaris Scheduling Priority Core run queue snapshot nfs4cbd 59 sshd iscsid High FX Priority 1 Medium FX Priority Process 0 Low FX Priority Process Daemons run in TS, as needed Highest FX threads run dedicated Lower-priority threads get unused cycles Example: Priority 2: 2 thr Priority 0,1: 4 thr 24
25 Four Thread & Single Thread Processes Fixed-Priority Class Scheduler View Solaris Scheduling Priority Core run queue snapshot P0 1 0 T0 T1 T2 T3 Process P0: Fixed Priority 2 (infinite loop) Process P1: Four threads Fixed Priority 0 Two P1 threads share one core 25
26 Four Thread & Single Thread Processes Fixed-Priority Class Scheduler View Solaris Scheduling Priority Core run queue snapshot P0 1 0 T0 Blocked T1 T2 T3 Process P0: Fixed Priority 2 (infinite loop) Process P1: Four threads Fixed Priority 0 Two P1 threads share one core Until one thread is sometimes blocked 26
27 OpenMP Dynamic/Guided & Fixed Priority Scheduler Four Low-Priority Threads & One High-Priority Process Sharing 4 Cores High-Priority single-thread consumes one core Four threads share other three cores until one is blocked Static, Guided: potentially large remaining work Dynamic: controlled, smaller work chunks Loop 38 Dedicated Loop 38 Static Loop 38 Guided Loop 38 Dynamic Thread 0 Thread 2 Thread 1 Thread 3 4x Speedup Other Process 74.6% Utilization 2x Speedup 88% Util. Speedup: Depends on size of early chunk 99.9% Utilization Speedup: Depends on size of chunk Time Slice 27
28 FX Priority Class Experiment: Static Four Threads run on Three Cores; High-Priority Process on One Core 'Static' prediction with infinite time slices: no round-robin scheduling 28
29 FX Priority Class Experiment: Guided & Dynamic Four Threads run on Three Cores; High-Priority Process on One Core 'Guided' prediction without round-robin scheduling 29
30 Conclusions: OpenMP Throughput OpenMP throughput philosopy: Efficiently use any available processor cycles Tool helps the OpenMP programmer improve scalability Tool has uncovered subtle utilization bottlenecks Tool can help programmer improve utilization in Time Share scheduling class Fixed Priority scheduling class with prioritized control Dynamic schedule (or tasking): best for utilization If you choose a good chunk size Tool measurements enable good choice 30
31 Current Limitations & Future Work Nested Parallel Regions Not clear how to interpret speedup Tasking probably superior approach Collecting Traces Dropped trace data: high rate of calls, app probably won't scale Very large traces: option to restrict repeated regions; compress Threaded & NUMA systems Hardware characteristics: leverage existing tools Tool enables differentiation: OpenMP vs. hardware bottlenecks Add dedicated scaling analysis Future Work Larger configurations (in progress) Testing on a broad range of applications 31
32 32
33 <Insert Picture Here> Mark Woodyard Principal Software Engineer 33
Parallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis
Contributions to Gang Scheduling
CHAPTER 7 Contributions to Gang Scheduling In this Chapter, we present two techniques to improve Gang Scheduling policies by adopting the ideas of this Thesis. The first one, Performance- Driven Gang Scheduling,
An Oracle Technical White Paper November 2011. Oracle Solaris 11 Network Virtualization and Network Resource Management
An Oracle Technical White Paper November 2011 Oracle Solaris 11 Network Virtualization and Network Resource Management Executive Overview... 2 Introduction... 2 Network Virtualization... 2 Network Resource
CPU Scheduling. CPU Scheduling
CPU Scheduling Electrical and Computer Engineering Stephen Kim ([email protected]) ECE/IUPUI RTOS & APPS 1 CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling
An Oracle Benchmarking Study February 2011. Oracle Insurance Insbridge Enterprise Rating: Performance Assessment
An Oracle Benchmarking Study February 2011 Oracle Insurance Insbridge Enterprise Rating: Performance Assessment Executive Overview... 1 RateManager Testing... 2 Test Environment... 2 Test Scenarios...
Resource Containers: A new facility for resource management in server systems
CS 5204 Operating Systems Resource Containers: A new facility for resource management in server systems G. Banga, P. Druschel, Rice Univ. J. C. Mogul, Compaq OSDI 1999 Outline Background Previous Approaches
Scaling in a Hypervisor Environment
Scaling in a Hypervisor Environment Richard McDougall Chief Performance Architect VMware VMware ESX Hypervisor Architecture Guest Monitor Guest TCP/IP Monitor (BT, HW, PV) File System CPU is controlled
11.1 inspectit. 11.1. inspectit
11.1. inspectit Figure 11.1. Overview on the inspectit components [Siegl and Bouillet 2011] 11.1 inspectit The inspectit monitoring tool (website: http://www.inspectit.eu/) has been developed by NovaTec.
SP Apps 1.1.4 Performance test Test report. 2012/10 Mai Au
SP Apps 1.1.4 Performance test Test report 2012/10 Mai Au SP Apps 1.1.0 Performance test... 1 Test report... 1 1. Purpose... 3 2. Performance criteria... 3 3. Environments used for performance testing...
An Oracle White Paper March 2013. Load Testing Best Practices for Oracle E- Business Suite using Oracle Application Testing Suite
An Oracle White Paper March 2013 Load Testing Best Practices for Oracle E- Business Suite using Oracle Application Testing Suite Executive Overview... 1 Introduction... 1 Oracle Load Testing Setup... 2
Scheduling. Yücel Saygın. These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum
Scheduling Yücel Saygın These slides are based on your text book and on the slides prepared by Andrew S. Tanenbaum 1 Scheduling Introduction to Scheduling (1) Bursts of CPU usage alternate with periods
Chapter 2: OS Overview
Chapter 2: OS Overview CmSc 335 Operating Systems 1. Operating system objectives and functions Operating systems control and support the usage of computer systems. a. usage users of a computer system:
Informatica Ultra Messaging SMX Shared-Memory Transport
White Paper Informatica Ultra Messaging SMX Shared-Memory Transport Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance This document contains Confidential, Proprietary and Trade
Lecture Outline Overview of real-time scheduling algorithms Outline relative strengths, weaknesses
Overview of Real-Time Scheduling Embedded Real-Time Software Lecture 3 Lecture Outline Overview of real-time scheduling algorithms Clock-driven Weighted round-robin Priority-driven Dynamic vs. static Deadline
Real-Time Scheduling 1 / 39
Real-Time Scheduling 1 / 39 Multiple Real-Time Processes A runs every 30 msec; each time it needs 10 msec of CPU time B runs 25 times/sec for 15 msec C runs 20 times/sec for 5 msec For our equation, A
How To Test For Performance And Scalability On A Server With A Multi-Core Computer (For A Large Server)
Scalability Results Select the right hardware configuration for your organization to optimize performance Table of Contents Introduction... 1 Scalability... 2 Definition... 2 CPU and Memory Usage... 2
Chapter 5 Process Scheduling
Chapter 5 Process Scheduling CPU Scheduling Objective: Basic Scheduling Concepts CPU Scheduling Algorithms Why Multiprogramming? Maximize CPU/Resources Utilization (Based on Some Criteria) CPU Scheduling
SQL Server Business Intelligence on HP ProLiant DL785 Server
SQL Server Business Intelligence on HP ProLiant DL785 Server By Ajay Goyal www.scalabilityexperts.com Mike Fitzner Hewlett Packard www.hp.com Recommendations presented in this document should be thoroughly
Delivering Quality in Software Performance and Scalability Testing
Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,
Deciding which process to run. (Deciding which thread to run) Deciding how long the chosen process can run
SFWR ENG 3BB4 Software Design 3 Concurrent System Design 2 SFWR ENG 3BB4 Software Design 3 Concurrent System Design 11.8 10 CPU Scheduling Chapter 11 CPU Scheduling Policies Deciding which process to run
HPSA Agent Characterization
HPSA Agent Characterization Product HP Server Automation (SA) Functional Area Managed Server Agent Release 9.0 Page 1 HPSA Agent Characterization Quick Links High-Level Agent Characterization Summary...
Operating Systems. III. Scheduling. http://soc.eurecom.fr/os/
Operating Systems Institut Mines-Telecom III. Scheduling Ludovic Apvrille [email protected] Eurecom, office 470 http://soc.eurecom.fr/os/ Outline Basics of Scheduling Definitions Switching
Operating Systems Concepts: Chapter 7: Scheduling Strategies
Operating Systems Concepts: Chapter 7: Scheduling Strategies Olav Beckmann Huxley 449 http://www.doc.ic.ac.uk/~ob3 Acknowledgements: There are lots. See end of Chapter 1. Home Page for the course: http://www.doc.ic.ac.uk/~ob3/teaching/operatingsystemsconcepts/
Ready Time Observations
VMWARE PERFORMANCE STUDY VMware ESX Server 3 Ready Time Observations VMware ESX Server is a thin software layer designed to multiplex hardware resources efficiently among virtual machines running unmodified
Operating Systems, 6 th ed. Test Bank Chapter 7
True / False Questions: Chapter 7 Memory Management 1. T / F In a multiprogramming system, main memory is divided into multiple sections: one for the operating system (resident monitor, kernel) and one
Server Migration from UNIX/RISC to Red Hat Enterprise Linux on Intel Xeon Processors:
Server Migration from UNIX/RISC to Red Hat Enterprise Linux on Intel Xeon Processors: Lowering Total Cost of Ownership A Case Study Published by: Alinean, Inc. 201 S. Orange Ave Suite 1210 Orlando, FL
Envox CDP 7.0 Performance Comparison of VoiceXML and Envox Scripts
Envox CDP 7.0 Performance Comparison of and Envox Scripts Goal and Conclusion The focus of the testing was to compare the performance of and ENS applications. It was found that and ENS applications have
SIDN Server Measurements
SIDN Server Measurements Yuri Schaeffer 1, NLnet Labs NLnet Labs document 2010-003 July 19, 2010 1 Introduction For future capacity planning SIDN would like to have an insight on the required resources
Throughput Capacity Planning and Application Saturation
Throughput Capacity Planning and Application Saturation Alfred J. Barchi [email protected] http://www.ajbinc.net/ Introduction Applications have a tendency to be used more heavily by users over time, as the
Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers
WHITE PAPER FUJITSU PRIMERGY AND PRIMEPOWER SERVERS Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers CHALLENGE Replace a Fujitsu PRIMEPOWER 2500 partition with a lower cost solution that
QLIKVIEW SERVER LINEAR SCALING
QLIKVIEW SERVER LINEAR SCALING QlikView Scalability Center Technical Brief Series June 212 qlikview.com Introduction This technical brief presents an investigation about how QlikView Server scales in performance
ESX Server Performance and Resource Management for CPU-Intensive Workloads
VMWARE WHITE PAPER VMware ESX Server 2 ESX Server Performance and Resource Management for CPU-Intensive Workloads VMware ESX Server 2 provides a robust, scalable virtualization framework for consolidating
Virtuoso and Database Scalability
Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of
2. is the number of processes that are completed per time unit. A) CPU utilization B) Response time C) Turnaround time D) Throughput
Import Settings: Base Settings: Brownstone Default Highest Answer Letter: D Multiple Keywords in Same Paragraph: No Chapter: Chapter 5 Multiple Choice 1. Which of the following is true of cooperative scheduling?
VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5
Performance Study VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5 VMware VirtualCenter uses a database to store metadata on the state of a VMware Infrastructure environment.
Best Practices for Deploying SSDs in a Microsoft SQL Server 2008 OLTP Environment with Dell EqualLogic PS-Series Arrays
Best Practices for Deploying SSDs in a Microsoft SQL Server 2008 OLTP Environment with Dell EqualLogic PS-Series Arrays Database Solutions Engineering By Murali Krishnan.K Dell Product Group October 2009
Introduction. Application Performance in the QLinux Multimedia Operating System. Solution: QLinux. Introduction. Outline. QLinux Design Principles
Application Performance in the QLinux Multimedia Operating System Sundaram, A. Chandra, P. Goyal, P. Shenoy, J. Sahni and H. Vin Umass Amherst, U of Texas Austin ACM Multimedia, 2000 Introduction General
Multicore Parallel Computing with OpenMP
Multicore Parallel Computing with OpenMP Tan Chee Chiang (SVU/Academic Computing, Computer Centre) 1. OpenMP Programming The death of OpenMP was anticipated when cluster systems rapidly replaced large
CPU Scheduling. Core Definitions
CPU Scheduling General rule keep the CPU busy; an idle CPU is a wasted CPU Major source of CPU idleness: I/O (or waiting for it) Many programs have a characteristic CPU I/O burst cycle alternating phases
Informatica Data Director Performance
Informatica Data Director Performance 2011 Informatica Abstract A variety of performance and stress tests are run on the Informatica Data Director to ensure performance and scalability for a wide variety
Networking Virtualization Using FPGAs
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Massachusetts,
Performance Characteristics of Large SMP Machines
Performance Characteristics of Large SMP Machines Dirk Schmidl, Dieter an Mey, Matthias S. Müller [email protected] Rechen- und Kommunikationszentrum (RZ) Agenda Investigated Hardware Kernel Benchmark
Performance Baseline of Oracle Exadata X2-2 HR HC. Part II: Server Performance. Benchware Performance Suite Release 8.4 (Build 130630) September 2013
Performance Baseline of Oracle Exadata X2-2 HR HC Part II: Server Performance Benchware Performance Suite Release 8.4 (Build 130630) September 2013 Contents 1 Introduction to Server Performance Tests 2
Processor Scheduling. Queues Recall OS maintains various queues
Processor Scheduling Chapters 9 and 10 of [OS4e], Chapter 6 of [OSC]: Queues Scheduling Criteria Cooperative versus Preemptive Scheduling Scheduling Algorithms Multi-level Queues Multiprocessor and Real-Time
Performance Test Report For OpenCRM. Submitted By: Softsmith Infotech.
Performance Test Report For OpenCRM Submitted By: Softsmith Infotech. About The Application: OpenCRM is a Open Source CRM software ideal for small and medium businesses for managing leads, accounts and
CPU SCHEDULING (CONT D) NESTED SCHEDULING FUNCTIONS
CPU SCHEDULING CPU SCHEDULING (CONT D) Aims to assign processes to be executed by the CPU in a way that meets system objectives such as response time, throughput, and processor efficiency Broken down into
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Intel Xeon Processor E7 v2 Family-Based Platforms
Maximize Performance and Scalability of RADIOSS* Structural Analysis Software on Family-Based Platforms Executive Summary Complex simulations of structural and systems performance, such as car crash simulations,
Benchmarking Hadoop & HBase on Violin
Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages
How to control Resource allocation on pseries multi MCM system
How to control Resource allocation on pseries multi system Pascal Vezolle Deep Computing EMEA ATS-P.S.S.C/ Montpellier FRANCE Agenda AIX Resource Management Tools WorkLoad Manager (WLM) Affinity Services
Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies
Virtualization Technologies and Blackboard: The Future of Blackboard Software on Multi-Core Technologies Kurt Klemperer, Principal System Performance Engineer [email protected] Agenda Session Length:
Load Imbalance Analysis
With CrayPat Load Imbalance Analysis Imbalance time is a metric based on execution time and is dependent on the type of activity: User functions Imbalance time = Maximum time Average time Synchronization
Objectives. Chapter 5: CPU Scheduling. CPU Scheduler. Non-preemptive and preemptive. Dispatcher. Alternating Sequence of CPU And I/O Bursts
Objectives Chapter 5: CPU Scheduling Introduce CPU scheduling, which is the basis for multiprogrammed operating systems Describe various CPU-scheduling algorithms Discuss evaluation criteria for selecting
Understanding Hardware Transactional Memory
Understanding Hardware Transactional Memory Gil Tene, CTO & co-founder, Azul Systems @giltene 2015 Azul Systems, Inc. Agenda Brief introduction What is Hardware Transactional Memory (HTM)? Cache coherence
10.04.2008. Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details
Thomas Fahrig Senior Developer Hypervisor Team Hypervisor Architecture Terminology Goals Basics Details Scheduling Interval External Interrupt Handling Reserves, Weights and Caps Context Switch Waiting
UTS: An Unbalanced Tree Search Benchmark
UTS: An Unbalanced Tree Search Benchmark LCPC 2006 1 Coauthors Stephen Olivier, UNC Jun Huan, UNC/Kansas Jinze Liu, UNC Jan Prins, UNC James Dinan, OSU P. Sadayappan, OSU Chau-Wen Tseng, UMD Also, thanks
Web Application s Performance Testing
Web Application s Performance Testing B. Election Reddy (07305054) Guided by N. L. Sarda April 13, 2008 1 Contents 1 Introduction 4 2 Objectives 4 3 Performance Indicators 5 4 Types of Performance Testing
W4118 Operating Systems. Instructor: Junfeng Yang
W4118 Operating Systems Instructor: Junfeng Yang Outline Introduction to scheduling Scheduling algorithms 1 Direction within course Until now: interrupts, processes, threads, synchronization Mostly mechanisms
DATABASE. Pervasive PSQL Performance. Key Performance Features of Pervasive PSQL. Pervasive PSQL White Paper
DATABASE Pervasive PSQL Performance Key Performance Features of Pervasive PSQL Pervasive PSQL White Paper June 2008 Table of Contents Introduction... 3 Per f o r m a n c e Ba s i c s: Mo r e Me m o r y,
GPU File System Encryption Kartik Kulkarni and Eugene Linkov
GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through
ICS 143 - Principles of Operating Systems
ICS 143 - Principles of Operating Systems Lecture 5 - CPU Scheduling Prof. Nalini Venkatasubramanian [email protected] Note that some slides are adapted from course text slides 2008 Silberschatz. Some
Parallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
A Study on Reducing Labor Costs Through the Use of WebSphere Cloudburst Appliance
A Study on Reducing Labor Costs Through the Use of WebSphere Cloudburst Appliance IBM Software Group, Competitive Project Office Scott Bain Faisal Rajib Barbara Sannerud Dr. John Shedletsky Dr. Barry Willner
Liferay Portal Performance. Benchmark Study of Liferay Portal Enterprise Edition
Liferay Portal Performance Benchmark Study of Liferay Portal Enterprise Edition Table of Contents Executive Summary... 3 Test Scenarios... 4 Benchmark Configuration and Methodology... 5 Environment Configuration...
Oracle Solaris: Aktueller Stand und Ausblick
Oracle Solaris: Aktueller Stand und Ausblick Detlef Drewanz Principal Sales Consultant, EMEA Server Presales The following is intended to outline our general product direction. It
Performance Analysis of Web based Applications on Single and Multi Core Servers
Performance Analysis of Web based Applications on Single and Multi Core Servers Gitika Khare, Diptikant Pathy, Alpana Rajan, Alok Jain, Anil Rawat Raja Ramanna Centre for Advanced Technology Department
Oracle TimesTen In-Memory Database on Oracle Exalogic Elastic Cloud
An Oracle White Paper July 2011 Oracle TimesTen In-Memory Database on Oracle Exalogic Elastic Cloud Executive Summary... 3 Introduction... 4 Hardware and Software Overview... 5 Compute Node... 5 Storage
Getting OpenMP Up To Speed
1 Getting OpenMP Up To Speed Ruud van der Pas Senior Staff Engineer Oracle Solaris Studio Oracle Menlo Park, CA, USA IWOMP 2010 CCS, University of Tsukuba Tsukuba, Japan June 14-16, 2010 2 Outline The
A Comparison of Oracle Performance on Physical and VMware Servers
A Comparison of Oracle Performance on Physical and VMware Servers By Confio Software Confio Software 4772 Walnut Street, Suite 100 Boulder, CO 80301 303-938-8282 www.confio.com Comparison of Physical and
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging
Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.
Streaming and Virtual Hosted Desktop Study
White Paper Intel Information Technology Streaming, Virtual Hosted Desktop, Computing Models, Client Virtualization Streaming and Virtual Hosted Desktop Study Benchmarking Results As part of an ongoing
Capacity Planning for Microsoft SharePoint Technologies
Capacity Planning for Microsoft SharePoint Technologies Capacity Planning The process of evaluating a technology against the needs of an organization, and making an educated decision about the configuration
Informatica Master Data Management Multi Domain Hub API: Performance and Scalability Diagnostics Checklist
Informatica Master Data Management Multi Domain Hub API: Performance and Scalability Diagnostics Checklist 2012 Informatica Corporation. No part of this document may be reproduced or transmitted in any
Copyright 2013, Oracle and/or its affiliates. All rights reserved.
1 Oracle SPARC Server for Enterprise Computing Dr. Heiner Bauch Senior Account Architect 19. April 2013 2 The following is intended to outline our general product direction. It is intended for information
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o [email protected]
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o [email protected] Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket
Scheduling Task Parallelism" on Multi-Socket Multicore Systems"
Scheduling Task Parallelism" on Multi-Socket Multicore Systems" Stephen Olivier, UNC Chapel Hill Allan Porterfield, RENCI Kyle Wheeler, Sandia National Labs Jan Prins, UNC Chapel Hill Outline" Introduction
Job Scheduling Model
Scheduling 1 Job Scheduling Model problem scenario: a set of jobs needs to be executed using a single server, on which only one job at a time may run for theith job, we have an arrival timea i and a run
PeerMon: A Peer-to-Peer Network Monitoring System
PeerMon: A Peer-to-Peer Network Monitoring System Tia Newhall, Janis Libeks, Ross Greenwood, Jeff Knerr Computer Science Department Swarthmore College Swarthmore, PA USA [email protected] Target:
Microsoft Windows Server 2003 with Internet Information Services (IIS) 6.0 vs. Linux Competitive Web Server Performance Comparison
April 23 11 Aviation Parkway, Suite 4 Morrisville, NC 2756 919-38-28 Fax 919-38-2899 32 B Lakeside Drive Foster City, CA 9444 65-513-8 Fax 65-513-899 www.veritest.com [email protected] Microsoft Windows
OpenMP Programming on ScaleMP
OpenMP Programming on ScaleMP Dirk Schmidl [email protected] Rechen- und Kommunikationszentrum (RZ) MPI vs. OpenMP MPI distributed address space explicit message passing typically code redesign
JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra
JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra January 2014 Legal Notices Apache Cassandra, Spark and Solr and their respective logos are trademarks or registered trademarks
McAfee VirusScan Enterprise for Storage 1.0 Sizing Guide for NetApp Filer on Data ONTAP 7.x
McAfee VirusScan Enterprise for Storage.0 Sizing Guide for NetApp Filer on Data ONTAP 7.x COPYRIGHT Copyright 200 McAfee, Inc. All Rights Reserved. No part of this publication may be reproduced, transmitted,
Agility Database Scalability Testing
Agility Database Scalability Testing V1.6 November 11, 2012 Prepared by on behalf of Table of Contents 1 Introduction... 4 1.1 Brief... 4 2 Scope... 5 3 Test Approach... 6 4 Test environment setup... 7
Applications. Network Application Performance Analysis. Laboratory. Objective. Overview
Laboratory 12 Applications Network Application Performance Analysis Objective The objective of this lab is to analyze the performance of an Internet application protocol and its relation to the underlying
Oracle Database Scalability in VMware ESX VMware ESX 3.5
Performance Study Oracle Database Scalability in VMware ESX VMware ESX 3.5 Database applications running on individual physical servers represent a large consolidation opportunity. However enterprises
TRACE PERFORMANCE TESTING APPROACH. Overview. Approach. Flow. Attributes
TRACE PERFORMANCE TESTING APPROACH Overview Approach Flow Attributes INTRODUCTION Software Testing Testing is not just finding out the defects. Testing is not just seeing the requirements are satisfied.
POSIX and Object Distributed Storage Systems
1 POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph by Michael Poat, Dr. Jerome
LS-DYNA Scalability on Cray Supercomputers. Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp.
LS-DYNA Scalability on Cray Supercomputers Tin-Ting Zhu, Cray Inc. Jason Wang, Livermore Software Technology Corp. WP-LS-DYNA-12213 www.cray.com Table of Contents Abstract... 3 Introduction... 3 Scalability
Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009
Performance Study Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009 Introduction With more and more mission critical networking intensive workloads being virtualized
Q & A From Hitachi Data Systems WebTech Presentation:
Q & A From Hitachi Data Systems WebTech Presentation: RAID Concepts 1. Is the chunk size the same for all Hitachi Data Systems storage systems, i.e., Adaptable Modular Systems, Network Storage Controller,
Load Balancing on Speed
Load Balancing on Speed Steven Hofmeyr, Costin Iancu, Filip Blagojević Lawrence Berkeley National Laboratory {shofmeyr, cciancu, fblagojevic}@lbl.gov Abstract To fully exploit multicore processors, applications
Amazon EC2 XenApp Scalability Analysis
WHITE PAPER Citrix XenApp Amazon EC2 XenApp Scalability Analysis www.citrix.com Table of Contents Introduction...3 Results Summary...3 Detailed Results...4 Methods of Determining Results...4 Amazon EC2
JBoss Data Grid Performance Study Comparing Java HotSpot to Azul Zing
JBoss Data Grid Performance Study Comparing Java HotSpot to Azul Zing January 2014 Legal Notices JBoss, Red Hat and their respective logos are trademarks or registered trademarks of Red Hat, Inc. Azul
Sizing guide for SAP and VMware ESX Server running on HP ProLiant x86-64 platforms
Sizing guide for SAP and VMware ESX Server running on HP ProLiant x86-64 platforms Executive summary... 2 Server virtualization overview... 2 Solution definition...2 SAP architecture... 2 VMware... 3 Virtual
An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
Multiprocessor Scheduling and Scheduling in Linux Kernel 2.6
Multiprocessor Scheduling and Scheduling in Linux Kernel 2.6 Winter Term 2008 / 2009 Jun.-Prof. Dr. André Brinkmann [email protected] Universität Paderborn PC² Agenda Multiprocessor and
Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering
Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays Red Hat Performance Engineering Version 1.0 August 2013 1801 Varsity Drive Raleigh NC
