Managing Adaptability in Heterogeneous Architectures through Performance Monitoring and Prediction



Similar documents
VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS

NVIDIA Tools For Profiling And Monitoring. David Goodwin

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Where IT perceptions are reality. Test Report. OCe14000 Performance. Featuring Emulex OCe14102 Network Adapters Emulex XE100 Offload Engine

Performance Management for Cloudbased STC 2012

Full and Para Virtualization

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs

Implementation of Stereo Matching Using High Level Compiler for Parallel Computing Acceleration

Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build

Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca

Using VMware VMotion with Oracle Database and EMC CLARiiON Storage Systems

Performance Tuning and Optimizing SQL Databases 2016

SQL Server Performance Tuning and Optimization

Parallel Programming Survey

DELL. Virtual Desktop Infrastructure Study END-TO-END COMPUTING. Dell Enterprise Solutions Engineering

Exploring Oracle E-Business Suite Load Balancing Options. Venkat Perumal IT Convergence

Networking Virtualization Using FPGAs

Data Center and Cloud Computing Market Landscape and Challenges

Fair Scheduling Algorithm with Dynamic Load Balancing Using In Grid Computing

Microsoft SQL Server: MS Performance Tuning and Optimization Digital

Beyond Virtualization: A Novel Software Architecture for Multi-Core SoCs. Jim Ready September 18, 2012

Towards Elastic Application Model for Augmenting Computing Capabilities of Mobile Platforms. Mobilware 2010

Mark Bennett. Search and the Virtual Machine

An Oracle White Paper March Load Testing Best Practices for Oracle E- Business Suite using Oracle Application Testing Suite

Basics of Virtualisation

Evaluation Methodology of Converged Cloud Environments

Exploiting GPU Hardware Saturation for Fast Compiler Optimization

Moab and TORQUE Highlights CUG 2015

Self-Adapting Load Balancing for DNS

Part I Courses Syllabus

ANDROID DEVELOPER TOOLS TRAINING GTC Sébastien Dominé, NVIDIA

Computer Graphics Hardware An Overview

LS DYNA Performance Benchmarks and Profiling. January 2009

Power Efficiency Metrics for the Top500. Shoaib Kamil and John Shalf CRD/NERSC Lawrence Berkeley National Lab

A Middleware Strategy to Survive Compute Peak Loads in Cloud

An Analysis of Container-based Platforms for NFV

Petascale Software Challenges. Piyush Chaudhary High Performance Computing

Xeon+FPGA Platform for the Data Center

Virtual Machines.

big.little Technology Moves Towards Fully Heterogeneous Global Task Scheduling Improving Energy Efficiency and Performance in Mobile Devices

Intel Media Server Studio - Metrics Monitor (v1.1.0) Reference Manual

Manjrasoft Market Oriented Cloud Computing Platform

PCIe Storage Performance Testing Challenge

ECLIPSE Performance Benchmarks and Profiling. January 2009

Intelligent Heuristic Construction with Active Learning

Auto-Tunning of Data Communication on Heterogeneous Systems

Oracle Licensing Optimization (Complexity, Compliance, Configuration) Ed Hut Karen O Neill Oct 23, 2014

CLOUD GAMING WITH NVIDIA GRID TECHNOLOGIES Franck DIARD, Ph.D., SW Chief Software Architect GDC 2014

Windows Server 2008 R2 Hyper-V Live Migration

CHAPTER FIVE RESULT ANALYSIS

Windows Server 2008 R2 Hyper-V Live Migration

Security Systems. Technical Note. iscsi sessions if VRM Load Balancing in All Mode is used Version 1.0 STVC/PRM. Page 1 of 6

Next Generation Operating Systems

Limitations of Managing VMware vsphere with MS System Center Virtual Machine Manager 2012

Towards energy-aware scheduling in data centers using machine learning

Big Data Visualization on the MIC

Principles and characteristics of distributed systems and environments

Programmable Networking with Open vswitch

Autodesk Revit 2016 Product Line System Requirements and Recommendations

VIDEO SURVEILLANCE WITH SURVEILLUS VMS AND EMC ISILON STORAGE ARRAYS

Performance Management for Cloud-based Applications STC 2012

Oracle Database 11g Comparison Chart

Understanding the Performance of an X User Environment

Improved LS-DYNA Performance on Sun Servers

Fixed Price Website Load Testing

Debugging in Heterogeneous Environments with TotalView. ECMWF HPC Workshop 30 th October 2014

Proactive, Resource-Aware, Tunable Real-time Fault-tolerant Middleware

Sizing guide for SAP and VMware ESX Server running on HP ProLiant x86-64 platforms

Characterizing Task Usage Shapes in Google s Compute Clusters

How To Test For A Test On A Test Server

64CH/720p IP Camera Surveillance Solution

LabStats 5 System Requirements

System requirements for Autodesk Building Design Suite 2017

Virtual Network Provisioning and Fault-Management across Multiple Domains

An Oracle White Paper June High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

Why Standardize on Oracle Database 11g Next Generation Database Management. Thomas Kyte

VBLOCK SOLUTION FOR SAP: SAP APPLICATION AND DATABASE PERFORMANCE IN PHYSICAL AND VIRTUAL ENVIRONMENTS

Intel Xeon +FPGA Platform for the Data Center

ECLIPSE Best Practices Performance, Productivity, Efficiency. March 2009

OCIO HIGH RESOLUTION GRAPHICS DESKTOP VIRTUALIZATION POC. Chip Charnley Technical Expert Client Technologies Infrastructure Architecture

Avoid Paying The Virtualization Tax: Deploying Virtualized BI 4.0 The Right Way. Ashish C. Morzaria, SAP

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o

Virtualizing Performance Asymmetric Multi-core Systems

Automating Big Data Benchmarking for Different Architectures with ALOJA

Transcription:

Managing Adaptability in Heterogeneous Architectures through Performance Monitoring and Prediction Cristina Silvano cristina.silvano@polimi.it Politecnico di Milano HiPEAC CSW Athens 2014

Motivations System adaptivity is a key issue for enabling applications running on many core heterogeneous architectures to operate close to optimal efficiency in the face of: Changing conditions by adjusting their behavior to their operating environments, usage contexts and resource availability Meeting the requirements on performance, power budget and Quality of Service 2

Introduction This talk introduces an application auto tuning framework for heterogeneous platforms in an adaptive multi application environment. The auto tuning framework should manage at runtime the assignment of system resources to the active concurrent applications (request level parallelism) 3

Application auto tuning The approach exploits the concepts of: Orthogonality between application auto tuning and runtime management of system resources to support multiple adaptive applications. Combination of design time and run time techniques to create an effective way of self aware computing with limited runtime overhead. 4

Application & Platform Domains System-wide Run-time Resource Manager 5 6

Heterogeneous parallel platforms Examples: STMicroelectronics P2012 (STHORM) Adapteva Parallela board Nvidia Tegra K1 Highly parallel software programmable accelerators Locally homogeneous, globally heterogeneous Computation offloading: separate host and device memory address spaces Different programming abstractions: native vs. OpenCL to support application code portability 6

Target problem (1) How to efficiently program (resource aware programming) applications on heterogeneous many core architectures to sustain performance? APP1 APP2 APP3 Heterogeneous Many-Core Platform Host CPU 7

Target problem (2) How to adapt at runtime the application behavior to sustain the performance? APP1 APP2 APP3 BE GT GT Heterogeneous Many-Core Platform Host CPU 8

Target problem (3) How to efficiently allocate and manage the system resources at runtime? APP1 APP2 APP3 BE GT GT Host CPU 9

Run Time Resource Management App1 App2 App3 Allocation Mapping 4 4 6 RTRM Target Platform Amit Kumar Singh, Muhammad Shafique, Akash Kumar, and Jörg Henkel. Mapping on multi/many core systems: Survey of current and emerging trends. In Proceedings of the 50th Annual Design Automation Conference (DAC). 2013. 10

Application auto tuning (1) Most of the applications are configurable in terms of a set of parameters Parameters: Color Shape Size 11

Application auto tuning (2) Key idea is that most of the applications are configurable in terms of a set of parameters by using run time knobs to trade off Quality of Results and Throughput Run-Time Knobs 12

Application auto tuning (2) Key idea is that most of the applications are configurable in terms of a set of parameters by using run time knobs to trade off Quality of Results and Throughput QoR Throughput 12

Why Run Time Knobs & Auto Tuning? In some applications internal knobs can be used at run time to trade off quality of results and performance Video Frame Rate Video Resolution Autonomous Video-surveillance System 13

Single application Multiple instances Video Frame Rate Video Resolution Autonomous Video-surveillance System 14

Application Auto tuning Framework (1) 1. Design time exploration phase based on code profiling to define the relations between application parameters and performance metrics (such as energy, throughput and accuracy) Operating Points combine application parameters and performance metrics Working Modes represent max resource allocation associated to a set of OPs 15

Application Auto tuning Framework (2) 2. Profiling information (OPs) are used to build a prediction model of application behavior, characterizing the effect of tunable parameters on performance metrics. 16

Application Auto tuning Framework (3) 3. Prediction model supports run time decisions based on monitoring to tune the application behavior according to available system resources. 17

Application Auto tuning Framework (3) Execution Loop 18

Application Auto tuning Framework (3) Monitoring 19

Application Auto tuning Framework (20) Re-Configure 20

Orthogonality Concept Requests Resources Platform OS Target HW Platform 21

Orthogonality Concept: App Auto tuning Requests Resources Resource Availability Platform OS Target HW Platform 22

Orthogonality Concept: App Auto tuning + RTRM Working Modes Requests Resources Run-Time Resource Manager Resource Availability Platform OS Target HW Platform 23

The Multi View Case Study 2 eyes third dimension 24

Pixel disparity 1 Left camera Right camera 2 5 Application Knobs 3 reference disparity QoR Disparity Error 25

Scenario: Multiple Multi view applications AMD NUMA 4 clusters quadcore Opteron @ 2.4 GHz Linux CGroups CPU-quota 26

Experimental Setup Target Platform AMD NUMA Architecture: 4 nodes 4 cores OpenCL 1.2 run time provided by AMD Workload Definition: Single application multiple instances Dynamic workload in terms of: Start time upon user request; Amount of frame data to process; Frame rate goal demanded by user 27

Application Auto Tuning Effects 28

Comparative Analysis Application Auto-tuning OFF ON Run-Time Resource Management OFF ON PLAIN-LINUX PLAIN-RTRM ADAPTIVE-LINUX ADAPTIVE-RTRM 29

Multiple Application Run Time Results PLAIN-LINUX ADAPTIVE-RTRM 30

Multiple Application Run Time Results ADAPTIVE-LINUX ADAPTIVE-RTRM 31

Dynamic workload analysis by varying the number of MV instances Evaluation metrics: Average Normalized Actual Penalty (NAP): User satisfaction to measure the distance of the run time performance from the required frame rate goal (Quality metric) Runtime throughput degradation w.r.t. the expected throughput based on offline profiling (Predictability metric) Normalized output quality loss: User satisfaction in terms of quality of the resulting image (Error metric) 32

Contention on shared resources APP1 APP2 SW-RTRM LLC misses LLC Memory Num. instances 33

Conclusions We proposed an approach to support application adaptivity in many core architectures exploiting: A runtime resource manager operating at system level combined with an application auto tuning framework Performance/quality trade offs and low overhead due to the combination of design time and run time techniques based on prediction models Most recent references: IET CDT2011, CODES ISSS 2012, ReCoSoC 2012, FPL 2013, PARCO 2013, ASP DAC 2014, DATE 2014, ARC 2014, SAMOS 2014, ASAP 2014. Most recent projects: FP7 2PARMA and FP7 HARPA (on going) 34