VALAR: A BENCHMARK SUITE TO STUDY THE DYNAMIC BEHAVIOR OF HETEROGENEOUS SYSTEMS
Perhaad Mistry, Yash Ukidave, Dana Schaa, David Kaeli
Department of Electrical and Computer Engineering, Northeastern University, Boston, USA
ASPLOS 2013, Houston, TX, 16th March
GPGPU 6, March 2013
WHAT IS THIS TALK ABOUT?
- A benchmark suite for heterogeneous computing, written in OpenCL, that allows us to study the interaction between compute devices in heterogeneous application environments
TOPICS
- Goals of an alternative benchmark suite for heterogeneous computing
- Classifying heterogeneous applications based on their behavior and their mapping to compute devices
- Brief overview of Valar's benchmarks
- Evaluation methodology
- Example exploration studies
- Conclusions and future work
MOTIVATION
- Benchmarks for evaluating workload partitioning on CPU-GPU systems
  - Most open source benchmark suites for heterogeneous systems do not use both the CPU and GPU device(s) for compute in OpenCL
  - Allow a wide range of behaviors within the same application to evaluate data movement optimizations
- A benchmark suite with different behavior scenarios of heterogeneous applications
  - To evaluate runtimes and schedulers targeting heterogeneous systems
  - Fit somewhere between microbenchmarks and complete applications
APPLICATION CLASSIFICATION - IMPLEMENTATION
- Implementation classification covers the mapping of computation onto the compute devices present
- Mapping can be decided statically or dynamically
- Determined by the algorithm's development and its mapping to the compute device
- Compute Pipeline: large stream of kernels and minimal IO
- Multidevice Execution: computation partitioned over multiple devices, with or without frequent communication
APPLICATION CLASSIFICATION - BEHAVIORAL
- Behavioral classification covers the algorithm's usage scenario
- Separates discussion of an application's implementation from its behavior
- Quality of Service behavior: application depends on error or data characteristics
- Multiple independent behavior: small independent tasks continuously offloaded
- High-bandwidth input behavior: large data streams, high-bandwidth GPU workloads
VALAR'S APPLICATIONS - PHYSICS SIMULATION
- Collision pipeline: a physics application where the combination of large and small particles defines workload behavior
- GPU performs the small-small collisions
- CPU performs the large-small and large-large collisions
- Behavioral space explored using:
  - Number of particles
  - Ratio of large and small particles
[Diagram: GPU and CPU pipelines exchanging (Posn, Vel, Force); stages in slide order: Build Grid, Synchronization, SS Collide, LS Collide, ForceLS, LL Collide, Synchronization, S Integrate, LL Integrate]
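The device split described on this slide can be illustrated with a small host-side sketch. It only classifies candidate pairs by particle size and is not Valar's implementation: the function name `partition_collision_pairs`, the `large_threshold` parameter, and the brute-force all-pairs loop are assumptions made for illustration (the slide's diagram suggests the benchmark builds a grid instead).

```python
from itertools import combinations


def partition_collision_pairs(radii, large_threshold):
    """Assign every particle pair to the device named on the slide.

    Small-small pairs go to the GPU; large-small and large-large pairs
    go to the CPU. `large_threshold` is a hypothetical cutoff radius.
    """
    work = {"gpu": [], "cpu": []}
    for i, j in combinations(range(len(radii)), 2):
        both_small = radii[i] < large_threshold and radii[j] < large_threshold
        work["gpu" if both_small else "cpu"].append((i, j))
    return work


# Three small particles and one large particle.
work = partition_collision_pairs([0.1, 0.1, 0.1, 1.0], large_threshold=0.5)
print(len(work["gpu"]), len(work["cpu"]))  # prints "3 3"
```

Raising the share of large particles moves pairs from the `gpu` list to the `cpu` list, which is the knob the slide calls the ratio of large and small particles.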
VALAR'S APPLICATIONS - FINITE IMPULSE RESPONSE (FIR) FILTER
- Adaptive FIR: a streaming DSP application used in audio filtering, speech recognition, and pulse detection
- Output signal is generated by multiplying the input signal with a set of taps
- Adaptive FIR changes the weights of the filter taps on a separate command queue, based on signal characteristics
- Behavioral space explored using:
  - Filter block size and number of taps: compute intensity
  - Dispatch frequency: IO frequency and size
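As a rough sketch of the block-wise filtering this slide describes, the code below computes y[k] = sum over i of taps[i] * x[k - i] one block at a time and carries the last inputs across block boundaries. The names `fir_block`, `run_stream`, and the optional `adapt` callback are hypothetical; the callback stands in for the tap update that the slide places on a separate command queue, and it is assumed to keep the number of taps unchanged.

```python
def fir_block(samples, taps, history):
    """Filter one block of input samples.

    `history` holds the last len(taps) - 1 inputs of the previous block so
    that the output is continuous across block boundaries.
    """
    n = len(taps)
    padded = history + samples
    out = []
    for k in range(len(samples)):
        window = padded[k:k + n]  # oldest input first, newest last
        out.append(sum(t * x for t, x in zip(taps, reversed(window))))
    new_history = padded[len(padded) - (n - 1):] if n > 1 else []
    return out, new_history


def run_stream(blocks, taps, adapt=None):
    """Filter a stream of blocks, optionally adapting taps after each block."""
    history = [0.0] * (len(taps) - 1)
    outputs = []
    for block in blocks:
        out, history = fir_block(block, taps, history)
        outputs.extend(out)
        if adapt is not None:
            taps = adapt(taps, block)  # hypothetical adaptation hook
    return outputs


# Two-tap moving average over two blocks of constant input.
print(run_stream([[1.0, 1.0], [1.0, 1.0]], taps=[0.5, 0.5]))
```

Block length and tap count set the work per block, and the number of blocks handed over per call sets how often the host and device exchange data, which matches the two knobs listed on the slide.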
VALAR'S APPLICATIONS - SEARCH
- Search application: simple application searching for a range of values in data
- GPU OpenCL kernel searches for a set of target data values in blocks of data
- Application hands off the resultant data to the CPU for a final reduction
- Behavioral space explored using:
  - Interval: communication frequency of results from GPU to CPU
  - Data pool size: size of GPU kernel
[Diagram: CPU and GPU pipelines; stages in slide order: Initialize Data Range, Synchronization, Search Kernel, Initial Reduction, Synchronization, Final Reduction & Init new data range]
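The role of the interval parameter can be sketched on the host side as follows. This is an illustrative model only: `search_block` stands in for the GPU search kernel, `run_search` for the CPU-side final reduction, and both names and signatures are assumptions rather than Valar's code.

```python
def search_block(block, lo, hi):
    """Stand-in for the GPU kernel: return the values of one block in [lo, hi]."""
    return [v for v in block if lo <= v <= hi]


def run_search(data, block_size, interval, lo, hi):
    """Search block by block and hand results to the CPU every `interval` blocks.

    Returns the total number of matches and the number of GPU-to-CPU handoffs.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    total_matches = 0
    handoffs = 0
    pending = []  # results accumulated on the "device" side
    for n, block in enumerate(blocks, start=1):
        pending.extend(search_block(block, lo, hi))
        if n % interval == 0 or n == len(blocks):
            total_matches += len(pending)  # final reduction on the "CPU"
            handoffs += 1
            pending = []
    return total_matches, handoffs


data = list(range(100))
print(run_search(data, block_size=10, interval=5, lo=20, hi=29))  # prints "(10, 2)"
print(run_search(data, block_size=10, interval=1, lo=20, hi=29))  # prints "(10, 10)"
```

The match count is the same in both runs; only the number of handoffs changes, which is the coupling between devices that the interval knob controls.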
VALAR'S APPLICATIONS - SPEEDED UP ROBUST FEATURES (SURF)
- SURF: feature detection application that summarizes an image into a number of interest points
- Applications in object recognition, tracking, image stitching
- Behavioral space explored using:
  - Image size: host-device I/O size and compute intensity
  - Image color patterns: compute intensity
VALAR'S APPLICATIONS - TRAFFIC
- Traffic application: cellular automaton model (Nagel-Schreckenberg, or NS, model) of road traffic flow that reproduces traffic jams
- Models traffic jams as an emergent phenomenon due to the interaction between cars on the road
- Behavioral space explored using:
  - Number of cars and their distribution: compute intensity of kernels
  - Maximum velocity: affects the number of kernel calls per timestep
- Simple OpenCL kernel called over multiple strides
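For readers unfamiliar with the NS model, one update step of the standard single-lane rule set (accelerate, brake to the gap, brake at random, move) can be sketched as below. This is a plain sequential version on a circular road, not the OpenCL kernel from the benchmark; the function name `ns_step` and its parameters are assumptions for illustration.

```python
import random


def ns_step(positions, velocities, road_length, v_max, p_brake, rng):
    """Apply one Nagel-Schreckenberg update on a circular single-lane road.

    `positions` are distinct cell indices; `rng` is a random.Random instance.
    """
    n = len(positions)
    order = sorted(range(n), key=lambda i: positions[i])
    new_pos, new_vel = list(positions), list(velocities)
    for k, i in enumerate(order):
        ahead = order[(k + 1) % n]
        # Number of empty cells between this car and the car in front.
        gap = (positions[ahead] - positions[i] - 1) % road_length
        v = min(velocities[i] + 1, v_max)  # 1. accelerate
        v = min(v, gap)                    # 2. brake to avoid a collision
        if v > 0 and rng.random() < p_brake:
            v -= 1                         # 3. random braking
        new_vel[i] = v
        new_pos[i] = (positions[i] + v) % road_length  # 4. move
    return new_pos, new_vel


rng = random.Random(0)
pos, vel = [0, 1, 2, 10], [0, 0, 0, 0]
for _ in range(5):
    pos, vel = ns_step(pos, vel, road_length=20, v_max=5, p_brake=0.3, rng=rng)
print(pos, vel)
```

Dense initial clusters such as cells 0 to 2 above force trailing cars to a standstill, which is how jams emerge from purely local rules.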
PERFORMANCE ANALYSIS IN A HETEROGENEOUS HIERARCHY
- Categorization goal: reflect algorithm, data mapping and kernel optimization in benchmark selection
- Layers to study heterogeneous application performance:
  - AL0: application input
  - AL1: OpenCL-level behavior (host-device behavior induced by input arguments)
  - AL2: compute-device-specific hardware counter statistics
[Table: abstraction layers against performance and behavior metrics]
- Abstraction layers: AL0 benchmark options; AL1 host-device interaction; AL2 device hardware performance counters (Southern Islands GPUs)
- Performance and behavior metrics, in slide order: input arguments and data to benchmarks; kernel execution frequency vs. IO; kernel calls on CPU vs. GPU; memory transaction frequency; memory transaction size; vector ALU busy %; scalar ALU busy %; memory unit busy %; registers used; local memory used; throughput and time
PERFORMANCE ANALYSIS IN A HETEROGENEOUS HIERARCHY (CONTINUED)
- Same categorization goal and layers (AL0 application input, AL1 OpenCL-level behavior, AL2 compute-device-specific hardware counter statistics)
- Argument tracking
- OpenCL event-based profiler
- AMD APP Profiler
EXPERIMENTAL EVALUATION
- Kernel optimization studies are possible with Valar
  - OpenCL kernels optimized while maintaining correctness on all OpenCL compliant platforms
- Experiments based on host-device interaction can be used for the following architectural research:
  - Effects of data dependent kernels
  - Benefits of host-device IO optimizations like write combining
  - Kernel call and communication cost
  - Different OpenCL buffer management strategies
OPENCL KERNELS - DATA DEPENDENT KERNELS IN VALAR
- Vector ALU utilization and memory unit utilization on AMD Southern Islands GPUs
- Performance variation seen over the runtime of the application for representative input cases
INTERACTION RESULTS - FIR
- The effect of write combining on application throughput on fused and discrete devices
- Dispatch denotes the number of blocks combined in one kernel invocation
- Requires an application with enough flexibility in host-device IO and kernel
- Limited performance benefit seen for fused platforms and higher dispatch sizes
INTERACTION RESULTS - SEARCH
- Search: a less coupled application, CPU-GPU communication is less frequent
- Effect of communication on application throughput in heterogeneous systems
- Comparing a midrange discrete GPU with an APU device
- APU system throughput is comparable for small communication intervals
INTERACTION RESULTS - SEARCH
- CPU performance: discrete vs. APU
  - At high communication, CPU kernel performance on the APU reduces
  - CPU kernel does gain from quad core with HT vs. quad core
- GPU performance: discrete vs. APU
  - Improvement for less frequent communication, more work on GPU
  - The high bandwidth of SI GPUs relative to the APU becomes decisive for throughput as communication reduces
INTERACTION RESULTS - PHYSICS
- Effect of CPU compute capacity on application throughput for a coupled application
- Application throughput for different particle distributions
- Throughput for APU and discrete in a similar range
- Time per step is affected by large particle counts
INTERACTION RESULTS - PHYSICS
- Effect of CPU compute capacity on application throughput for a coupled application
- Throughput for different large particle counts
- More large particles increase the amount of work on the CPU
- Substantial reduction in throughput
- Time per step is affected by large particle counts
CONCLUSIONS AND FUTURE WORK
- Conclusions: Valar attempts to provide benchmarks that can generate a range of heterogeneous behavior for architectural research and application comparison
- Future work, architectural research:
  - Compare against discrete implementations and other programming models
  - Evaluate power swishing on APUs and evaluate mobile low-power SoCs
- Future work, applications:
  - Predator algorithm (TLD): coupled machine learning and feature detection
  - More applications required, especially ones with concurrent command queue usage
  - Physics needs a CPU OpenCL command queue instead of a thread pool
  - Traffic needs a better algorithm, and the lane change model needs to be improved
THANK YOU! QUESTIONS? COMMENTS?
Perhaad Mistry
INTERACTION RESULTS - SURF IMAGE COMPARE
- Preprocessing added on the CPU device at the beginning of the pipeline
- Comparison kernel calculates the difference between two gray-scale images
- Preprocessing result decides whether to launch the pipeline
- Heavier threshold values improve performance because more frames are skipped
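The gating idea on this slide can be sketched as a simple frame filter. The difference metric (mean absolute pixel difference), the choice to compare against the last processed frame, and the names `mean_abs_diff` and `frames_to_process` are all assumptions for illustration; the slide does not specify them.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two equally sized gray-scale frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def frames_to_process(frames, threshold):
    """Return indices of frames for which the SURF pipeline would be launched.

    A frame is skipped when it differs from the last processed frame by no
    more than `threshold` (a hypothetical tuning parameter).
    """
    launched = []
    reference = None
    for idx, frame in enumerate(frames):
        if reference is None or mean_abs_diff(reference, frame) > threshold:
            launched.append(idx)
            reference = frame
    return launched


frames = [[[0, 0]], [[1, 1]], [[10, 10]]]
print(frames_to_process(frames, threshold=5))  # prints "[0, 2]"
print(frames_to_process(frames, threshold=0))  # prints "[0, 1, 2]"
```

A larger threshold skips more frames and launches the pipeline less often, which is consistent with the performance trend the slide reports.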
EXTRA STUFF
PERFORMANCE RESULTS - SURF ORIENTATION COMPARE
- Orientation comparison is useful if there is no camera rotation
- Test case for overhead, since the orientation step is < 10% of SURF computation
- Execution of the compute pipeline is interrupted to compare orientation against the previous frame
- Frequency of orientation comparison increased; "native" denotes no HAPTIC
- More degradation in average performance seen for small videos
VALAR'S APPLICATIONS - PHYSICS SIMULATION
- Collision detection pipeline
- The combination of large and small particles decides workload behavior
- GPU performs the small-small collisions
- CPU performs the large-small and large-large collisions
- Behavioral space explored using:
  - Number of particles
  - Ratio of large and small particles
Operating Systems 4 th Class Lecture 1 Operating Systems Operating systems are essential part of any computer system. Therefore, a course in operating systems is an essential part of any computer science
More informationEnsuring Collective Availability in Volatile Resource Pools via Forecasting
Ensuring Collective Availability in Volatile Resource Pools via Forecasting Artur Andrzejak andrzejak[at]zib.de Derrick Kondo David P. Anderson Zuse Institute Berlin (ZIB) INRIA UC Berkeley Motivation
More informationSoftware and the Concurrency Revolution
Software and the Concurrency Revolution A: The world s fastest supercomputer, with up to 4 processors, 128MB RAM, 942 MFLOPS (peak). 2 Q: What is a 1984 Cray X-MP? (Or a fractional 2005 vintage Xbox )
More informationSecure Containers. Jan 2015 www.imgtec.com. Imagination Technologies HGI Dec, 2014 p1
Secure Containers Jan 2015 www.imgtec.com Imagination Technologies HGI Dec, 2014 p1 What are we protecting? Sensitive assets belonging to the user and the service provider Network Monitor unauthorized
More informationExperimental Evaluation of Distributed Middleware with a Virtualized Java Environment
Experimental Evaluation of Distributed Middleware with a Virtualized Java Environment Nuno A. Carvalho, João Bordalo, Filipe Campos and José Pereira HASLab / INESC TEC Universidade do Minho MW4SOC 11 December
More informationStep by Step Guide To vstorage Backup Server (Proxy) Sizing
Tivoli Storage Manager for Virtual Environments V6.3 Step by Step Guide To vstorage Backup Server (Proxy) Sizing 12 September 2012 1.1 Author: Dan Wolfe, Tivoli Software Advanced Technology Page 1 of 18
More informationIntroduction to Cloud Computing
Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic
More informationDELL s Oracle Database Advisor
DELL s Oracle Database Advisor Underlying Methodology A Dell Technical White Paper Database Solutions Engineering By Roger Lopez Phani MV Dell Product Group January 2010 THIS WHITE PAPER IS FOR INFORMATIONAL
More informationCS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015
CS423 Spring 2015 MP4: Dynamic Load Balancer Due April 27 th at 9:00 am 2015 1. Goals and Overview 1. In this MP you will design a Dynamic Load Balancer architecture for a Distributed System 2. You will
More information@IJMTER-2015, All rights Reserved 355
e-issn: 2349-9745 p-issn: 2393-8161 Scientific Journal Impact Factor (SJIF): 1.711 International Journal of Modern Trends in Engineering and Research www.ijmter.com A Model for load balancing for the Public
More informationIoT: Smart Vision Leads The Way
IoT: Smart Vision Leads The Way Peter McGuinness Multimedia Technology Marketing www.imgtec.com IoT is changing from amorphous to concrete: Imagination Technologies US Summit May 2015 2 IoT is changing
More informationtheguard! ApplicationManager System Windows Data Collector
theguard! ApplicationManager System Windows Data Collector Status: 10/9/2008 Introduction... 3 The Performance Features of the ApplicationManager Data Collector for Microsoft Windows Server... 3 Overview
More informationGo Faster - Preprocessing Using FPGA, CPU, GPU. Dipl.-Ing. (FH) Bjoern Rudde Image Acquisition Development STEMMER IMAGING
Go Faster - Preprocessing Using FPGA, CPU, GPU Dipl.-Ing. (FH) Bjoern Rudde Image Acquisition Development STEMMER IMAGING WHO ARE STEMMER IMAGING? STEMMER IMAGING is: Europe's leading independent provider
More informationA bachelor of science degree in electrical engineering with a cumulative undergraduate GPA of at least 3.0 on a 4.0 scale
What is the University of Florida EDGE Program? EDGE enables engineering professional, military members, and students worldwide to participate in courses, certificates, and degree programs from the UF
More informationIntel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family
Intel Ethernet Switch Load Balancing System Design Using Advanced Features in Intel Ethernet Switch Family White Paper June, 2008 Legal INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL
More informationMulti-Threading Performance on Commodity Multi-Core Processors
Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction
More informationThe High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices
WS on Models, Algorithms and Methodologies for Hierarchical Parallelism in new HPC Systems The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices
More informationRecent Advances in Periscope for Performance Analysis and Tuning
Recent Advances in Periscope for Performance Analysis and Tuning Isaias Compres, Michael Firbach, Michael Gerndt Robert Mijakovic, Yury Oleynik, Ventsislav Petkov Technische Universität München Yury Oleynik,
More informationHow To Test For Performance And Scalability On A Server With A Multi-Core Computer (For A Large Server)
Scalability Results Select the right hardware configuration for your organization to optimize performance Table of Contents Introduction... 1 Scalability... 2 Definition... 2 CPU and Memory Usage... 2
More informationRadeon HD 2900 and Geometry Generation. Michael Doggett
Radeon HD 2900 and Geometry Generation Michael Doggett September 11, 2007 Overview Introduction to 3D Graphics Radeon 2900 Starting Point Requirements Top level Pipeline Blocks from top to bottom Command
More informationVisualization à la Unix TM
Visualization à la Unix TM Hans-Peter Bischof (hpb [at] cs.rit.edu) Department of Computer Science Golisano College of Computing and Information Sciences Rochester Institute of Technology One Lomb Memorial
More information