The Methodology of Application Development for Hybrid Architectures
|
|
- Luke Golden
- 7 years ago
- Views:
Transcription
1 Computer Technology and Application 4 (2013) D DAVID PUBLISHING The Methodology of Application Development for Hybrid Architectures Vladimir Orekhov, Alexander Bogdanov and Vladimir Gaiduchok Department of Computer Systems Engineering and Informatics, Saint-Petersburg State Electrotechnical University LETI, Saint-Petersburg , Russia Received: September 12, 2013 / Accepted: September 30, 2013 / Published: October 25, Abstract: This paper provides an overview of the main recommendations and approaches of the methodology on parallel computation application development for hybrid structures. This methodology was developed within the master s thesis project Optimization of complex tasks computation on hybrid distributed computational structures accomplished by Orekhov during which the main research objective was the determination of patterns of the behavior of scaling efficiency and other parameters which define performance of different algorithms implementations executed on hybrid distributed computational structures. Major outcomes and dependencies obtained within the master s thesis project were formed into a methodology which covers the problems of applications based on parallel computations and describes the process of its development in details, offering easy ways of avoiding potentially crucial problems. The paper is backed by the real-life examples such as clustering algorithms instead of artificial benchmarks. Key words: Hybrid architectures, parallel computing, simulation modeling. 1. Introduction The research consisted of two main phases. The preliminary phase included serial execution of a number of algorithms and parallelization of these algorithms on an elementary hybrid system (one CPU and one OpenCL enabled GPU). This phase aimed at researching the ways of efficient parallelization of algorithms implementations using GPGPU paradigm and OpenCL. During the preliminary phase the main monitored parameter was the speedup comparing to the serial execution. The main phase focused on the research of scaling efficiency parameter and the algorithms from previous phase were executed on the hybrid cluster of Peterhof Telecommunication Center. During this phase numerous experiments were conducted and approaches were tested. Such dependencies as the dependency of Corresponding author: Vladimir Orekhov, M.Sc., research fields: software engineering. orekhov.volodya@gmail.com. scaling efficiency from task difficulty and input size were gathered and analyzed and this in-depth analysis formed the methodology described in this paper. This paper consists of three major sections: Section 2 covers the recommendations concerning preliminary algorithm analysis and simulation based on the simulation performance modeling approach; Section 3 describes the approach of optimal scaling efficiency determination; Section 4 covers the recommendations upon the target computational system choice. 2. Preliminary Algorithm Analysis and Task Simulation The following two sections (2 and 3) cover the problem of optimizing algorithms implementations in order to achieve high performance on a given system with certain constraints. Gebali defines it as a route from Parallel Computer Space to Algorithm Space [1]. Before the actual development of an application based on parallel computing, developers should go through several steps, which are illustrated in Fig. 1.
2 544 The Methodology of Application Development for Hybrid Architectures The steps shown in Fig. 1 are based on the simulation performancee modeling and will be discussed in more detail in the following subsections. 2.1 Execution Scheme Design The execution scheme design step includes the design of a general execution scheme, which will, on a high level of abstraction, show the main elements: blocks of code, executed in parallel, blocks of code, executed in series and data transfers between the host and the computational devices. An example of an execution scheme for the K-Means clustering algorithm is represented in Fig. 2. The process of the execution plan design is iterative; in each iteration, developers should try to maximize two main parameters: parallelization coefficient and the number of stand-alone computations (meaning the amount of computations, during which the host computational devices channel is not used to transfer data). Although these two parameters seem to have the same meaning, it is very important to understandd the difference between them. The parallelization coefficient shows the general amount of computations, which are executed in parallel: where, the amount of computations, which are executed in parallel; C the overall amount Fig. 1 The scheme of preliminary algorithm analysiss and task simulation. Fig. 2 Sample execution scheme for the K-means clustering algorithm. of computations (both parallel and serial). The number of stand-alone computations, however, is a more specific parameter, and basically means the number of computations, during the execution of which the host computational devices channel is not used to transfer data. This parameter takes into consideration the data transfer costs and the maximization of this parameter refers to the maximization of task execution solely on computational devices. 2.2 Task Simulation The task simulation step includes the design of a task on the basis of the designed execution scheme, whichh willl simulate the behavior of the target task. As far as on this step we focus on time complexity, the actual implementation of the algorithm s steps is not important; what is importantt is to simulate the same amount of computations via something basic like simple arithmetic operations. In this simulated task, all the computations should be replaced with the equivalent number of simple operations. This will allow achieving a model with the same computational difficulty as in the target task, notably cutting the development time. 2.3 Executing the Simulated Task During the execution step, the simulated task should
3 The Methodology of Application Development for Hybrid Architectures 545 be executed on the target computational system. After that, it is time to analyze the execution results. The results can include such base parameters as Execution time; Execution speed: Although execution speed is usually measured in seconds [1], if the number of iterations is not constant (as it is for the clustering algorithms) it needs to be measured in iterations per second. These base parameters show the performance of a particular task, executed on a particular computational system. At the same time, it appears that knowing a simple fact that X executes faster, than Y is not enough we need a more detailed comparison of X and Y to organize a proper analysis. Derivative parameters can be used for that purpose: Speedup factor which will be called scaling efficiency in this article. 1 1 where, T(N) execution time for N computational devices; V(N) execution speed for N computational devices. Coefficient is showing the proximity to the ideal scaling efficiency. One of the algorithms examined during the research was K-means clustering algorithm which is an iterative algorithm: in each iteration, the nodes act entirely independently of each other, but after each iteration there is a significant communication overhead [2]. For the research purposes the communication overhead was minimized by maximizing the amount of work to do in each iteration (number of clusters was increased up to 16,384). The scaling efficiency chart for the K-Means clustering algorithm can illustrate the meaning of the scaling efficiency parameter and is represented in Fig. 3. In Fig. 3, the real scaling repeats the speedup curve of Amdahl s law [3]. If the results are satisfactory, the execution scheme is proven to be appropriate and the application should be developed based on it. If the results are not satisfactory, either the execution scheme needs to be redesigned or the target computational system has to be changed. Quite frequently it is impossible to execute the simulated task on the target computational system because it does not exist, for example, all parts for the hybrid cluster are bought but not yet set to work, or is not available for simulation purposes. In that case the simulated task may be executed on the closest available computational system (here the closest corresponds to the structure and size of the system). Martinez suggests that in this case a combination of analytical modeling, simulation and empirical experimentation is required [4]. In terms of this paper analytical modeling is related to the execution scheme design step, simulation is related to the task simulation and to the execution of simulated task steps. Empirical experimentation covers the whole iterative process of preliminary algorithm analysis and task simulation described in Fig Determination of Scaling Efficiency When we speak about effective scaling in terms of hybrid clusters, one of the main trends is the following: Scaling can be efficient up to some critical number of computational devices, and then it becomes inefficient. Gebali mentions [1] that for large values of computational devices, the speedup is approximated by This number of computational devices is a crucially important constant as it describes the relations between the specific task (usually an implementation of some algorithm) and the specific computational system. However, it can take much effort to detect this number and in most cases it is unnecessary. All we need to know is: The upper bound for the computational devices number, which are supposed to be used for this task; Whether this number is smaller than the critical number or not, if we lose scaling efficiency?
4 546 The Methodology of Application Development for Hybrid Architectures Scaling efficiency Ideal scaling GPU quantity Real scaling for clusters Fig. 3 Scaling efficiency analysis. It is important to know these parameters as early as possible to avoid the problems of inefficient scaling. Sometimes the upper bound of scaling is known in the very beginning of the development process. Then the goal is to determine whether the scaling is effective for this number of computational devices, for a particular algorithm s implementation and on a particular computational system. In these terms, if scaling for an exact upper bound of computational devices is not effective (see Case 2 described below), either the execution scheme should be redesigned or the target computational system should be changed, and this approach absolutely fits the scheme, illustrated in Fig. 1. The deeper understanding of scaling efficiency can be obtained by simulating and testing tasks on the target systems and building scaling efficiency charts as shown in Fig. 3. This chart basically shows how much of the hardware potential is used when scaling on more computational devices. For instance, Fig. 3 shows a scaling efficiency of nearly 100% on 2 GPUs and a scaling efficiency of only about 50% on 8 GPUs. It is important to understand the bottom bound of scaling efficiency, after which we do not want to continue the scaling process (it may be, for instance, 20%). Several real-life cases will help to better understand the subject: Case 1: describes the necessity of knowing the upper bound of scaling before the beginning of the development process. The application was simulated, developed and tested for an upper bound of 20 computational devices. After a while, the target hybrid cluster grows wider and receives extra 10 computational devices, and then the application is executed on a total of 30 computational devices. In worst-case scenario, on 30 devices it will run slower, than on 20 devices, which is, obviously, an extremely inefficient use of computational capabilities. Case 2: describes how the scaling efficiency problem can be detected and eliminated in the early stage. The upper bound is known to be 40 devices but the results of an iterative execution have clearly shown that the task mapped on 16 devices executes faster than the task mapped on 32 devices. Therefore, this combination of an algorithm s implementation and a target computational system are not able to provide an effective scalability up to 40 devices. 4. Optimal Target System Choice The following section covers the backward problem comparing to the previous problem discussed in Sections 2 and 3: in this case the actual algorithm is given and the task is to find an appropriate
5 The Methodology of Application Development for Hybrid Architectures 547 computational system for this algorithm. An important question during the parallel application development process is the choice of a target computational system (meaning hardware platform). The first thing to think about is whether the application can be scaled efficiently. If it can be scaled efficiently, the target computational platform may be a hybrid computational system containing several computational devices. Both scientists and developers often set quite ambitious goals implying efficient scaling on up to hundreds of computational devices. In this case, poor scalability is fatal. At the same time, if the goal is not that ambitious (for instance, achieving a 40x speedup), it is good to remember the following trend: inefficient scaling doesn t mean, that the task can t be effectively executed in parallel. If the task cannot be scaled efficiently, it can always be executed on one computational device. In 2013 practically every PC is a simplest hybrid system with a CPU representing host device and GPU representing computational device and can cope with this task. When we speak about parallel computations on hybrid systems, it is important to choose between AMD OpenCL and NVidia CUDA technologies. This choice heavily affects the target computational system choice. Here, one of the paradigms should be chosen. OpenCL is a free standard, supported by many hardware developers, including AMD, Intel and NVidia. The basic idea behind OpenCL is to create a unified framework for application development on hybrid structures. OpenCL hides all the implementation details and represents complex hybrid systems as a single system with a number of computational devices [5, 6]. CUDA is focused on the massively parallel computing around NVidia GPUs solely; as they choose such a narrow area, obviously, CUDA shows a better performance on NVidia GPUs comparing to OpenCL. If NVidia CUDA is used as the dominant technology, no other options than using a GPU cluster, containing numerous NVidia GPUs. If we choose OpenCL, practically every hybrid system can be a target system. 5. Conclusions The described methodology covers the development process of a parallel application including preliminary algorithm analysis, task simulation, scalability questions, dominant technology and target system choice. The methodology is based on the in-depth studies of algorithm implementation for hybrid distributed structures, one of the center aspects of which was scaling efficiency. All the main principles are covered by real-life examples. Acknowledgments Work partially supported by the Russian Foundation for Basic Research, project no a and St. Petersburg State University, project no References [1] F. Gebali, Algorithms and Parallel Computing, Hoboken, NJ: Wiley, [2] N. Matloff, Programming on Parallel Machines: GPU, Multicore, Clusters and More, CA: University of California, [3] G.M. Amdahl, Validity of the single-processor approach to achieving large scale computing capabilities, AFIPS Conference Proceedings 30 (1967) [4] D.R. Martinez, V. Blanco, M. Boullon, J.C. Cabaleiro, T.F. Pena, Analytical performance models of parallel programs in clusters, Parallel Computing: Architectures, Algorithms and Applications 15 (2008) [5] J. Kowalik, T. Puzniakowski, Using OpenCL: Programming Massively Parallel Computers, Amsterdam, IOS Press, Netherlands, [6] A. Munshi, The OpenCL Specification, Ver.: 1.2, Khronos OpenCL Working Group, 2012.
Enhancing Cloud-based Servers by GPU/CPU Virtualization Management
Enhancing Cloud-based Servers by GPU/CPU Virtualiz Management Tin-Yu Wu 1, Wei-Tsong Lee 2, Chien-Yu Duan 2 Department of Computer Science and Inform Engineering, Nal Ilan University, Taiwan, ROC 1 Department
More informationResearch on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
More informationGPGPU accelerated Computational Fluid Dynamics
t e c h n i s c h e u n i v e r s i t ä t b r a u n s c h w e i g Carl-Friedrich Gauß Faculty GPGPU accelerated Computational Fluid Dynamics 5th GACM Colloquium on Computational Mechanics Hamburg Institute
More informationCellular Computing on a Linux Cluster
Cellular Computing on a Linux Cluster Alexei Agueev, Bernd Däne, Wolfgang Fengler TU Ilmenau, Department of Computer Architecture Topics 1. Cellular Computing 2. The Experiment 3. Experimental Results
More informationultra fast SOM using CUDA
ultra fast SOM using CUDA SOM (Self-Organizing Map) is one of the most popular artificial neural network algorithms in the unsupervised learning category. Sijo Mathew Preetha Joy Sibi Rajendra Manoj A
More informationGPU File System Encryption Kartik Kulkarni and Eugene Linkov
GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through
More informationACCELERATING COMMERCIAL LINEAR DYNAMIC AND NONLINEAR IMPLICIT FEA SOFTWARE THROUGH HIGH- PERFORMANCE COMPUTING
ACCELERATING COMMERCIAL LINEAR DYNAMIC AND Vladimir Belsky Director of Solver Development* Luis Crivelli Director of Solver Development* Matt Dunbar Chief Architect* Mikhail Belyi Development Group Manager*
More informationHIGH PERFORMANCE CONSULTING COURSE OFFERINGS
Performance 1(6) HIGH PERFORMANCE CONSULTING COURSE OFFERINGS LEARN TO TAKE ADVANTAGE OF POWERFUL GPU BASED ACCELERATOR TECHNOLOGY TODAY 2006 2013 Nvidia GPUs Intel CPUs CONTENTS Acronyms and Terminology...
More informationIntroduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware
More informationModern Platform for Parallel Algorithms Testing: Java on Intel Xeon Phi
I.J. Information Technology and Computer Science, 2015, 09, 8-14 Published Online August 2015 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijitcs.2015.09.02 Modern Platform for Parallel Algorithms
More informationStream Processing on GPUs Using Distributed Multimedia Middleware
Stream Processing on GPUs Using Distributed Multimedia Middleware Michael Repplinger 1,2, and Philipp Slusallek 1,2 1 Computer Graphics Lab, Saarland University, Saarbrücken, Germany 2 German Research
More informationInternational Journal of Engineering Research & Management Technology
International Journal of Engineering Research & Management Technology March- 2015 Volume 2, Issue-2 Survey paper on cloud computing with load balancing policy Anant Gaur, Kush Garg Department of CSE SRM
More informationInternational Journal of Advance Research in Computer Science and Management Studies
Volume 3, Issue 6, June 2015 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationInteractive comment on A parallelization scheme to simulate reactive transport in the subsurface environment with OGS#IPhreeqc by W. He et al.
Geosci. Model Dev. Discuss., 8, C1166 C1176, 2015 www.geosci-model-dev-discuss.net/8/c1166/2015/ Author(s) 2015. This work is distributed under the Creative Commons Attribute 3.0 License. Geoscientific
More informationWrite a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical
Identify a problem Review approaches to the problem Propose a novel approach to the problem Define, design, prototype an implementation to evaluate your approach Could be a real system, simulation and/or
More informationGPUs for Scientific Computing
GPUs for Scientific Computing p. 1/16 GPUs for Scientific Computing Mike Giles mike.giles@maths.ox.ac.uk Oxford-Man Institute of Quantitative Finance Oxford University Mathematical Institute Oxford e-research
More informationNVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist
NVIDIA CUDA Software and GPU Parallel Computing Architecture David B. Kirk, Chief Scientist Outline Applications of GPU Computing CUDA Programming Model Overview Programming in CUDA The Basics How to Get
More informationRelational Databases in the Cloud
Contact Information: February 2011 zimory scale White Paper Relational Databases in the Cloud Target audience CIO/CTOs/Architects with medium to large IT installations looking to reduce IT costs by creating
More informationHigh Performance Matrix Inversion with Several GPUs
High Performance Matrix Inversion on a Multi-core Platform with Several GPUs Pablo Ezzatti 1, Enrique S. Quintana-Ortí 2 and Alfredo Remón 2 1 Centro de Cálculo-Instituto de Computación, Univ. de la República
More informationHigh Performance. CAEA elearning Series. Jonathan G. Dudley, Ph.D. 06/09/2015. 2015 CAE Associates
High Performance Computing (HPC) CAEA elearning Series Jonathan G. Dudley, Ph.D. 06/09/2015 2015 CAE Associates Agenda Introduction HPC Background Why HPC SMP vs. DMP Licensing HPC Terminology Types of
More informationThere are a number of factors that increase the risk of performance problems in complex computer and software systems, such as e-commerce systems.
ASSURING PERFORMANCE IN E-COMMERCE SYSTEMS Dr. John Murphy Abstract Performance Assurance is a methodology that, when applied during the design and development cycle, will greatly increase the chances
More informationOverview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it
Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket
More informationLecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.
Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide
More informationST810 Advanced Computing
ST810 Advanced Computing Lecture 17: Parallel computing part I Eric B. Laber Hua Zhou Department of Statistics North Carolina State University Mar 13, 2013 Outline computing Hardware computing overview
More informationDesign and Implementation of the Heterogeneous Multikernel Operating System
223 Design and Implementation of the Heterogeneous Multikernel Operating System Yauhen KLIMIANKOU Department of Computer Systems and Networks, Belarusian State University of Informatics and Radioelectronics,
More informationA survey on platforms for big data analytics
Singh and Reddy Journal of Big Data 2014, 1:8 SURVEY PAPER Open Access A survey on platforms for big data analytics Dilpreet Singh and Chandan K Reddy * * Correspondence: reddy@cs.wayne.edu Department
More informationHPC with Multicore and GPUs
HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware
More informationQuiz for Chapter 1 Computer Abstractions and Technology 3.10
Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [15 points] Consider two different implementations,
More informationTurbomachinery CFD on many-core platforms experiences and strategies
Turbomachinery CFD on many-core platforms experiences and strategies Graham Pullan Whittle Laboratory, Department of Engineering, University of Cambridge MUSAF Colloquium, CERFACS, Toulouse September 27-29
More informationon an system with an infinite number of processors. Calculate the speedup of
1. Amdahl s law Three enhancements with the following speedups are proposed for a new architecture: Speedup1 = 30 Speedup2 = 20 Speedup3 = 10 Only one enhancement is usable at a time. a) If enhancements
More informationA GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS
A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS SUDHAKARAN.G APCF, AERO, VSSC, ISRO 914712564742 g_suhakaran@vssc.gov.in THOMAS.C.BABU APCF, AERO, VSSC, ISRO 914712565833
More informationGPGPU acceleration in OpenFOAM
Carl-Friedrich Gauß Faculty GPGPU acceleration in OpenFOAM Northern germany OpenFoam User meeting Braunschweig Institute of Technology Thorsten Grahs Institute of Scientific Computing/move-csc 2nd October
More informationA Lab Course on Computer Architecture
A Lab Course on Computer Architecture Pedro López José Duato Depto. de Informática de Sistemas y Computadores Facultad de Informática Universidad Politécnica de Valencia Camino de Vera s/n, 46071 - Valencia,
More informationA Cloud Computing Approach for Big DInSAR Data Processing
A Cloud Computing Approach for Big DInSAR Data Processing through the P-SBAS Algorithm Zinno I. 1, Elefante S. 1, Mossucca L. 2, De Luca C. 1,3, Manunta M. 1, Terzo O. 2, Lanari R. 1, Casu F. 1 (1) IREA
More informationA Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster
Acta Technica Jaurinensis Vol. 3. No. 1. 010 A Simultaneous Solution for General Linear Equations on a Ring or Hierarchical Cluster G. Molnárka, N. Varjasi Széchenyi István University Győr, Hungary, H-906
More informationEnergy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
More informationExploiting GPU Hardware Saturation for Fast Compiler Optimization
Exploiting GPU Hardware Saturation for Fast Compiler Optimization Alberto Magni School of Informatics University of Edinburgh United Kingdom a.magni@sms.ed.ac.uk Christophe Dubach School of Informatics
More informationHPC Wales Skills Academy Course Catalogue 2015
HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses
More informationDesign and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms
Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms Amani AlOnazi, David E. Keyes, Alexey Lastovetsky, Vladimir Rychkov Extreme Computing Research Center,
More informationDirect GPU/FPGA Communication Via PCI Express
Direct GPU/FPGA Communication Via PCI Express Ray Bittner, Erik Ruf Microsoft Research Redmond, USA {raybit,erikruf}@microsoft.com Abstract Parallel processing has hit mainstream computing in the form
More informationThe Value of High-Performance Computing for Simulation
White Paper The Value of High-Performance Computing for Simulation High-performance computing (HPC) is an enormous part of the present and future of engineering simulation. HPC allows best-in-class companies
More informationChoosing a Computer for Running SLX, P3D, and P5
Choosing a Computer for Running SLX, P3D, and P5 This paper is based on my experience purchasing a new laptop in January, 2010. I ll lead you through my selection criteria and point you to some on-line
More informationEfficient Scheduling Of On-line Services in Cloud Computing Based on Task Migration
Efficient Scheduling Of On-line Services in Cloud Computing Based on Task Migration 1 Harish H G, 2 Dr. R Girisha 1 PG Student, 2 Professor, Department of CSE, PESCE Mandya (An Autonomous Institution under
More informationWhite Paper. Managed IT Services as a Business Solution
White Paper Managed IT Services as a Business Solution 1 TABLE OF CONTENTS 2 Introduction... 2 3 The Need for Expert IT Management... 3 4 Managed Services Explained... 4 5 Managed Services: Key Benefits...
More informationParallel Programming Survey
Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory
More informationIntroducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child
Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.
More informationLoad Balancing on a Grid Using Data Characteristics
Load Balancing on a Grid Using Data Characteristics Jonathan White and Dale R. Thompson Computer Science and Computer Engineering Department University of Arkansas Fayetteville, AR 72701, USA {jlw09, drt}@uark.edu
More informationOpenCL Programming for the CUDA Architecture. Version 2.3
OpenCL Programming for the CUDA Architecture Version 2.3 8/31/2009 In general, there are multiple ways of implementing a given algorithm in OpenCL and these multiple implementations can have vastly different
More informationPARALLEL JAVASCRIPT. Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology)
PARALLEL JAVASCRIPT Norm Rubin (NVIDIA) Jin Wang (Georgia School of Technology) JAVASCRIPT Not connected with Java Scheme and self (dressed in c clothing) Lots of design errors (like automatic semicolon
More informationGPU for Scientific Computing. -Ali Saleh
1 GPU for Scientific Computing -Ali Saleh Contents Introduction What is GPU GPU for Scientific Computing K-Means Clustering K-nearest Neighbours When to use GPU and when not Commercial Programming GPU
More informationLow-Power Amdahl-Balanced Blades for Data-Intensive Computing
Thanks to NVIDIA, Microsoft External Research, NSF, Moore Foundation, OCZ Technology Low-Power Amdahl-Balanced Blades for Data-Intensive Computing Alex Szalay, Andreas Terzis, Alainna White, Howie Huang,
More informationLarge Scale Simulation on Clusters using COMSOL 4.2
Large Scale Simulation on Clusters using COMSOL 4.2 Darrell W. Pepper 1 Xiuling Wang 2 Steven Senator 3 Joseph Lombardo 4 David Carrington 5 with David Kan and Ed Fontes 6 1 DVP-USAFA-UNLV, 2 Purdue-Calumet,
More informationParallel Firewalls on General-Purpose Graphics Processing Units
Parallel Firewalls on General-Purpose Graphics Processing Units Manoj Singh Gaur and Vijay Laxmi Kamal Chandra Reddy, Ankit Tharwani, Ch.Vamshi Krishna, Lakshminarayanan.V Department of Computer Engineering
More informationEfficient Parallel Graph Exploration on Multi-Core CPU and GPU
Efficient Parallel Graph Exploration on Multi-Core CPU and GPU Pervasive Parallelism Laboratory Stanford University Sungpack Hong, Tayo Oguntebi, and Kunle Olukotun Graph and its Applications Graph Fundamental
More informationAn Application of Hadoop and Horizontal Scaling to Conjunction Assessment. Mike Prausa The MITRE Corporation Norman Facas The MITRE Corporation
An Application of Hadoop and Horizontal Scaling to Conjunction Assessment Mike Prausa The MITRE Corporation Norman Facas The MITRE Corporation ABSTRACT This paper examines a horizontal scaling approach
More informationPerformance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi
Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France
More informationParallel Computing with MATLAB
Parallel Computing with MATLAB Scott Benway Senior Account Manager Jiro Doke, Ph.D. Senior Application Engineer 2013 The MathWorks, Inc. 1 Acceleration Strategies Applied in MATLAB Approach Options Best
More informationRevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
More informationGraphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011
Graphics Cards and Graphics Processing Units Ben Johnstone Russ Martin November 15, 2011 Contents Graphics Processing Units (GPUs) Graphics Pipeline Architectures 8800-GTX200 Fermi Cayman Performance Analysis
More informationPedraforca: ARM + GPU prototype
www.bsc.es Pedraforca: ARM + GPU prototype Filippo Mantovani Workshop on exascale and PRACE prototypes Barcelona, 20 May 2014 Overview Goals: Test the performance, scalability, and energy efficiency of
More informationHow to program efficient optimization algorithms on Graphics Processing Units - The Vehicle Routing Problem as a case study
How to program efficient optimization algorithms on Graphics Processing Units - The Vehicle Routing Problem as a case study Geir Hasle, Christian Schulz Department of, SINTEF ICT, Oslo, Norway Seminar
More informationClustering Billions of Data Points Using GPUs
Clustering Billions of Data Points Using GPUs Ren Wu ren.wu@hp.com Bin Zhang bin.zhang2@hp.com Meichun Hsu meichun.hsu@hp.com ABSTRACT In this paper, we report our research on using GPUs to accelerate
More informationHigh Performance GPGPU Computer for Embedded Systems
High Performance GPGPU Computer for Embedded Systems Author: Dan Mor, Aitech Product Manager September 2015 Contents 1. Introduction... 3 2. Existing Challenges in Modern Embedded Systems... 3 2.1. Not
More informationEvoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca
Evoluzione dell Infrastruttura di Calcolo e Data Analytics per la ricerca Carlo Cavazzoni CINECA Supercomputing Application & Innovation www.cineca.it 21 Aprile 2015 FERMI Name: Fermi Architecture: BlueGene/Q
More informationData Structures and Algorithms
Data Structures and Algorithms Computational Complexity Escola Politècnica Superior d Alcoi Universitat Politècnica de València Contents Introduction Resources consumptions: spatial and temporal cost Costs
More information18-742 Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two
age 1 18-742 Lecture 4 arallel rogramming II Spring 2005 rof. Babak Falsafi http://www.ece.cmu.edu/~ece742 write X Memory send X Memory read X Memory Slides developed in part by rofs. Adve, Falsafi, Hill,
More informationFour Keys to Successful Multicore Optimization for Machine Vision. White Paper
Four Keys to Successful Multicore Optimization for Machine Vision White Paper Optimizing a machine vision application for multicore PCs can be a complex process with unpredictable results. Developers need
More informationAccelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing
Accelerating Simulation & Analysis with Hybrid GPU Parallelization and Cloud Computing Innovation Intelligence Devin Jensen August 2012 Altair Knows HPC Altair is the only company that: makes HPC tools
More informationParallel Scalable Algorithms- Performance Parameters
www.bsc.es Parallel Scalable Algorithms- Performance Parameters Vassil Alexandrov, ICREA - Barcelona Supercomputing Center, Spain Overview Sources of Overhead in Parallel Programs Performance Metrics for
More informationRecommended hardware system configurations for ANSYS users
Recommended hardware system configurations for ANSYS users The purpose of this document is to recommend system configurations that will deliver high performance for ANSYS users across the entire range
More informationSQL Server 2012 Performance White Paper
Published: April 2012 Applies to: SQL Server 2012 Copyright The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication.
More informationARMS Counterparty Credit Risk
ARMS Counterparty Credit Risk PFE - Potential Future Exposure & CVA - Credit Valuation Adjustment An introduction to the ARMS CCR module EXECUTIVE SUMMARY The ARMS CCR Module combines market proven financial
More informationElastic Application Platform for Market Data Real-Time Analytics. for E-Commerce
Elastic Application Platform for Market Data Real-Time Analytics Can you deliver real-time pricing, on high-speed market data, for real-time critical for E-Commerce decisions? Market Data Analytics applications
More informationParallel Image Processing with CUDA A case study with the Canny Edge Detection Filter
Parallel Image Processing with CUDA A case study with the Canny Edge Detection Filter Daniel Weingaertner Informatics Department Federal University of Paraná - Brazil Hochschule Regensburg 02.05.2011 Daniel
More informationParallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage
Parallel Computing Benson Muite benson.muite@ut.ee http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework
More informationPerformance with the Oracle Database Cloud
An Oracle White Paper September 2012 Performance with the Oracle Database Cloud Multi-tenant architectures and resource sharing 1 Table of Contents Overview... 3 Performance and the Cloud... 4 Performance
More informationAchieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks
WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance
More informationCHAPTER 1 INTRODUCTION
CHAPTER 1 INTRODUCTION 1.1 Background The command over cloud computing infrastructure is increasing with the growing demands of IT infrastructure during the changed business scenario of the 21 st Century.
More informationFPGA area allocation for parallel C applications
1 FPGA area allocation for parallel C applications Vlad-Mihai Sima, Elena Moscu Panainte, Koen Bertels Computer Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University
More informationPyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms
PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms P. E. Vincent! Department of Aeronautics Imperial College London! 25 th March 2014 Overview Motivation Flux Reconstruction Many-Core
More informationOptimizing a 3D-FWT code in a cluster of CPUs+GPUs
Optimizing a 3D-FWT code in a cluster of CPUs+GPUs Gregorio Bernabé Javier Cuenca Domingo Giménez Universidad de Murcia Scientific Computing and Parallel Programming Group XXIX Simposium Nacional de la
More informationFormal Metrics for Large-Scale Parallel Performance
Formal Metrics for Large-Scale Parallel Performance Kenneth Moreland and Ron Oldfield Sandia National Laboratories, Albuquerque, NM 87185, USA {kmorel,raoldfi}@sandia.gov Abstract. Performance measurement
More informationTask Scheduling in Hadoop
Task Scheduling in Hadoop Sagar Mamdapure Munira Ginwala Neha Papat SAE,Kondhwa SAE,Kondhwa SAE,Kondhwa Abstract Hadoop is widely used for storing large datasets and processing them efficiently under distributed
More informationMaking Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association
Making Multicore Work and Measuring its Benefits Markus Levy, president EEMBC and Multicore Association Agenda Why Multicore? Standards and issues in the multicore community What is Multicore Association?
More informationThe Big Data methodology in computer vision systems
The Big Data methodology in computer vision systems Popov S.B. Samara State Aerospace University, Image Processing Systems Institute, Russian Academy of Sciences Abstract. I consider the advantages of
More informationGeneral Framework for an Iterative Solution of Ax b. Jacobi s Method
2.6 Iterative Solutions of Linear Systems 143 2.6 Iterative Solutions of Linear Systems Consistent linear systems in real life are solved in one of two ways: by direct calculation (using a matrix factorization,
More informationRealizing the next step in storage/converged architectures
Realizing the next step in storage/converged architectures Imagine having the same data access and processing power of an entire Facebook like datacenter in a single rack of servers Flash Memory Summit
More informationHigh Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach
High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach Beniamino Di Martino, Antonio Esposito and Andrea Barbato Department of Industrial and Information Engineering Second University of Naples
More informationMeasuring Software Systems Scalability for Proactive Data Center Management
Measuring Software Systems Scalability for Proactive Data Center Management Nuno A. Carvalho and José Pereira Computer Science and Technology Center Universidade do Minho Braga, Portugal {nuno,jop}@di.uminho.pt
More informationParallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.
Parallel Computing: Strategies and Implications Dori Exterman CTO IncrediBuild. In this session we will discuss Multi-threaded vs. Multi-Process Choosing between Multi-Core or Multi- Threaded development
More informationApplying Data Analysis to Big Data Benchmarks. Jazmine Olinger
Applying Data Analysis to Big Data Benchmarks Jazmine Olinger Abstract This paper describes finding accurate and fast ways to simulate Big Data benchmarks. Specifically, using the currently existing simulation
More informationAn Implementation of Active Data Technology
White Paper by: Mario Morfin, PhD Terri Chu, MEng Stephen Chen, PhD Robby Burko, PhD Riad Hartani, PhD An Implementation of Active Data Technology October 2015 In this paper, we build the rationale for
More informationPerformance metrics for parallel systems
Performance metrics for parallel systems S.S. Kadam C-DAC, Pune sskadam@cdac.in C-DAC/SECG/2006 1 Purpose To determine best parallel algorithm Evaluate hardware platforms Examine the benefits from parallelism
More informationThe Key Technology Research of Virtual Laboratory based On Cloud Computing Ling Zhang
International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015) The Key Technology Research of Virtual Laboratory based On Cloud Computing Ling Zhang Nanjing Communications
More informationLoad Balancing using Potential Functions for Hierarchical Topologies
Acta Technica Jaurinensis Vol. 4. No. 4. 2012 Load Balancing using Potential Functions for Hierarchical Topologies Molnárka Győző, Varjasi Norbert Széchenyi István University Dept. of Machine Design and
More informationScheduling and Load Balancing in the Parallel ROOT Facility (PROOF)
Scheduling and Load Balancing in the Parallel ROOT Facility (PROOF) Gerardo Ganis CERN E-mail: Gerardo.Ganis@cern.ch CERN Institute of Informatics, University of Warsaw E-mail: Jan.Iwaszkiewicz@cern.ch
More informationHome Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks. October 20 th 2015
INF5063: Programming heterogeneous multi-core processors because the OS-course is just to easy! Home Exam 3: Distributed Video Encoding using Dolphin PCI Express Networks October 20 th 2015 Håkon Kvale
More informationChapter 6 Experiment Process
Chapter 6 Process ation is not simple; we have to prepare, conduct and analyze experiments properly. One of the main advantages of an experiment is the control of, for example, subjects, objects and instrumentation.
More informationApplying Parallel and Distributed Computing for Image Reconstruction in 3D Electrical Capacitance Tomography
AUTOMATYKA 2010 Tom 14 Zeszyt 3/2 Pawe³ Kapusta*, Micha³ Majchrowicz*, Robert Banasiak* Applying Parallel and Distributed Computing for Image Reconstruction in 3D Electrical Capacitance Tomography 1. Introduction
More information