BIG CPU, BIG DATA. Solving the World's Toughest Computational Problems with Parallel Computing. Alan Kaminsky


Solving the World's Toughest Computational Problems with Parallel Computing

Alan Kaminsky
Department of Computer Science
B. Thomas Golisano College of Computing and Information Sciences
Rochester Institute of Technology

Copyright © 2015 by Alan Kaminsky. All rights reserved. ISBN 000-0-0000-0000-0

The book BIG CPU, BIG DATA: Solving the World's Toughest Computational Problems with Parallel Computing is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.

To reduce costs, the hardcopy version is printed in black and white. For a full-color e-version, see http://www.cs.rit.edu/~ark/bcbd/.

The program source files listed in this book are part of the Parallel Java 2 Library ("The Library"). The Library is copyright © 2013–2015 by Alan Kaminsky. All rights reserved. The Library is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. The Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with The Library. If not, see http://www.gnu.org/licenses/. You can get the Parallel Java 2 Library at http://www.cs.rit.edu/~ark/pj2.shtml.

Front cover image: The IBM Blue Gene/P supercomputer installation at the Argonne Leadership Computing Facility, located in the Argonne National Laboratory, in Lemont, Illinois, USA. Courtesy of Argonne National Laboratory. http://commons.wikimedia.org/wiki/file:ibm_blue_gene_p_supercomputer.jpg

Alan Kaminsky
Professor
Department of Computer Science
B. Thomas Golisano College of Computing and Information Sciences
Rochester Institute of Technology
ark@cs.rit.edu
http://www.cs.rit.edu/~ark/

August 2015 edition

Preface

With this book, my goal is to teach you how to write parallel programs that take full advantage of the vast processing power of modern multicore computers, compute clusters, and graphics processing unit (GPU) accelerators. The book is free, Creative Commons licensed, and is available from my web site (http://www.cs.rit.edu/~ark/bcbd/).

I'm not going to teach you parallel programming using popular parallel libraries like MPI, OpenMP, and OpenCL. (If you're interested in learning those, plenty of other books are available.) Why? Two reasons: I prefer to program in Java, and the aforementioned libraries do not, and in my belief never will, support Java. Also, in my experience, teaching and learning parallel programming with the aforementioned libraries is more difficult than with Java.

Instead, I'm going to use my Parallel Java 2 Library (PJ2) in this book. PJ2 is free, GNU GPL licensed software available from my web site (http://www.cs.rit.edu/~ark/pj2.shtml). You can download the complete source files, compiled class files, and Javadoc documentation. PJ2 requires Java Development Kit (JDK) 1.7 or higher. Installation instructions are included in the Javadoc.

PJ2 is suitable both for teaching and learning parallel programming and for real-world parallel program development. I use PJ2 and its predecessor, the Parallel Java Library (PJ), in my cryptography research. Others have used PJ to do page rank calculations, ocean ecosystem modeling, salmon population modeling and analysis, medication scheduling for patients in long term care facilities, three-dimensional complex-valued fast Fourier transforms for electronic structure analysis and X-ray crystallography, and Monte Carlo simulation of electricity and gas markets. PJ was also incorporated into the IQM open source Java image processing application.

I am happy to answer general questions about PJ2, receive bug reports, and entertain requests for additional features. Please contact me by email at ark@cs.rit.edu. I regret that I am unable to provide technical support, specific installation instructions for your system, or advice about configuring your parallel computer hardware.

More fundamental than the language or library, however, are parallel programming concepts and patterns, such as work sharing parallel loops, parallel reduction, and communication and coordination. Whether you use OpenMP's compiler directives, MPI's message passing subroutines, or PJ2's Java classes, the concepts and patterns are the same. Only the syntax differs. Once you've learned parallel programming in Java with PJ2, you'll be able to apply the same concepts and patterns in C, Fortran, or other languages with OpenMP, MPI, or other libraries.

To study parallel programming with this book, you'll need the following prerequisite knowledge: Java programming; C programming (for GPU programs); computer organization concepts (CPU, memory, cache, and so on); and operating system concepts (threads, thread synchronization).

My pedagogical style is to teach by example. Accordingly, this book consists of a series of complete parallel program examples that illustrate various aspects of parallel programming. The example programs' source code is listed on the right-hand pages, and explanatory narrative is on the left-hand pages. The example source code is also included in the PJ2 download. To write programs well, you must first learn to read programs; so please avoid the temptation to gloss over the source code listings, and carefully study both the source code and the explanations. Also study the PJ2 Javadoc documentation for the various classes used in the example programs. The Javadoc includes comprehensive descriptions of each class and method. Space does not permit describing all the classes in detail in this book; read the Javadoc for further information.

The book consists of these parts:
Part I covers introductory concepts.
Part II covers parallel programming for multicore computers.
Part III covers parallel programming for compute clusters.
Part IV covers parallel programming on GPUs.
Part V covers big data parallel programming using map-reduce.

Instructors: There are no PowerPoint slides to go with this book. Slide shows have their place, but the classroom is not it. Nothing is guaranteed to put students to sleep faster than a PowerPoint lecture. An archive containing all the book's illustrations in PNG format is available from the book's web site; please feel free to use these to develop your own instructional materials.

Alan Kaminsky
August 2015
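As a taste of two of the concepts named above, the sketch below shows a work-sharing parallel loop combined with a parallel reduction in plain standard Java, using the java.util.stream fork-join machinery rather than PJ2's classes (this is deliberately not PJ2's API; the class and method names here are illustrative only). The runtime splits the index range among worker threads, each thread computes a partial sum, and the partial sums are combined into a single result.

```java
import java.util.stream.IntStream;

public class SumSquares {
    // Work-sharing parallel loop with a reduction: the stream runtime
    // divides the index range 1..n among the cores, each core squares
    // and sums its share, and sum() combines the per-thread partials.
    static long sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n)
                        .parallel()
                        .mapToLong(i -> (long) i * i)
                        .sum();
    }

    public static void main(String[] args) {
        // Sum of squares 1..1000 = n(n+1)(2n+1)/6 = 333833500
        System.out.println(sumOfSquares(1000)); // prints 333833500
    }
}
```

Note that the reduction operator (addition) is associative, which is what lets the partial results be combined in any order; the same requirement applies to reduction variables in PJ2, OpenMP, and MPI alike.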

Table of Contents

Preface

Part I. Preliminaries
Chapter 1. The Parallel Landscape

Part II. Tightly Coupled Multicore
Chapter 2. Parallel Loops
Chapter 3. Parallel Loop Schedules
Chapter 4. Parallel Reduction
Chapter 5. Reduction Variables
Chapter 6. Load Balancing
Chapter 7. Overlapping
Chapter 8. Sequential Dependencies
Chapter 9. Strong Scaling
Chapter 10. Weak Scaling
Chapter 11. Exhaustive Search
Chapter 12. Heuristic Search
Chapter 13. Parallel Work Queues

Part III. Loosely Coupled Cluster
Chapter 14. Massively Parallel
Chapter 15. Hybrid Parallel
Chapter 16. Tuple Space
Chapter 17. Cluster Parallel Loops
Chapter 18. Cluster Parallel Reduction
Chapter 19. Cluster Load Balancing
Chapter 20. File Output on a Cluster
Chapter 21. Interacting Tasks
Chapter 22. Cluster Heuristic Search
Chapter 23. Cluster Work Queues

Part IV. GPU Acceleration
Chapter 24. GPU Massively Parallel
Chapter 25. GPU Parallel Reduction
Chapter 26. Multi-GPU Programming
Chapter 27. GPU Sequential Dependencies
Chapter 28. Objects on the GPU

Part V. Big Data
Chapter 29. Basic Map-Reduce
Chapter 30. Cluster Map-Reduce
Chapter 31. Big Data Analysis