BIG CPU, BIG DATA. Solving the World's Toughest Computational Problems with Parallel Computing. Alan Kaminsky




BIG CPU, BIG DATA

Solving the World's Toughest Computational Problems with Parallel Computing

Alan Kaminsky
Department of Computer Science
B. Thomas Golisano College of Computing and Information Sciences
Rochester Institute of Technology

Copyright © 2015 by Alan Kaminsky. All rights reserved.

ISBN 000-0-0000-0000-0

The book BIG CPU, BIG DATA: Solving the World's Toughest Computational Problems with Parallel Computing is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.

To reduce costs, the hardcopy version is printed in black and white. For a full-color e-version, see http://www.cs.rit.edu/~ark/bcbd/.

The program source files listed in this book are part of the Parallel Java 2 Library ("The Library"). The Library is copyright 2013-2015 by Alan Kaminsky. All rights reserved. The Library is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. The Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with The Library. If not, see http://www.gnu.org/licenses/. You can get the Parallel Java 2 Library at http://www.cs.rit.edu/~ark/pj2.shtml.

Front cover image: The IBM Blue Gene/P supercomputer installation at the Argonne Leadership Computing Facility, located in the Argonne National Laboratory, in Lemont, Illinois, USA. Courtesy of Argonne National Laboratory. http://commons.wikimedia.org/wiki/File:IBM_Blue_Gene_P_supercomputer.jpg

Alan Kaminsky
Professor
Department of Computer Science
B. Thomas Golisano College of Computing and Information Sciences
Rochester Institute of Technology
ark@cs.rit.edu
http://www.cs.rit.edu/~ark/

August 2015 edition

Preface

With this book, my goal is to teach you how to write parallel programs that take full advantage of the vast processing power of modern multicore computers, compute clusters, and graphics processing unit (GPU) accelerators. The book is free, Creative Commons licensed, and is available from my web site (http://www.cs.rit.edu/~ark/bcbd/).

I'm not going to teach you parallel programming using popular parallel libraries like MPI, OpenMP, and OpenCL. (If you're interested in learning those, plenty of other books are available.) Why? Two reasons:

- I prefer to program in Java. The aforementioned libraries do not, and in my belief never will, support Java.
- In my experience, teaching and learning parallel programming with the aforementioned libraries is more difficult than with Java.

Instead, I'm going to use my Parallel Java 2 Library (PJ2) in this book. PJ2 is free, GNU GPL licensed software available from my web site (http://www.cs.rit.edu/~ark/pj2.shtml). You can download the complete source files, compiled class files, and Javadoc documentation. PJ2 requires Java Development Kit (JDK) 1.7 or higher. Installation instructions are included in the Javadoc.

PJ2 is suitable both for teaching and learning parallel programming and for real-world parallel program development. I use PJ2 and its predecessor, the Parallel Java Library (PJ), in my cryptography research. Others have used PJ to do page rank calculations, ocean ecosystem modeling, salmon population modeling and analysis, medication scheduling for patients in long term care facilities, three-dimensional complex-valued fast Fourier transforms for electronic structure analysis and X-ray crystallography, and Monte Carlo simulation of electricity and gas markets. PJ was also incorporated into the IQM open source Java image processing application.

I am happy to answer general questions about PJ2, receive bug reports, and entertain requests for additional features. Please contact me by email at ark@cs.rit.edu. I regret that I am unable to provide technical support, specific installation instructions for your system, or advice about configuring your parallel computer hardware.

More fundamental than the language or library, however, are parallel programming concepts and patterns, such as work sharing parallel loops, parallel reduction, and communication and coordination. Whether you use OpenMP's compiler directives, MPI's message passing subroutines, or PJ2's Java classes, the concepts and patterns are the same. Only the syntax differs. Once you've learned parallel programming in Java with PJ2, you'll be able to apply the same concepts and patterns in C, Fortran, or other languages with OpenMP, MPI, or other libraries.

To study parallel programming with this book, you'll need the following prerequisite knowledge:

- Java programming
- C programming (for GPU programs)
- Computer organization concepts (CPU, memory, cache, and so on)
- Operating system concepts (threads, thread synchronization)

My pedagogical style is to teach by example. Accordingly, this book consists of a series of complete parallel program examples that illustrate various aspects of parallel programming. The example programs' source code is listed on the right-hand pages, and explanatory narrative is on the left-hand pages. The example source code is also included in the PJ2 download. To write programs well, you must first learn to read programs; so please avoid the temptation to gloss over the source code listings, and carefully study both the source code and the explanations. Also study the PJ2 Javadoc documentation for the various classes used in the example programs. The Javadoc includes comprehensive descriptions of each class and method. Space does not permit describing all the classes in detail in this book; read the Javadoc for further information.

The book consists of these parts:

- Part I covers introductory concepts.
- Part II covers parallel programming for multicore computers.
- Part III covers parallel programming for compute clusters.
- Part IV covers parallel programming on GPUs.
- Part V covers big data parallel programming using map-reduce.

Instructors: There are no PowerPoint slides to go with this book. Slide shows have their place, but the classroom is not it. Nothing is guaranteed to put students to sleep faster than a PowerPoint lecture. An archive containing all the book's illustrations in PNG format is available from the book's web site; please feel free to use these to develop your own instructional materials.

Alan Kaminsky
August 2015
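The preface's point that the patterns, not the syntax, are what transfer between libraries can be made concrete with a small sketch. The following is not PJ2 code (see the PJ2 Javadoc for its actual API); it is a plain-Java illustration of two of the patterns named above, a work sharing parallel loop and a sum reduction, using hypothetical names (ParallelSum, sum) chosen for this example only.

```java
import java.util.concurrent.atomic.AtomicLong;

// Work sharing parallel loop with a sum reduction, written with plain Java
// threads rather than any particular library's API. Each thread takes a
// contiguous chunk of the index range 1..n (work sharing), accumulates a
// thread-private partial sum, and combines it into the shared total at the
// end (reduction).
public class ParallelSum {
    public static long sum(long n, int nThreads) throws InterruptedException {
        AtomicLong total = new AtomicLong(0);
        Thread[] threads = new Thread[nThreads];
        long chunk = (n + nThreads - 1) / nThreads;  // ceiling division
        for (int t = 0; t < nThreads; ++t) {
            final long lo = t * chunk + 1;
            final long hi = Math.min(n, lo + chunk - 1);
            threads[t] = new Thread(() -> {
                long partial = 0;             // thread-private partial sum
                for (long i = lo; i <= hi; ++i) partial += i;
                total.addAndGet(partial);     // combine: the reduction step
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();  // wait for all workers
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sum(1000000, 4));  // prints 500000500000
    }
}
```

The same structure appears in every library: OpenMP expresses the loop split and the reduction with a compiler directive, MPI with explicit partial sums combined by message passing, and PJ2 with Java classes; only the syntax differs.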