Parallel Computing for Data Science

Size: px
Start display at page:

Download "Parallel Computing for Data Science"

Transcription

1 Parallel Computing for Data Science With Examples in R, C++ and CUDA Norman Matloff University of California, Davis USA (g) CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor St Francis Croup, an informa business A CHAPMAN & HALL BOOK

2 Contents Preface xix Bio xxiii 1 Introduction to Parallel Processing in R Recurring Theme: The Principle of Pretty Good Parallelism Fast Enough "R+X" A Note on Machines Recurring Theme: Hedging One's Bets Extended Example: Mutual Web Outlinks Serial Code Choice of Parallel Tool Meaning of "snow" in This Book Introduction to snow Mutual Outlinks Problem, Solution Code Timings Analysis of the Code 11 vii

3 viii CONTENTS 2 "Why Is My Program So Slow?": Obstacles to Speed Obstacles to Speed Performance and Hardware Structures Memory Basics Caches Virtual Memory Monitoring Cache Misses and Page Faults Locality of Reference Network Basics Latency and Bandwidth Two Representative Hardware Platforms: Multicore Machines and Clusters Multicore Clusters The Principle of "Just Leave It There" Thread Scheduling How Many Processes/Threads? Example: Mutual Outlink Problem "Big O" Notation Data Serialization "Embarrassingly Parallel" Applications What People Mean by "Embarrassingly Parallel" Suitable Platforms for Non-Embarrassingly Parallel Applications 34 3 Principles of Parallel Loop Scheduling General Notions of Loop Scheduling Chunking in snow 38

4 CONTENTS ix Example: Mutual Outlinks Problem A Note on Code Complexity Example: All Possible Regressions Parallelization Strategies The Code Sample Run Code Analysis Our Task List Chunking Task Scheduling The Actual Dispatching of Work Wrapping Up Timing Experiments The partools Package Example: All Possible Regressions, Improved Version Code Code Analysis Timings Introducing Another Tool: multicore Source of the Performance Advantage Example: All Possible Regressions, Using multicore Issues with Chunk Size Example: Parallel Distance Computation The Code Timings The foreach Package Example: Mutual Outlinks Problem 70

5 X CONTENTS A Caution When Using foreach Stride Another Scheduling Approach: Random Task Permutation The Math The Random Method vs. Others, in Practice Debugging snow and multicore Code Debugging in snow Debugging in multicore 78 4 The Shared-Memory Paradigm: A Gentle Introduction via R So, What Is Actually Shared? Global Variables Local Variables: Stack Structures Non-Shared Memory Systems Clarity of Shared-Memory Code High-Level Introduction to Shared-Memory Programming: Rdsm Package Use of Shared Memory Example: Matrix Multiplication The Code Analysis The Code A Closer Look at the Shared Nature of Our Data Timing Comparison Leveraging R Shared Memory Can Bring A Performance Advantage Locks and Barriers 94

6 CONTENTS xi Race Conditions and Critical Sections Locks Barriers Example: Maximal Burst in a Time Series The Code Example: Transforming an Adjacency Matrix The Code Overallocation of Memory Timing Experiment Example: k-means Clustering The Code Timing Experiment The Shared-Memory Paradigm: C Level OpenMP Example: Finding the Maximal Burst in a Time Series The Code Compiling and Running Analysis A Cautionary Note About Thread Scheduling Setting the Number of Threads Timings OpenMP Loop Scheduling Options OpenMP Scheduling Options Scheduling through Work Stealing Example: Transforming an Adjacency Matrix The Code 127

7 5.4.2 Analysis of the Code Example: Adjacency Matrix, R-Callable Code The Code, for.c() Compiling and Running Analysis The Code, for Repp Compiling and Running Code Analysis Advanced Repp Speedup in C Run Time vs. Development Time Further Cache/Virtual Memory Issues Reduction Operations in OpenMP Example: Mutual In-Links The Code Sample Run Analysis Cache Issues Rows vs. Columns Processor Affinity Debugging Threads Commands in GDB Using GDB on C/C++ Code Called from R Intel Thread Building Blocks (TBB) Lockfree Synchronization 155

8 CONTENTS xni 6 The Shared-Memory Paradigm: GPUs Overview Another Note on Code Complexity Goal of This Chapter Introduction to NVIDIA GPUs and CUDA Example: Calculate Row Sums NVIDIA GPU Hardware Structure Cores Threads The Problem of Thread Divergence "OS in Hardware" Grid Configuration Choices Latency Hiding in GPUs Shared Memory More Hardware Details Resource Limitations Example: Mutual Inlinks Problem The Code Timing Experiments Synchronization on GPUs Data in Global Memory Is Persistent R and GPUs Example: Parallel Distance Computation The Intel Xeon Phi Chip Thrust and Rth Hedging One's Bets 179

9 xiv CONTENTS 7.2 Thrust Overview Rth Skipping the C Example: Finding Quantiles The Code Compilation and Timings Code Analysis Introduction to Rth The Message Passing Paradigm Message Passing Overview The Cluster Model Performance Issues Rmpi Installation and Execution Example: Pipelined Method for Finding Primes Algorithm The Code Timing Example Latency, Bandwdith and Parallelism Possible Improvements Analysis of the Code Memory Allocation Issues Message-Passing Performance Subtleties Blocking vs. Nonblocking I/O The Dreaded Deadlock Problem 205

10 CONTENTS xv 9 MapReduce Computation Apache Hadoop Hadoop Streaming Example: Word Count Running the Code Analysis of the Code Role of Disk Files Other MapReduce Systems R Interfaces to MapReduce Systems An Alternative: "Snowdoop" Example: Snowdoop Word Count Example: Snowdoop k-means Clustering Parallel Sorting and Merging The Elusive Goal of Optimality Sorting Algorithms Compare-and-Exchange Operations Some "Representative" Sorting Algorithms Example: Bucket Sort in R Example: Quicksort in OpenMP Sorting in Rth Some Timing Comparisons Sorting on Distributed Data Hyperquicksort Parallel Prefix Scan General Formulation Applications General Strategies 235

11 xvi CONTENTS A Log-Based Method Another Way Implementations of Parallel Prefix Scan Parallel cumsum() with OpenMP Stack Size Limitations Let's Try It Out Example: Moving Average Rth Code Algorithm Performance Use of Lambda Functions Parallel Matrix Operations Tiled Matrices Example: Snowdoop Approach Parallel Matrix Multiplication Multiplication on Message-Passing Systems Distributed Storage Fox's Algorithm Overhead Issues Multiplication on Multicore Machines Overhead Issues Matrix Multiplication on GPUs Overhead Issues BLAS Libraries Overview Example: Performance of OpenBLAS 261

12 CONTENTS xvii 12.6 Example: Graph Connectedness Analysis The "Log Trick" Parallel Computation The matpow Package Features Solving Systems of Linear Equations The Classical Approach: Gaussian Elimination and the LU Decomposition The Jacobi Algorithm Parallelization Example: R/gputools Implementation of Jacobi QR Decomposition Some Timing Results Sparse Matrices Inherently Statistical Approaches: Subset Methods Chunk Averaging Asymptotic Equivalence O(-) Analysis Code Timing Experiments Example: Quantile Regression Example: Logistic Model Example: Estimating Hazard Functions Non-i.i.d. Settings Bag of Little Bootstraps Subsetting Variables 283

13 XVlll CONTENTS A Review of Matrix Algebra 285 A.l Terminology and Notation 285 A.1.1 Matrix Addition and Multiplication 286 A.2 Matrix Transpose 287 A.3 Linear Independence 288 A.4 Determinants 288 A.5 Matrix Inverse 288 A.6 Eigenvalues and Eigenvectors 289 A.7 Matrix Algebra in R 290 B R Quick Start 293 B.l Correspondences 293 B.2 Starting R 294 B.3 First Sample Programming Session 294 B.4 Second Sample Programming Session 298 B.5 Third Sample Programming Session 300 B.6 The R List Type 301 B.6.1 The Basics 301 B.6.2 The Reduce() Function 302 B.6.3 S3 Classes 302 B.6.4 Handy Utilities 304 B.7 Debugging in R 305 C Introduction to C for R Programmers 307 C.0.1 Sample Program 307 CO.2 Analysis 308 C.l C Index 311

Parallel Computing for Data Science

Parallel Computing for Data Science Parallel Computing for Data Science with Examples in R and Beyond Norman Matloff University of California, Davis This is a draft of the first half of a book to be published in 2014 under the Chapman &

More information

PARALLEL PROGRAMMING

PARALLEL PROGRAMMING PARALLEL PROGRAMMING TECHNIQUES AND APPLICATIONS USING NETWORKED WORKSTATIONS AND PARALLEL COMPUTERS 2nd Edition BARRY WILKINSON University of North Carolina at Charlotte Western Carolina University MICHAEL

More information

Customer and Business Analytic

Customer and Business Analytic Customer and Business Analytic Applied Data Mining for Business Decision Making Using R Daniel S. Putler Robert E. Krider CRC Press Taylor &. Francis Group Boca Raton London New York CRC Press is an imprint

More information

Cloud Computing. and Scheduling. Data-Intensive Computing. Frederic Magoules, Jie Pan, and Fei Teng SILKQH. CRC Press. Taylor & Francis Group

Cloud Computing. and Scheduling. Data-Intensive Computing. Frederic Magoules, Jie Pan, and Fei Teng SILKQH. CRC Press. Taylor & Francis Group Cloud Computing Data-Intensive Computing and Scheduling Frederic Magoules, Jie Pan, and Fei Teng SILKQH CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor

More information

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

GPU File System Encryption Kartik Kulkarni and Eugene Linkov GPU File System Encryption Kartik Kulkarni and Eugene Linkov 5/10/2012 SUMMARY. We implemented a file system that encrypts and decrypts files. The implementation uses the AES algorithm computed through

More information

Spring 2011 Prof. Hyesoon Kim

Spring 2011 Prof. Hyesoon Kim Spring 2011 Prof. Hyesoon Kim Today, we will study typical patterns of parallel programming This is just one of the ways. Materials are based on a book by Timothy. Decompose Into tasks Original Problem

More information

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large

More information

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview

More information

Introduction to GPU Programming Languages

Introduction to GPU Programming Languages CSC 391/691: GPU Programming Fall 2011 Introduction to GPU Programming Languages Copyright 2011 Samuel S. Cho http://www.umiacs.umd.edu/ research/gpu/facilities.html Maryland CPU/GPU Cluster Infrastructure

More information

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild. Parallel Computing: Strategies and Implications Dori Exterman CTO IncrediBuild. In this session we will discuss Multi-threaded vs. Multi-Process Choosing between Multi-Core or Multi- Threaded development

More information

Big Data and Parallel Work with R

Big Data and Parallel Work with R Big Data and Parallel Work with R What We'll Cover Data Limits in R Optional Data packages Optional Function packages Going parallel Deciding what to do Data Limits in R Big Data? What is big data? More

More information

HIGH PERFORMANCE BIG DATA ANALYTICS

HIGH PERFORMANCE BIG DATA ANALYTICS HIGH PERFORMANCE BIG DATA ANALYTICS Kunle Olukotun Electrical Engineering and Computer Science Stanford University June 2, 2014 Explosion of Data Sources Sensors DoD is swimming in sensors and drowning

More information

Part I Courses Syllabus

Part I Courses Syllabus Part I Courses Syllabus This document provides detailed information about the basic courses of the MHPC first part activities. The list of courses is the following 1.1 Scientific Programming Environment

More information

SOFTWARE TESTING. A Craftsmcm's Approach THIRD EDITION. Paul C. Jorgensen. Auerbach Publications. Taylor &. Francis Croup. Boca Raton New York

SOFTWARE TESTING. A Craftsmcm's Approach THIRD EDITION. Paul C. Jorgensen. Auerbach Publications. Taylor &. Francis Croup. Boca Raton New York SOFTWARE TESTING A Craftsmcm's Approach THIRD EDITION Paul C. Jorgensen A Auerbach Publications Taylor &. Francis Croup Boca Raton New York Auerbach Publications is an imprint of the Taylor & Francis Group,

More information

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup Network Anomaly Detection A Machine Learning Perspective Dhruba Kumar Bhattacharyya Jugal Kumar KaKta»C) CRC Press J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor

More information

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS

HIGH PERFORMANCE CONSULTING COURSE OFFERINGS Performance 1(6) HIGH PERFORMANCE CONSULTING COURSE OFFERINGS LEARN TO TAKE ADVANTAGE OF POWERFUL GPU BASED ACCELERATOR TECHNOLOGY TODAY 2006 2013 Nvidia GPUs Intel CPUs CONTENTS Acronyms and Terminology...

More information

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi ICPP 6 th International Workshop on Parallel Programming Models and Systems Software for High-End Computing October 1, 2013 Lyon, France

More information

Programming on Parallel Machines

Programming on Parallel Machines Programming on Parallel Machines Norm Matloff University of California, Davis GPU, Multicore, Clusters and More See Creative Commons license at http://heather.cs.ucdavis.edu/ matloff/probstatbook.html

More information

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage

Parallel Computing. Benson Muite. benson.muite@ut.ee http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage Parallel Computing Benson Muite benson.muite@ut.ee http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

An Introduction to Parallel Computing/ Programming

An Introduction to Parallel Computing/ Programming An Introduction to Parallel Computing/ Programming Vicky Papadopoulou Lesta Astrophysics and High Performance Computing Research Group (http://ahpc.euc.ac.cy) Dep. of Computer Science and Engineering European

More information

Parallel Algorithm Engineering

Parallel Algorithm Engineering Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework Examples Software crisis

More information

HPC Wales Skills Academy Course Catalogue 2015

HPC Wales Skills Academy Course Catalogue 2015 HPC Wales Skills Academy Course Catalogue 2015 Overview The HPC Wales Skills Academy provides a variety of courses and workshops aimed at building skills in High Performance Computing (HPC). Our courses

More information

Scheduling Task Parallelism" on Multi-Socket Multicore Systems"

Scheduling Task Parallelism on Multi-Socket Multicore Systems Scheduling Task Parallelism" on Multi-Socket Multicore Systems" Stephen Olivier, UNC Chapel Hill Allan Porterfield, RENCI Kyle Wheeler, Sandia National Labs Jan Prins, UNC Chapel Hill Outline" Introduction

More information

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices E6895 Advanced Big Data Analytics Lecture 14: NVIDIA GPU Examples and GPU on ios devices Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist,

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Next Generation GPU Architecture Code-named Fermi

Next Generation GPU Architecture Code-named Fermi Next Generation GPU Architecture Code-named Fermi The Soul of a Supercomputer in the Body of a GPU Why is NVIDIA at Super Computing? Graphics is a throughput problem paint every pixel within frame time

More information

CHAPMAN & HALL/CRC INNOVATIONS IN SOFTWARE ENGINEERING AND SOFTWARE DEVELOPMENT. Software Test Attacks to Break Mobile and Embedded Devices

CHAPMAN & HALL/CRC INNOVATIONS IN SOFTWARE ENGINEERING AND SOFTWARE DEVELOPMENT. Software Test Attacks to Break Mobile and Embedded Devices CHAPMAN & HALL/CRC INNOVATIONS IN SOFTWARE ENGINEERING AND SOFTWARE DEVELOPMENT Software Test Attacks to Break Mobile and Embedded Devices Jon Duncan Hagar (g) CRC Press Taylor & Francis Group Boca Raton

More information

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Introducing A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child Bio Tim Child 35 years experience of software development Formerly VP Oracle Corporation VP BEA Systems Inc.

More information

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics 22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC

More information

THE COMPLETE PROJECT MANAGEMENT METHODOLOGY AND TOOLKIT

THE COMPLETE PROJECT MANAGEMENT METHODOLOGY AND TOOLKIT THE COMPLETE PROJECT MANAGEMENT METHODOLOGY AND TOOLKIT GERARD M. HILL CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Croup, an informa business

More information

Introduction to GPU hardware and to CUDA

Introduction to GPU hardware and to CUDA Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware

More information

MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp

MPI and Hybrid Programming Models. William Gropp www.cs.illinois.edu/~wgropp MPI and Hybrid Programming Models William Gropp www.cs.illinois.edu/~wgropp 2 What is a Hybrid Model? Combination of several parallel programming models in the same program May be mixed in the same source

More information

Course Development of Programming for General-Purpose Multicore Processors

Course Development of Programming for General-Purpose Multicore Processors Course Development of Programming for General-Purpose Multicore Processors Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University Richmond, VA 23284 wzhang4@vcu.edu

More information

Exploratory Data Analysis with MATLAB

Exploratory Data Analysis with MATLAB Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton

More information

Symmetric Multiprocessing

Symmetric Multiprocessing Multicore Computing A multi-core processor is a processing system composed of two or more independent cores. One can describe it as an integrated circuit to which two or more individual processors (called

More information

Ctfo MANAGEMENT SECURITY PATCH. Felicia M. Nicastro. Second Edition. CRC Press. VC#*' J Taylor & Francis Group / Boca Raton London New York

Ctfo MANAGEMENT SECURITY PATCH. Felicia M. Nicastro. Second Edition. CRC Press. VC#*' J Taylor & Francis Group / Boca Raton London New York SECURITY PATCH MANAGEMENT Second Edition Felicia M. Nicastro Ctfo CRC Press VC#*' J Taylor & Francis Group / Boca Raton London New York CRC Press Is an imprint of the Taylor & Francis Croup, an Informa

More information

SECOND EDITION THE SECURITY RISK ASSESSMENT HANDBOOK. A Complete Guide for Performing Security Risk Assessments DOUGLAS J. LANDOLL

SECOND EDITION THE SECURITY RISK ASSESSMENT HANDBOOK. A Complete Guide for Performing Security Risk Assessments DOUGLAS J. LANDOLL SECOND EDITION THE SECURITY RISK ASSESSMENT HANDBOOK A Complete Guide for Performing Security Risk Assessments DOUGLAS J. LANDOLL CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is

More information

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah Pro Apache Hadoop Second Edition Sameer Wadkar Madhu Siddalingaiah Contents J About the Authors About the Technical Reviewer Acknowledgments Introduction xix xxi xxiii xxv Chapter 1: Motivation for Big

More information

Chapter 2 Parallel Architecture, Software And Performance

Chapter 2 Parallel Architecture, Software And Performance Chapter 2 Parallel Architecture, Software And Performance UCSB CS140, T. Yang, 2014 Modified from texbook slides Roadmap Parallel hardware Parallel software Input and output Performance Parallel program

More information

Parallel Programming Survey

Parallel Programming Survey Christian Terboven 02.09.2014 / Aachen, Germany Stand: 26.08.2014 Version 2.3 IT Center der RWTH Aachen University Agenda Overview: Processor Microarchitecture Shared-Memory

More information

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip.

Lecture 11: Multi-Core and GPU. Multithreading. Integration of multiple processor cores on a single chip. Lecture 11: Multi-Core and GPU Multi-core computers Multithreading GPUs General Purpose GPUs Zebo Peng, IDA, LiTH 1 Multi-Core System Integration of multiple processor cores on a single chip. To provide

More information

SYSTAP / bigdata. Open Source High Performance Highly Available. 1 http://www.bigdata.com/blog. bigdata Presented to CSHALS 2/27/2014

SYSTAP / bigdata. Open Source High Performance Highly Available. 1 http://www.bigdata.com/blog. bigdata Presented to CSHALS 2/27/2014 SYSTAP / Open Source High Performance Highly Available 1 SYSTAP, LLC Small Business, Founded 2006 100% Employee Owned Customers OEMs and VARs Government TelecommunicaHons Health Care Network Storage Finance

More information

The SpiceC Parallel Programming System of Computer Systems

The SpiceC Parallel Programming System of Computer Systems UNIVERSITY OF CALIFORNIA RIVERSIDE The SpiceC Parallel Programming System A Dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science

More information

Using In-Memory Computing to Simplify Big Data Analytics

Using In-Memory Computing to Simplify Big Data Analytics SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed

More information

HPC with Multicore and GPUs

HPC with Multicore and GPUs HPC with Multicore and GPUs Stan Tomov Electrical Engineering and Computer Science Department University of Tennessee, Knoxville CS 594 Lecture Notes March 4, 2015 1/18 Outline! Introduction - Hardware

More information

Program Grid and HPC5+ workshop

Program Grid and HPC5+ workshop Program Grid and HPC5+ workshop 24-30, Bahman 1391 Tuesday Wednesday 9.00-9.45 9.45-10.30 Break 11.00-11.45 11.45-12.30 Lunch 14.00-17.00 Workshop Rouhani Karimi MosalmanTabar Karimi G+MMT+K Opening IPM_Grid

More information

Data Algorithms. Mahmoud Parsian. Tokyo O'REILLY. Beijing. Boston Farnham Sebastopol

Data Algorithms. Mahmoud Parsian. Tokyo O'REILLY. Beijing. Boston Farnham Sebastopol Data Algorithms Mahmoud Parsian Beijing Boston Farnham Sebastopol Tokyo O'REILLY Table of Contents Foreword xix Preface xxi 1. Secondary Sort: Introduction 1 Solutions to the Secondary Sort Problem 3 Implementation

More information

Performance Characteristics of Large SMP Machines

Performance Characteristics of Large SMP Machines Performance Characteristics of Large SMP Machines Dirk Schmidl, Dieter an Mey, Matthias S. Müller schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum (RZ) Agenda Investigated Hardware Kernel Benchmark

More information

SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri

SWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable

More information

OpenCL for programming shared memory multicore CPUs

OpenCL for programming shared memory multicore CPUs Akhtar Ali, Usman Dastgeer and Christoph Kessler. OpenCL on shared memory multicore CPUs. Proc. MULTIPROG-212 Workshop at HiPEAC-212, Paris, Jan. 212. OpenCL for programming shared memory multicore CPUs

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS. Julien Demouth, NVIDIA

MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS. Julien Demouth, NVIDIA MONTE-CARLO SIMULATION OF AMERICAN OPTIONS WITH GPUS Julien Demouth, NVIDIA STAC-A2 BENCHMARK STAC-A2 Benchmark Developed by banks Macro and micro, performance and accuracy Pricing and Greeks for American

More information

Assessing the Performance of OpenMP Programs on the Intel Xeon Phi

Assessing the Performance of OpenMP Programs on the Intel Xeon Phi Assessing the Performance of OpenMP Programs on the Intel Xeon Phi Dirk Schmidl, Tim Cramer, Sandra Wienke, Christian Terboven, and Matthias S. Müller schmidl@rz.rwth-aachen.de Rechen- und Kommunikationszentrum

More information

Introduction to Cloud Computing

Introduction to Cloud Computing Introduction to Cloud Computing Parallel Processing I 15 319, spring 2010 7 th Lecture, Feb 2 nd Majd F. Sakr Lecture Motivation Concurrency and why? Different flavors of parallel computing Get the basic

More information

Fast Analytics on Big Data with H20

Fast Analytics on Big Data with H20 Fast Analytics on Big Data with H20 0xdata.com, h2o.ai Tomas Nykodym, Petr Maj Team About H2O and 0xdata H2O is a platform for distributed in memory predictive analytics and machine learning Pure Java,

More information

Analysis of MapReduce Algorithms

Analysis of MapReduce Algorithms Analysis of MapReduce Algorithms Harini Padmanaban Computer Science Department San Jose State University San Jose, CA 95192 408-924-1000 harini.gomadam@gmail.com ABSTRACT MapReduce is a programming model

More information

How To Understand Multivariate Models

How To Understand Multivariate Models Neil H. Timm Applied Multivariate Analysis With 42 Figures Springer Contents Preface Acknowledgments List of Tables List of Figures vii ix xix xxiii 1 Introduction 1 1.1 Overview 1 1.2 Multivariate Models

More information

Big Graph Processing: Some Background

Big Graph Processing: Some Background Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI-580, Bo Wu Graphs

More information

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA

OpenCL Optimization. San Jose 10/2/2009 Peng Wang, NVIDIA OpenCL Optimization San Jose 10/2/2009 Peng Wang, NVIDIA Outline Overview The CUDA architecture Memory optimization Execution configuration optimization Instruction optimization Summary Overall Optimization

More information

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Shin Morishima 1 and Hiroki Matsutani 1,2,3 1Keio University, 3 14 1 Hiyoshi, Kohoku ku, Yokohama, Japan 2National Institute

More information

High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach

High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach Beniamino Di Martino, Antonio Esposito and Andrea Barbato Department of Industrial and Information Engineering Second University of Naples

More information

Analysis and Optimization of a Hybrid Linear Equation Solver using Task-Based Parallel Programming Models

Analysis and Optimization of a Hybrid Linear Equation Solver using Task-Based Parallel Programming Models Available online at www.prace-ri.eu Partnership for Advanced Computing in Europe Analysis and Optimization of a Hybrid Linear Equation Solver using Task-Based Parallel Programming Models Claudia Rosas,

More information

Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp

Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Designing and Building Applications for Extreme Scale Systems CS598 William Gropp www.cs.illinois.edu/~wgropp Welcome! Who am I? William (Bill) Gropp Professor of Computer Science One of the Creators of

More information

Grid Computing FUNDAMENTALS OF. Theory, Algorithms and Technologies. Frederic Magoules. Edited by. CRC Press

Grid Computing FUNDAMENTALS OF. Theory, Algorithms and Technologies. Frederic Magoules. Edited by. CRC Press FUNDAMENTALS OF Grid Computing Theory, Algorithms and Technologies Edited by Frederic Magoules CRC Press Taylor & Francis Group Boca Raton London NewYork CRC Press is an imprint of the Taylor 8t Francis

More information

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o igiro7o@ictp.it Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) Mul(ple Socket

More information

CSE 6040 Computing for Data Analytics: Methods and Tools

CSE 6040 Computing for Data Analytics: Methods and Tools CSE 6040 Computing for Data Analytics: Methods and Tools Lecture 12 Computer Architecture Overview and Why it Matters DA KUANG, POLO CHAU GEORGIA TECH FALL 2014 Fall 2014 CSE 6040 COMPUTING FOR DATA ANALYSIS

More information

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:

More information

What s New in MATLAB and Simulink

What s New in MATLAB and Simulink What s New in MATLAB and Simulink Kevin Cohan Product Marketing, MATLAB Michael Carone Product Marketing, Simulink 2015 The MathWorks, Inc. 1 What was new for Simulink in R2012b? 2 What Was New for MATLAB

More information

Advances in Network Management

Advances in Network Management Advances in Network Management Jianguo Ding UC) CRC Press >5^ J Taylor & Francis Croup ^""""^ Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Group, an informa business AN AUERBACH

More information

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui

Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching

More information

COMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Big Data by the numbers

COMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Big Data by the numbers COMP 598 Applied Machine Learning Lecture 21: Parallelization methods for large-scale machine learning! Instructor: (jpineau@cs.mcgill.ca) TAs: Pierre-Luc Bacon (pbacon@cs.mcgill.ca) Ryan Lowe (ryan.lowe@mail.mcgill.ca)

More information

A Simulation-Based lntroduction Using Excel

A Simulation-Based lntroduction Using Excel Quantitative Finance A Simulation-Based lntroduction Using Excel Matt Davison University of Western Ontario London, Canada CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

MAPREDUCE Programming Model

MAPREDUCE Programming Model CS 2510 COMPUTER OPERATING SYSTEMS Cloud Computing MAPREDUCE Dr. Taieb Znati Computer Science Department University of Pittsburgh MAPREDUCE Programming Model Scaling Data Intensive Application MapReduce

More information

Presto/Blockus: Towards Scalable R Data Analysis

Presto/Blockus: Towards Scalable R Data Analysis /Blockus: Towards Scalable R Data Analysis Andrew A. Chien University of Chicago and Argonne ational Laboratory IRIA-UIUC-AL Joint Institute Potential Collaboration ovember 19, 2012 ovember 19, 2012 Andrew

More information

Bringing Big Data Modelling into the Hands of Domain Experts

Bringing Big Data Modelling into the Hands of Domain Experts Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks david.willingham@mathworks.com.au 2015 The MathWorks, Inc. 1 Data is the sword of the

More information

Accelerating Hadoop MapReduce Using an In-Memory Data Grid

Accelerating Hadoop MapReduce Using an In-Memory Data Grid Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for

More information

Big data in R EPIC 2015

Big data in R EPIC 2015 Big data in R EPIC 2015 Big Data: the new 'The Future' In which Forbes magazine finds common ground with Nancy Krieger (for the first time ever?), by arguing the need for theory-driven analysis This future

More information

Informatica Ultra Messaging SMX Shared-Memory Transport

Informatica Ultra Messaging SMX Shared-Memory Transport White Paper Informatica Ultra Messaging SMX Shared-Memory Transport Breaking the 100-Nanosecond Latency Barrier with Benchmark-Proven Performance This document contains Confidential, Proprietary and Trade

More information

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

GPU System Architecture. Alan Gray EPCC The University of Edinburgh GPU System Architecture EPCC The University of Edinburgh Outline Why do we want/need accelerators such as GPUs? GPU-CPU comparison Architectural reasons for GPU performance advantages GPU accelerated systems

More information

Gildart Haase School of Computer Sciences and Engineering

Gildart Haase School of Computer Sciences and Engineering Gildart Haase School of Computer Sciences and Engineering Metropolitan Campus I. Course: CSCI 6638 Operating Systems Semester: Fall 2014 Contact Hours: 3 Credits: 3 Class Hours: W 10:00AM 12:30 PM DH1153

More information

Optimizing Parallel Reduction in CUDA. Mark Harris NVIDIA Developer Technology

Optimizing Parallel Reduction in CUDA. Mark Harris NVIDIA Developer Technology Optimizing Parallel Reduction in CUDA Mark Harris NVIDIA Developer Technology Parallel Reduction Common and important data parallel primitive Easy to implement in CUDA Harder to get it right Serves as

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

Multi-Threading Performance on Commodity Multi-Core Processors

Multi-Threading Performance on Commodity Multi-Core Processors Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction

More information

Development and Management

Development and Management Cloud Database Development and Management Lee Chao CRC Press Taylor & Francis Group Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Croup, an Informa business AN AUERBACH BOOK

More information

Data Structures and Performance for Scientific Computing with Hadoop and Dumbo

Data Structures and Performance for Scientific Computing with Hadoop and Dumbo Data Structures and Performance for Scientific Computing with Hadoop and Dumbo Austin R. Benson Computer Sciences Division, UC-Berkeley ICME, Stanford University May 15, 2012 1 1 Matrix storage 2 Data

More information

Amazon EC2 Product Details Page 1 of 5

Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Functionality Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of

More information

GPU for Scientific Computing. -Ali Saleh

GPU for Scientific Computing. -Ali Saleh 1 GPU for Scientific Computing -Ali Saleh Contents Introduction What is GPU GPU for Scientific Computing K-Means Clustering K-nearest Neighbours When to use GPU and when not Commercial Programming GPU

More information

Seminarbeschreibung. Windows Server 2008 Developing High-performance Applications using Microsoft Windows HPC Server 2008.

Seminarbeschreibung. Windows Server 2008 Developing High-performance Applications using Microsoft Windows HPC Server 2008. Seminarbeschreibung Windows Server 2008 Developing High-performance Applications using Microsoft Windows HPC Server 2008 Einleitung: In dieser Schulung lernen die Entwickler - High-Performance Computing

More information

Networking. Systems Design and. Development. CRC Press. Taylor & Francis Croup. Boca Raton London New York. CRC Press is an imprint of the

Networking. Systems Design and. Development. CRC Press. Taylor & Francis Croup. Boca Raton London New York. CRC Press is an imprint of the Networking Systems Design and Development Lee Chao CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Croup, an Informa business AN AUERBACH BOOK

More information

High Performance Matrix Inversion with Several GPUs

High Performance Matrix Inversion with Several GPUs High Performance Matrix Inversion on a Multi-core Platform with Several GPUs Pablo Ezzatti 1, Enrique S. Quintana-Ortí 2 and Alfredo Remón 2 1 Centro de Cálculo-Instituto de Computación, Univ. de la República

More information

HPC Software Requirements to Support an HPC Cluster Supercomputer

HPC Software Requirements to Support an HPC Cluster Supercomputer HPC Software Requirements to Support an HPC Cluster Supercomputer Susan Kraus, Cray Cluster Solutions Software Product Manager Maria McLaughlin, Cray Cluster Solutions Product Marketing Cray Inc. WP-CCS-Software01-0417

More information

Software Performance and Scalability

Software Performance and Scalability Software Performance and Scalability A Quantitative Approach Henry H. Liu ^ IEEE )computer society WILEY A JOHN WILEY & SONS, INC., PUBLICATION Contents PREFACE ACKNOWLEDGMENTS xv xxi Introduction 1 Performance

More information

Map-Reduce for Machine Learning on Multicore

Map-Reduce for Machine Learning on Multicore Map-Reduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers - dual core to 12+-core Shift to more concurrent programming paradigms and languages Erlang,

More information

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase

More information

Petascale Software Challenges. William Gropp www.cs.illinois.edu/~wgropp

Petascale Software Challenges. William Gropp www.cs.illinois.edu/~wgropp Petascale Software Challenges William Gropp www.cs.illinois.edu/~wgropp Petascale Software Challenges Why should you care? What are they? Which are different from non-petascale? What has changed since

More information

Introduction to Financial Models for Management and Planning

Introduction to Financial Models for Management and Planning CHAPMAN &HALL/CRC FINANCE SERIES Introduction to Financial Models for Management and Planning James R. Morris University of Colorado, Denver U. S. A. John P. Daley University of Colorado, Denver U. S.

More information

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the

More information

Data-Flow Awareness in Parallel Data Processing

Data-Flow Awareness in Parallel Data Processing Data-Flow Awareness in Parallel Data Processing D. Bednárek, J. Dokulil *, J. Yaghob, F. Zavoral Charles University Prague, Czech Republic * University of Vienna, Austria 6 th International Symposium on

More information

Image Analytics on Big Data In Motion Implementation of Image Analytics CCL in Apache Kafka and Storm

Image Analytics on Big Data In Motion Implementation of Image Analytics CCL in Apache Kafka and Storm Image Analytics on Big Data In Motion Implementation of Image Analytics CCL in Apache Kafka and Storm Lokesh Babu Rao 1 C. Elayaraja 2 1PG Student, Dept. of ECE, Dhaanish Ahmed College of Engineering,

More information