Parallel Computing for Data Science
Parallel Computing for Data Science: With Examples in R, C++ and CUDA
Norman Matloff, University of California, Davis, USA
CRC Press, Taylor & Francis Group. Boca Raton, London, New York. CRC Press is an imprint of the Taylor & Francis Group, an informa business. A Chapman & Hall Book.
Contents

Preface
Bio

1 Introduction to Parallel Processing in R
  Recurring Theme: The Principle of Pretty Good Parallelism
  Fast Enough
  "R+X"
  A Note on Machines
  Recurring Theme: Hedging One's Bets
  Extended Example: Mutual Web Outlinks
  Serial Code
  Choice of Parallel Tool
  Meaning of "snow" in This Book
  Introduction to snow
  Mutual Outlinks Problem, Solution
  Code
  Timings
  Analysis of the Code

2 "Why Is My Program So Slow?": Obstacles to Speed
  Obstacles to Speed
  Performance and Hardware Structures
  Memory Basics
  Caches
  Virtual Memory
  Monitoring Cache Misses and Page Faults
  Locality of Reference
  Network Basics
  Latency and Bandwidth
  Two Representative Hardware Platforms: Multicore Machines and Clusters
  Multicore
  Clusters
  The Principle of "Just Leave It There"
  Thread Scheduling
  How Many Processes/Threads?
  Example: Mutual Outlink Problem
  "Big O" Notation
  Data Serialization
  "Embarrassingly Parallel" Applications
  What People Mean by "Embarrassingly Parallel"
  Suitable Platforms for Non-Embarrassingly Parallel Applications

3 Principles of Parallel Loop Scheduling
  General Notions of Loop Scheduling
  Chunking in snow
  Example: Mutual Outlinks Problem
  A Note on Code Complexity
  Example: All Possible Regressions
  Parallelization Strategies
  The Code
  Sample Run
  Code Analysis
  Our Task List
  Chunking
  Task Scheduling
  The Actual Dispatching of Work
  Wrapping Up
  Timing Experiments
  The partools Package
  Example: All Possible Regressions, Improved Version
  Code
  Code Analysis
  Timings
  Introducing Another Tool: multicore
  Source of the Performance Advantage
  Example: All Possible Regressions, Using multicore
  Issues with Chunk Size
  Example: Parallel Distance Computation
  The Code
  Timings
  The foreach Package
  Example: Mutual Outlinks Problem
  A Caution When Using foreach
  Stride
  Another Scheduling Approach: Random Task Permutation
  The Math
  The Random Method vs. Others, in Practice
  Debugging snow and multicore Code
  Debugging in snow
  Debugging in multicore

4 The Shared-Memory Paradigm: A Gentle Introduction via R
  So, What Is Actually Shared?
  Global Variables
  Local Variables: Stack Structures
  Non-Shared Memory Systems
  Clarity of Shared-Memory Code
  High-Level Introduction to Shared-Memory Programming: Rdsm Package
  Use of Shared Memory
  Example: Matrix Multiplication
  The Code
  Analysis
  The Code
  A Closer Look at the Shared Nature of Our Data
  Timing Comparison
  Leveraging R
  Shared Memory Can Bring A Performance Advantage
  Locks and Barriers
  Race Conditions and Critical Sections
  Locks
  Barriers
  Example: Maximal Burst in a Time Series
  The Code
  Example: Transforming an Adjacency Matrix
  The Code
  Overallocation of Memory
  Timing Experiment
  Example: k-means Clustering
  The Code
  Timing Experiment

5 The Shared-Memory Paradigm: C Level
  OpenMP
  Example: Finding the Maximal Burst in a Time Series
  The Code
  Compiling and Running
  Analysis
  A Cautionary Note About Thread Scheduling
  Setting the Number of Threads
  Timings
  OpenMP Loop Scheduling Options
  OpenMP Scheduling Options
  Scheduling through Work Stealing
  Example: Transforming an Adjacency Matrix
  The Code
  5.4.2 Analysis of the Code
  Example: Adjacency Matrix, R-Callable Code
  The Code, for .C()
  Compiling and Running
  Analysis
  The Code, for Rcpp
  Compiling and Running
  Code Analysis
  Advanced Rcpp
  Speedup in C
  Run Time vs. Development Time
  Further Cache/Virtual Memory Issues
  Reduction Operations in OpenMP
  Example: Mutual In-Links
  The Code
  Sample Run
  Analysis
  Cache Issues
  Rows vs. Columns
  Processor Affinity
  Debugging
  Threads Commands in GDB
  Using GDB on C/C++ Code Called from R
  Intel Thread Building Blocks (TBB)
  Lockfree Synchronization

6 The Shared-Memory Paradigm: GPUs
  Overview
  Another Note on Code Complexity
  Goal of This Chapter
  Introduction to NVIDIA GPUs and CUDA
  Example: Calculate Row Sums
  NVIDIA GPU Hardware Structure
  Cores
  Threads
  The Problem of Thread Divergence
  "OS in Hardware"
  Grid Configuration Choices
  Latency Hiding in GPUs
  Shared Memory
  More Hardware Details
  Resource Limitations
  Example: Mutual Inlinks Problem
  The Code
  Timing Experiments
  Synchronization on GPUs
  Data in Global Memory Is Persistent
  R and GPUs
  Example: Parallel Distance Computation
  The Intel Xeon Phi Chip

7 Thrust and Rth
  Hedging One's Bets
  7.2 Thrust Overview
  Rth
  Skipping the C++
  Example: Finding Quantiles
  The Code
  Compilation and Timings
  Code Analysis
  Introduction to Rth

8 The Message Passing Paradigm
  Message Passing Overview
  The Cluster Model
  Performance Issues
  Rmpi
  Installation and Execution
  Example: Pipelined Method for Finding Primes
  Algorithm
  The Code
  Timing Example
  Latency, Bandwidth and Parallelism
  Possible Improvements
  Analysis of the Code
  Memory Allocation Issues
  Message-Passing Performance Subtleties
  Blocking vs. Nonblocking I/O
  The Dreaded Deadlock Problem

9 MapReduce Computation
  Apache Hadoop
  Hadoop Streaming
  Example: Word Count
  Running the Code
  Analysis of the Code
  Role of Disk Files
  Other MapReduce Systems
  R Interfaces to MapReduce Systems
  An Alternative: "Snowdoop"
  Example: Snowdoop Word Count
  Example: Snowdoop k-means Clustering

10 Parallel Sorting and Merging
  The Elusive Goal of Optimality
  Sorting Algorithms
  Compare-and-Exchange Operations
  Some "Representative" Sorting Algorithms
  Example: Bucket Sort in R
  Example: Quicksort in OpenMP
  Sorting in Rth
  Some Timing Comparisons
  Sorting on Distributed Data
  Hyperquicksort

11 Parallel Prefix Scan
  General Formulation
  Applications
  General Strategies
  A Log-Based Method
  Another Way
  Implementations of Parallel Prefix Scan
  Parallel cumsum() with OpenMP
  Stack Size Limitations
  Let's Try It Out
  Example: Moving Average
  Rth Code
  Algorithm
  Performance
  Use of Lambda Functions

12 Parallel Matrix Operations
  Tiled Matrices
  Example: Snowdoop Approach
  Parallel Matrix Multiplication
  Multiplication on Message-Passing Systems
  Distributed Storage
  Fox's Algorithm
  Overhead Issues
  Multiplication on Multicore Machines
  Overhead Issues
  Matrix Multiplication on GPUs
  Overhead Issues
  BLAS Libraries
  Overview
  Example: Performance of OpenBLAS
  12.6 Example: Graph Connectedness
  Analysis
  The "Log Trick"
  Parallel Computation
  The matpow Package
  Features
  Solving Systems of Linear Equations
  The Classical Approach: Gaussian Elimination and the LU Decomposition
  The Jacobi Algorithm
  Parallelization
  Example: R/gputools Implementation of Jacobi
  QR Decomposition
  Some Timing Results
  Sparse Matrices

13 Inherently Statistical Approaches: Subset Methods
  Chunk Averaging
  Asymptotic Equivalence
  O(·) Analysis
  Code
  Timing Experiments
  Example: Quantile Regression
  Example: Logistic Model
  Example: Estimating Hazard Functions
  Non-i.i.d. Settings
  Bag of Little Bootstraps
  Subsetting Variables

A Review of Matrix Algebra
  A.1 Terminology and Notation
  A.1.1 Matrix Addition and Multiplication
  A.2 Matrix Transpose
  A.3 Linear Independence
  A.4 Determinants
  A.5 Matrix Inverse
  A.6 Eigenvalues and Eigenvectors
  A.7 Matrix Algebra in R

B R Quick Start
  B.1 Correspondences
  B.2 Starting R
  B.3 First Sample Programming Session
  B.4 Second Sample Programming Session
  B.5 Third Sample Programming Session
  B.6 The R List Type
  B.6.1 The Basics
  B.6.2 The Reduce() Function
  B.6.3 S3 Classes
  B.6.4 Handy Utilities
  B.7 Debugging in R

C Introduction to C for R Programmers
  C.0.1 Sample Program
  C.0.2 Analysis
  C.1 C++

Index
More informationSoftware Performance and Scalability
Software Performance and Scalability A Quantitative Approach Henry H. Liu ^ IEEE )computer society WILEY A JOHN WILEY & SONS, INC., PUBLICATION Contents PREFACE ACKNOWLEDGMENTS xv xxi Introduction 1 Performance
More informationMap-Reduce for Machine Learning on Multicore
Map-Reduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers - dual core to 12+-core Shift to more concurrent programming paradigms and languages Erlang,
More informationApplications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61
F# Applications to Computational Financial and GPU Computing May 16th Dr. Daniel Egloff +41 44 520 01 17 +41 79 430 03 61 Today! Why care about F#? Just another fashion?! Three success stories! How Alea.cuBase
More informationPetascale Software Challenges. William Gropp www.cs.illinois.edu/~wgropp
Petascale Software Challenges William Gropp www.cs.illinois.edu/~wgropp Petascale Software Challenges Why should you care? What are they? Which are different from non-petascale? What has changed since
More informationIntroduction to Financial Models for Management and Planning
CHAPMAN &HALL/CRC FINANCE SERIES Introduction to Financial Models for Management and Planning James R. Morris University of Colorado, Denver U. S. A. John P. Daley University of Colorado, Denver U. S.
More informationLecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl
Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the
More informationData-Flow Awareness in Parallel Data Processing
Data-Flow Awareness in Parallel Data Processing D. Bednárek, J. Dokulil *, J. Yaghob, F. Zavoral Charles University Prague, Czech Republic * University of Vienna, Austria 6 th International Symposium on
More informationImage Analytics on Big Data In Motion Implementation of Image Analytics CCL in Apache Kafka and Storm
Image Analytics on Big Data In Motion Implementation of Image Analytics CCL in Apache Kafka and Storm Lokesh Babu Rao 1 C. Elayaraja 2 1PG Student, Dept. of ECE, Dhaanish Ahmed College of Engineering,
More information