MapGraph. A High Level API for Fast Development of High Performance Graphic Analytics on GPUs. http://mapgraph.io



Similar documents
SYSTAP / bigdata. Open Source High Performance Highly Available. 1 bigdata Presented to CSHALS 2/27/2014

Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model

HIGH PERFORMANCE BIG DATA ANALYTICS

Fast Iterative Graph Computation with Resource Aware Graph Parallel Abstraction

Big Data Analytics. Lucas Rego Drumond

Programming models for heterogeneous computing. Manuel Ujaldón Nvidia CUDA Fellow and A/Prof. Computer Architecture Department University of Malaga

Graphics Cards and Graphics Processing Units. Ben Johnstone Russ Martin November 15, 2011

Parallel Computing: Strategies and Implications. Dori Exterman CTO IncrediBuild.

LBM BASED FLOW SIMULATION USING GPU COMPUTING PROCESSOR

Parallel Programming Survey

Green-Marl: A DSL for Easy and Efficient Graph Analysis

Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team

Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries

NVIDIA Tools For Profiling And Monitoring. David Goodwin

NVIDIA CUDA Software and GPU Parallel Computing Architecture. David B. Kirk, Chief Scientist

Write a technical report Present your results Write a workshop/conference paper (optional) Could be a real system, simulation and/or theoretical

Computer Graphics Hardware An Overview

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

OpenPOWER Outlook AXEL KOEHLER SR. SOLUTION ARCHITECT HPC

Parallel programming with Session Java

GPGPU Computing. Yong Cao

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU

The Evolution of Computer Graphics. SVP, Content & Technology, NVIDIA

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

L20: GPU Architecture and Models

Optimizing a 3D-FWT code in a cluster of CPUs+GPUs

Optimizing GPU-based application performance for the HP for the HP ProLiant SL390s G7 server

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

World s fastest database and big data analytics platform

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs

NVIDIA GRID OVERVIEW SERVER POWERED BY NVIDIA GRID. WHY GPUs FOR VIRTUAL DESKTOPS AND APPLICATIONS? WHAT IS A VIRTUAL DESKTOP?

In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps. Yu Su, Yi Wang, Gagan Agrawal The Ohio State University

HPC with Multicore and GPUs

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

Introduction to GP-GPUs. Advanced Computer Architectures, Cristina Silvano, Politecnico di Milano 1

Enhancing Cloud-based Servers by GPU/CPU Virtualization Management

Chapter 2 Parallel Architecture, Software And Performance

Big Graph Processing: Some Background

A GPU COMPUTING PLATFORM (SAGA) AND A CFD CODE ON GPU FOR AEROSPACE APPLICATIONS

Real-Time Realistic Rendering. Michael Doggett Docent Department of Computer Science Lund university

Parallel Computing. Benson Muite. benson.

GPU Parallel Computing Architecture and CUDA Programming Model

Parallel Firewalls on General-Purpose Graphics Processing Units

Overview. Lecture 1: an introduction to CUDA. Hardware view. Hardware view. hardware view software view CUDA programming

Le langage OCaml et la programmation des GPU

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

Introduction to GPU Programming Languages

HPC Wales Skills Academy Course Catalogue 2015

Cluster Monitoring and Management Tools RAJAT PHULL, NVIDIA SOFTWARE ENGINEER ROB TODD, NVIDIA SOFTWARE ENGINEER

CUDA in the Cloud Enabling HPC Workloads in OpenStack With special thanks to Andrew Younge (Indiana Univ.) and Massimo Bernaschi (IAC-CNR)

The SpiceC Parallel Programming System of Computer Systems

Replication-based Fault-tolerance for Large-scale Graph Processing

InfiniteGraph: The Distributed Graph Database

An Introduction to Parallel Computing/ Programming

Introduction GPU Hardware GPU Computing Today GPU Computing Example Outlook Summary. GPU Computing. Numerical Simulation - from Models to Software

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

Optimizing AAA Games for Mobile Platforms

Final Project Report. Trading Platform Server

Big Data Systems CS 5965/6965 FALL 2015

LA-UR- Title: Author(s): Intended for: Approved for public release; distribution is unlimited.

Turbomachinery CFD on many-core platforms experiences and strategies

Applications to Computational Financial and GPU Computing. May 16th. Dr. Daniel Egloff

E6895 Advanced Big Data Analytics Lecture 14:! NVIDIA GPU Examples and GPU on ios devices

Retour d expérience : portage d une application haute-performance vers un langage de haut niveau

Image Analytics on Big Data In Motion Implementation of Image Analytics CCL in Apache Kafka and Storm

Petascale Visualization: Approaches and Initial Results

Big Data Analytics. Lucas Rego Drumond

Lecture 3: Modern GPUs A Hardware Perspective Mohamed Zahran (aka Z) mzahran@cs.nyu.edu

Jean-Pierre Panziera Teratec 2011

Embedded Systems: map to FPGA, GPU, CPU?

A1 and FARM scalable graph database on top of a transactional memory layer

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Writing Applications for the GPU Using the RapidMind Development Platform

High Performance Cloud: a MapReduce and GPGPU Based Hybrid Approach

CSE 564: Visualization. GPU Programming (First Steps) GPU Generations. Klaus Mueller. Computer Science Department Stony Brook University

LARGE-SCALE GRAPH PROCESSING IN THE BIG DATA WORLD. Dr. Buğra Gedik, Ph.D.

ACCELERATING SELECT WHERE AND SELECT JOIN QUERIES ON A GPU

Hardware design for ray tracing

Verteilte Systeme - Overview

FastForward I/O and Storage: ACG 6.6 Demonstration

OpenACC 2.0 and the PGI Accelerator Compilers

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

Accelerating Intensity Layer Based Pencil Filter Algorithm using CUDA

Overview on Modern Accelerators and Programming Paradigms Ivan Giro7o

High Performance CUDA Accelerated Local Optimization in Traveling Salesman Problem

STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

NVIDIA VIDEO ENCODER 5.0

Coping with Complexity: CPUs, GPUs and Real-world Applications

Presto/Blockus: Towards Scalable R Data Analysis

Introduction to GPU hardware and to CUDA

Convex Optimization for Big Data: Lecture 2: Frameworks for Big Data Analytics

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data

CUDA programming on NVIDIA GPUs

A survey on platforms for big data analytics

InDetail. InDetail Paper by Bloor Author Philip Howard Publish date December Blazegraph GPU

GPU for Scientific Computing. -Ali Saleh

GPU Architectures. A CPU Perspective. Data Parallelism: What is it, and how to exploit it? Workload characteristics

GPU Programming in Computer Vision

Retargeting PLAPACK to Clusters with Hardware Accelerators

Transcription:

MapGraph A High Level API for Fast Development of High Performance Graphic Analytics on GPUs http://mapgraph.io Zhisong Fu, Michael Personick and Bryan Thompson SYSTAP, LLC

Outline Motivations MapGraph overview Results Summary

Million Traversed Edges per Second GPUs A Game Changer for Graph Analytics? Graphs are everywhere in data, also getting bigger and bigger GPUs may be the technology that finally delivers real-time analytics on large graphs 10x flops over CPU 10x memory bandwidth This is a hard problem Irregular memory access Load imbalance Significant speed up over CPU on BFS [Merrill2013] Over 10x speedup over CPU 3500 3000 2500 2000 1500 1000 500 0 1 10 100 1000 10000 100000 Average Traversal Depth NVIDIA Tesla C2050 Multicore per socket Sequential

Low-level VS. High-level Low-level approach BFS: [Merrill2013] PageRank: [Duong2012] SSSP: [Davidson2014] High-level approach GraphLab [Low2012] Medusa [Zhong2013] Totem [Gharaibeh2013] Pros: High performance Cons: Difficulty to develop Reinvent the wheels Pros: High programmability Cons: Low Performance

MapGraph High-level graph processing framework High programmability: only C++ sequential GPU architecture Optimization techniques CUDA, OpenCL High performance Comparable to low-level approach

GAS Abstraction Gather

GAS Abstraction Gather Apply

GAS Abstraction Gather Scatter = Expand + Contract Apply

GAS Abstraction Frontier size > 0 Gather Scatter = Expand + Contract Apply

MapGraph Runtime Pipeline

MTEPS Experiment Datasets Dataset #vertices #edges Max Degree MTEPS (BFS) Webbase 1,000,005 3,105,536 23 514 Delaunay 2,097,152 6,291,408 4,700 154 Bitcoin 6,297,539 28,143,065 4,075,472 75 Wiki 3,566,907 45,030,389 7,061 821 Kron 1,048,576 89,239,674 131,505 1,871 2,000 1,800 1,600 1,400 1,200 1,000 800 600 400 200 0 1870.9 821.3 513.6 154.0 74.8 Webbase Delaunay Bitcoin Wiki Kron

Speedup Results: Compare to Other GPU implementations 100.00 MapGraph Speedups vs Other GPU Implementations 10.00 Medusa B40c 1.00 0.10 Webbase Delaunay Bitcoin Wiki Kron 12

Speedup BFS Results: Compare to GraphLab 1,000.00 MapGraph Speedup vs GraphLab (BFS) 100.00 10.00 GL-2 GL-4 GL-8 GL-12 MPG 1.00 0.10 Webbase Delaunay Bitcoin Wiki Kron 13

Speedup PageRank Results: Compare to GraphLab 100.00 MapGraph Speedup vs GraphLab (PR) 10.00 1.00 GL-2 GL-4 GL-8 GL-12 MPG 0.10 Webbase Delaunay Bitcoin Wiki Kron

MapGraph API Gather gatheroveredges gather_edge gather_sum gather_vertex Scatter = Expand + Contract expandoveredges Expand expand_vertex expand_edge Contract contract Apply apply

Example: PageRank Implementation Gather, Apply, Scatter phases User Data VertexType Gather Apply Expand gatheroveredges gather_edge gather_sum apply expandoveredges expand_vertex expand_edge float* d_ranks; int* d_num_out_edge; return GATHER_IN_EDGES; float nb_rank = d_dists[neighbor_id]; new_rank = nb_rank / d_num_out_edge[neighbor_id]; return left + right; float old_value = d_ranks[vertex_id]; float new_value = 0.15f + (1.0f - 0.15f) * gathervalue; changed = fabs(old_value new_value) >= 0.01f; d_dists[vertex_id] = new_value; return EXPAND_OUT_EDGES; return changed; frontier = neighbor_id;

Source vertex ids in-edges Future Work GPU cluster: 2D partitioning (aka vertex cuts) In collaboration with SCI Institute of the University of Utah Compute grid defined over virtual nodes. Patches assigned to virtual nodes based on source and target identifier of the edge. Topology, message and data compression target vertex ids out-edges

Summary MapGraph: high-level graph processing framework http://mapgraph.io High programmability: GAS abstraction Simple and flexible API High performance: Hybrid scheduling strategy Structure Of Arrays

Acknowledgement This work was (partially) funded by the DARPA XDATA program under AFRL Contract #FA8750-13-C- 0002. This work is also supported by the DARPA under Contract No. D14PC00029. Many thanks to Dr. Christopher White for the support.