Graph Analytics in Big Data. John Feo Pacific Northwest National Laboratory



Similar documents
STINGER: High Performance Data Structure for Streaming Graphs

GPU Computing with CUDA Lecture 4 - Optimizations. Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile

Multi-Threading Performance on Commodity Multi-Core Processors

The Uintah Framework: A Unified Heterogeneous Task Scheduling and Runtime System

Cray: Enabling Real-Time Discovery in Big Data

Final Project Report. Trading Platform Server

bigdata Managing Scale in Ontological Systems

- An Essential Building Block for Stable and Reliable Compute Clusters

In-Memory Computing for Iterative CPU-intensive Calculations in Financial Industry In-Memory Computing Summit 2015

A1 and FARM scalable graph database on top of a transactional memory layer

ZooKeeper. Table of contents

GEMS: Graph database Engine for Multithreaded Systems

Energy Efficient MapReduce

Overview Motivating Examples Interleaving Model Semantics of Correctness Testing, Debugging, and Verification

Efficient Parallel Graph Exploration on Multi-Core CPU and GPU

Parallel Algorithm Engineering

Parallel Computing with MATLAB

Multi-core architectures. Jernej Barbic , Spring 2007 May 3, 2007

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

A Network Management Framework for Emerging Telecommunications Network.

Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest

The International Journal Of Science & Technoledge (ISSN X)

Cost-Benefit Analysis of Cloud Computing versus Desktop Grids

Graph Database Proof of Concept Report

Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand

Informatica Ultra Messaging SMX Shared-Memory Transport

Analysis of MapReduce Algorithms

Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy?

Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011

Microsoft Windows Server 2003 with Internet Information Services (IIS) 6.0 vs. Linux Competitive Web Server Performance Comparison

HETEROGENEOUS HPC, ARCHITECTURE OPTIMIZATION, AND NVLINK

How To Compare Amazon Ec2 To A Supercomputer For Scientific Applications

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team

Scaling Database Performance in Azure

So#ware Tools and Techniques for HPC, Clouds, and Server- Class SoCs Ron Brightwell

Efficient Parallel Processing on Public Cloud Servers Using Load Balancing

FPGA-based Multithreading for In-Memory Hash Joins

Next Generation GPU Architecture Code-named Fermi

Lecture 36: Chapter 6

TRACE PERFORMANCE TESTING APPROACH. Overview. Approach. Flow. Attributes

Performance Characteristics of Large SMP Machines

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Parallel Simplification of Large Meshes on PC Clusters

GPU File System Encryption Kartik Kulkarni and Eugene Linkov

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Big Data Database Revenue and Market Forecast,

MapReduce and Hadoop Distributed File System

Agenda. Distributed System Structures. Why Distributed Systems? Motivation

Volunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy. Derrick Kondo INRIA, France

Big Graph Processing: Some Background

White Paper February IBM InfoSphere DataStage Performance and Scalability Benchmark Whitepaper Data Warehousing Scenario

IBM CELL CELL INTRODUCTION. Project made by: Origgi Alessandro matr Teruzzi Roberto matr IBM CELL. Politecnico di Milano Como Campus

Scalability and Classifications

Tyche: An efficient Ethernet-based protocol for converged networked storage

CentOS Linux 5.2 and Apache 2.2 vs. Microsoft Windows Web Server 2008 and IIS 7.0 when Serving Static and PHP Content

JBoss Data Grid Performance Study Comparing Java HotSpot to Azul Zing

SCALABILITY AND AVAILABILITY

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

CHAPTER 1 INTRODUCTION

IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version Fix Pack 2.

NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB

Accelerating Spark with RDMA for Big Data Processing: Early Experiences

Apache Hadoop. Alexandru Costan

A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader

CS 575 Parallel Processing

Big Data and Graph Analytics in a Health Care Setting

Virtual Network Provisioning and Fault-Management across Multiple Domains

Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

Lesson Objectives. To provide a grand tour of the major operating systems components To provide coverage of basic computer system organization

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Cluster Computing at HRI

MapReduce and Hadoop Distributed File System V I J A Y R A O

benchmarking Amazon EC2 for high-performance scientific computing

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Performance Analysis and Optimization Tool

VirtualCenter Database Performance for Microsoft SQL Server 2005 VirtualCenter 2.5

HIGH PERFORMANCE BIG DATA ANALYTICS

Understanding the Benefits of IBM SPSS Statistics Server

Stream Processing on GPUs Using Distributed Multimedia Middleware

Hadoop Cluster Applications

ParFUM: A Parallel Framework for Unstructured Meshes. Aaron Becker, Isaac Dooley, Terry Wilmarth, Sayantan Chakravorty Charm++ Workshop 2008

BSPCloud: A Hybrid Programming Library for Cloud Computing *

GPU System Architecture. Alan Gray EPCC The University of Edinburgh

Chapter 4 Cloud Computing Applications and Paradigms. Cloud Computing: Theory and Practice. 1

Big Data Challenges in Bioinformatics

1 Bull, 2011 Bull Extreme Computing

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Transcription:

Graph Analytics in Big Data John Feo Pacific Northwest National Laboratory 1

A changing World The breadth of problems requiring graph analytics is growing rapidly Large Network Systems Social Networks Packet Inspection Natural Language Understanding Semantic Search and Knowledge Discovery CyberSecurity

Graphs are not grids Graphs arising in informatics are very different from the grids used in scientific computing Scientific Grids Graphs for Data Informatics Static or slowly involving Planar Nearest neighbor communication Work performed per cell or node Work modifies local data Dynamic Non-planar Communications are non-local and dynamic Work performed by crawlers or autonomous agents Work modifies data in many places

Challenges Problem size Ton of bytes, not ton of flops Little data locality Have only parallelism to tolerate latencies Low computation to communication ratio Single word access Threads limited by loads and stores Frequent synchronization Node, edge, record Work tends to be dynamic and imbalanced Let any processor execute any thread

System requirements Global shared memory No simple data partitions Local storage for thread private data Network support for single word accesses Transfer multiple words when locality exists Multi-threaded processors Hide latency with parallelism Single cycle context switching Multiple outstanding loads and stores per thread Full-and-empty bits Efficient synchronization Wait in memory Message driven operations Dynamic work queues Hardware support for thread migration Cray XMT

Our type of problems Return all triads such that (A B), (A C), (C B) Return all three paths with link types {T 1, T 2, T 3 } such that the timestamps of consecutive links overlap by at least 0.5 seconds. From Facebook, return the connected subgraph G(V, E) such that G includes all the friends of John, the cardinality of V is minimum, and Σ NetWorth(v i ε V) is maximum.

Triads B C A SELECT?A?B?C WHERE {?A?a?B.?A?b?C.?C?c?B. } A B A C A B C B C

Simple C code!?!?! for each node A { for each out_edge I of A { for each out_edge J of A { B = tail of I; C = tail of J; No memory explosion 15 secs } } } for each out_edge K of C if tail of K == B { write answer } 8

SP2 Benchmarks We have written the 12 SP2B queries in C using our graph API Execution time on Cray XMT/2 is from one to three orders magnitude faster than Virtuoso on 3GHz Xeon server Now porting sdb0 to x86 server and cluster systems C code is simple, but Can we generate it automatically from a high level query language? Can we provide some other more appropriate query interface? 9

Query 5 Return the names of all persons that occur as author of at least one inproceeding and at least one article John pub 1 ARTICLE PERSON pub 2 Bill pub 3 InPROC

Data parallel code for Query 5 int PERSON_index = get_vertex_index(person); int ARTICLE_index = get_vertex_index(article); int INPROC_index = get_vertex_index(inproc); int nmbr_edges = indegree(person_index); in_edge_iterator Person_edges = get_inedges(person_index); for (i = 0; i < nmbr_edges; i++) { int person = PERSON_edges[i].head; int nmbr_publ = number_edges(person, creator); in_edge_iterator Publ_edges = get_inedges(person, creator); for (j = 0; j < nmbr_publ; j++) { int publ_type = edge_head_index(publ_edges[j]); if (publ_type == ARTICLE_index) flag = 1; else if (publ_type == INPROC_index) flag = 2; } } if (flag == 3) {print person; break;} 1.29 secs vs. 21 secs in Virtuoso 11

Conclusions Big data graph analytics is fundamentally different than big data science Different algorithms Different challenges Different hardware requirements Conventional database systems based tables and join operations are insufficient Data parallel graph crawls can be orders of magnitude faster Need new query languages capable of expressing graph analytics operations and compiling to data parallel operations 12