HIGH PERFORMANCE BIG DATA ANALYTICS

Kunle Olukotun
Electrical Engineering and Computer Science, Stanford University
June 2, 2014

Explosion of Data Sources
- Sensors: DoD is swimming in sensors and drowning in data
- Challenge: enable discovery
- Deliver the capability to mine, search, and analyze this data in near real time

PROCESSING SYSTEMS FOR BIG DATA
- Goal: make high-performance data analytics easy to use
- Manipulate big data sets in real time
- Make better decisions and reach the right conclusions
- Streaming data
- Low-latency computation
- Interactive data exploration
- Requires the full power of modern computing platforms: heterogeneous parallelism

PROCESSING PERFORMANCE TODAY
[Chart: execution time relative to C++ on the benchmarks fib, parse_int, quicksort, mandel, pi_sum, rand_mat_stat, and rand_mat_mul]

PERFORMANCE VS. EASE OF USE
[Chart: ease of use plotted against performance. Labeled points: Matlab, R; Pig, Hive; MapReduce; Spark; GraphLab; Goal]

Heterogeneous Parallelism
- Graphics Processing Unit (GPU)
- Multicore CPU
- Cluster
- Programmable Logic

EXPERT PARALLEL PROGRAMMING
- Graphics Processing Unit (GPU): CUDA, OpenCL
- Multicore CPU: Threads, OpenMP
- Cluster: MPI, MapReduce
- Programmable Logic: Verilog, VHDL

THE PROGRAMMABILITY GAP
- Applications: Data Wrangling, Data Transformation, Graph Analysis, Prediction, Recommendation
- Heterogeneous Hardware: New Arch.

GENERAL-PURPOSE LANGUAGES
- Applications: Data Wrangling, Data Transformation, Graph Analysis, Prediction, Recommendation
- General-purpose languages: Scala, Java, Python, C++, Ruby, Clojure
- No restrictions
- Not enough semantic knowledge to compile automatically
- Heterogeneous Hardware: New Arch.

Domain Specific Languages
- Domain Specific Languages (DSLs)
- Programming language with restricted expressiveness for a particular domain
- High-level, usually declarative, and deterministic

High Performance DSLs for Data Analytics
- Applications: Data Wrangling, Data Transformation, Graph Analysis, Prediction, Recommendation
- Domain Specific Languages:
  - Data Transform: OptiWrangle
  - Data Query: OptiQL
  - Graph Alg.: OptiGraph
  - Machine Learning: OptiML
  - Convex Opt.: OptiCVX
- Heterogeneous Hardware: New Arch.

Scaling the Approach
- Many potential DSLs
- How do we quickly create high-performance implementations for the DSLs we care about?
- Enable expert parallel programmers to easily create new DSLs
- Make optimization knowledge reusable
- Simplify the compiler generation process
- A few DSL developers enable many more DSL users
- Leave expert programming to experts!

DELITE: DSL INFRASTRUCTURE
- Applications: Data Wrangling, Data Transformation, Graph Analysis, Prediction, Recommendation
- Domain Specific Languages:
  - Data Transform: OptiWrangle
  - Data Query: OptiQL
  - Graph Alg.: OptiGraph
  - Machine Learning: OptiML
  - Convex Opt.: OptiCVX
- Delite DSL Framework: common DSL infrastructure
- Heterogeneous Hardware: New Arch.

DELITE: DSL INFRASTRUCTURE
- Applications: Data Transformation, Graph Analysis, Prediction, Recommendation
- Domain Specific Languages:
  - Data Transform: OptiWrangle
  - Data Query: OptiQL
  - Graph Alg.: OptiGraph
  - Machine Learning: OptiML
- Delite DSL Framework
- Heterogeneous Hardware: Multicore, GPU, FPGA, Cluster

Delite Overview
- Diagram labels: DSL User; DSL Developer; Opti{CVX, Graph, ML, QL, Wrangle}; domain ops; domain data; parallel patterns; parallel data; domain-specific analyses & transformations; generic analyses & transformations; code generators (Threads, OpenMP, CUDA, OpenCL, MPI, Verilog); Delite DSL Framework
- Key elements:
  - DSLs embedded in Scala: libraries on steroids
  - Intermediate representation
  - Domain-specific optimization
  - General parallelism and locality optimizations
  - Mapping to HW targets

OPTIML: A DSL FOR MACHINE LEARNING
- Designed for iterative statistical inference
  - e.g. SVMs, logistic regression, neural networks, etc.
  - Dense/sparse vectors and matrices, message-passing graphs, training/test sets
- Mostly functional
  - Data manipulation with classic functional operators (map, filter) and ML-specific ones (sum, vector constructor, untilconverged)
  - Math with MATLAB-like syntax (a*b, chol(..), exp(..))
- Runs anywhere
  - Single source to multicore CPUs, GPUs, and clusters
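To make the `untilconverged` operator mentioned above more concrete, here is a minimal plain-Scala sketch of a fixed-point combinator with the same general shape. It is an illustration only: `UntilConvergedSketch`, `untilConverged`, `maxIter`, and the `diff` parameter are hypothetical names, and the real OptiML construct is compiled and parallelized by Delite, which this sketch does not attempt.

```scala
import scala.annotation.tailrec

// Hypothetical stand-in for an "iterate until converged" operator.
// Not the OptiML implementation; a sequential sketch of the idea.
object UntilConvergedSketch {
  /** Applies `step` repeatedly until two successive values differ by less
    * than `tol` (as measured by `diff`) or `maxIter` steps have run. */
  def untilConverged[A](init: A, tol: Double, maxIter: Int = 1000)
                       (step: A => A)
                       (diff: (A, A) => Double): A = {
    @tailrec
    def loop(cur: A, iter: Int): A = {
      val next = step(cur)
      if (iter + 1 >= maxIter || diff(next, cur) < tol) next
      else loop(next, iter + 1)
    }
    loop(init, 0)
  }

  def main(args: Array[String]): Unit = {
    // Babylonian iteration for sqrt(2), used only to exercise the combinator.
    val root = untilConverged(1.0, 1e-12)(x => 0.5 * (x + 2.0 / x))(
      (a, b) => math.abs(a - b))
    println(root)
  }
}
```

Separating the step function from the convergence measure mirrors how the slide's examples pass a body and a difference function as separate blocks.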

K-MEANS CLUSTERING PERFORMANCE
[Chart: speedup of OptiML, parallelized MATLAB, and C++ on k-means for 1 CPU, 2 CPU, 4 CPU, 8 CPU, and CPU + GPU]
    untilconverged(kmeans, tol){ kmeans =>
      val clusters = samples.grouprowsby { sample =>
        val alldistances = kmeans.maprows { mean => dist(sample, mean) }
        alldistances.minindex
      }
      val newkmeans = clusters.map(e => e.sum / e.length)
      newkmeans
    }
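For readers unfamiliar with the OptiML operators, the same k-means update can be sketched with the standard Scala collections. This is a sequential illustration under stated assumptions, not the OptiML code: `KMeansSketch`, `step`, `run`, and `maxIter` are hypothetical names, Euclidean distance is assumed for `dist`, and an empty cluster simply keeps its previous mean.

```scala
// Plain-Scala sketch of the k-means update; names are illustrative only.
object KMeansSketch {
  type Point = Vector[Double]

  // Euclidean distance (an assumption; the slide does not define dist).
  def dist(a: Point, b: Point): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  /** One iteration: assign each sample to its nearest mean,
    * then replace each mean by the average of its cluster. */
  def step(samples: Seq[Point], means: Seq[Point]): Seq[Point] = {
    val clusters: Map[Int, Seq[Point]] =
      samples.groupBy(s => means.indices.minBy(i => dist(s, means(i))))
    means.indices.map { i =>
      clusters.get(i) match {
        case Some(members) =>
          members.transpose.map(_.sum / members.length).toVector
        case None => means(i) // empty cluster: keep the old mean (assumption)
      }
    }
  }

  /** Repeat `step` until the means move by at most `tol` in total. */
  def run(samples: Seq[Point], init: Seq[Point], tol: Double,
          maxIter: Int = 100): Seq[Point] = {
    var means = init
    var moved = Double.MaxValue
    var iter = 0
    while (moved > tol && iter < maxIter) {
      val next = step(samples, means)
      moved = means.zip(next).map { case (a, b) => dist(a, b) }.sum
      means = next
      iter += 1
    }
    means
  }
}
```

Here `groupBy` plays the role of `grouprowsby` and `minBy` over the mean indices plays the role of `minindex`.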

MSM Builder Using OptiML
- With Vijay Pande
- Markov State Models (MSMs): MSMs are a powerful means of modeling the structure and dynamics of molecular systems, like proteins
- [Chart labels: high prod, high perf; low prod, high perf; x86 ASM; high prod, low perf]

OptiGraph
- A DSL for large-scale graph analysis based on Green-Marl
- A DSL for real-world graph analysis
- Green-Marl: A DSL for Easy and Efficient Graph Analysis (Hong et al.), ASPLOS '12
- Functional DSL: no mutation, no explicit iteration
- Data structures: Graph (directed, undirected), node, edge; Set of nodes, edges, neighbors
- Graph traversals: summation, breadth-first traversal
- Parallel reductions, but no deferred assignment
- Example applications written in the framework:
  - Betweenness Centrality
  - Directed/Undirected Triangle Counting
  - PageRank

Example: Betweenness Centrality
- Betweenness Centrality (BC): a measure that tells how central a node is in the graph
- Used in social network analysis
- Definition: how many shortest paths between any two nodes go through this node
- [Figure labels: Low BC; High BC; Kevin Bacon; Ayush K. Kehdekar]
- [Image source: Wikipedia]
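The verbal definition above is usually written as a sum over node pairs. The notation below is the standard one from the graph-analysis literature and is not taken from the slide: $\sigma_{st}$ is the number of shortest paths from $s$ to $t$, and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.

```latex
\[
  \mathrm{BC}(v) \;=\; \sum_{\substack{s \neq v \neq t}} \frac{\sigma_{st}(v)}{\sigma_{st}}
\]
```

Dividing by $\sigma_{st}$ means that a node lying on only some of the shortest paths between a pair gets fractional credit for that pair.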

OptiGraph Betweenness Centrality
    val bc = sum(g.nodes){ s =>
      val sigma = g.inbforder(s) { (v, prev_sigma) =>
        if (v == s) 1
        else sum(v.upnbrs){ w => prev_sigma(w) }
      }
      val delta = g.inrevbforder(s){ (v, prev_delta) =>
        sum(v.downnbrs){ w =>
          sigma(v) / sigma(w) * (1 + prev_delta(w))
        }
      }
      delta
    }
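The two traversals in the OptiGraph code (a forward breadth-first pass that counts shortest paths, then a reverse pass that accumulates dependencies) match the structure of Brandes' algorithm for unweighted graphs. The sequential plain-Scala sketch below follows that structure under a few assumptions: the graph is an adjacency list indexed by node id, `BetweennessSketch` and `betweenness` are hypothetical names, and the source node is excluded from its own accumulation as in Brandes' formulation.

```scala
import scala.collection.mutable

// Sequential sketch of Brandes-style betweenness centrality.
// Illustrative only; not the code OptiGraph generates.
object BetweennessSketch {
  /** adj(v) lists the out-neighbors of v; nodes are 0 until adj.length. */
  def betweenness(adj: IndexedSeq[Seq[Int]]): Array[Double] = {
    val n = adj.length
    val bc = Array.fill(n)(0.0)
    for (s <- 0 until n) {
      val dist = Array.fill(n)(-1)     // BFS level, -1 = unvisited
      val sigma = Array.fill(n)(0.0)   // number of shortest s-v paths
      val delta = Array.fill(n)(0.0)   // dependency of s on v
      val order = mutable.ArrayBuffer.empty[Int] // BFS visit order
      val queue = mutable.Queue(s)
      dist(s) = 0
      sigma(s) = 1.0
      // Forward pass: sigma(w) sums sigma over w's BFS parents.
      while (queue.nonEmpty) {
        val v = queue.dequeue()
        order += v
        for (w <- adj(v)) {
          if (dist(w) < 0) { dist(w) = dist(v) + 1; queue.enqueue(w) }
          if (dist(w) == dist(v) + 1) sigma(w) += sigma(v)
        }
      }
      // Reverse pass: accumulate over BFS children ("down" neighbors).
      for (v <- order.reverseIterator; w <- adj(v) if dist(w) == dist(v) + 1)
        delta(v) += sigma(v) / sigma(w) * (1.0 + delta(w))
      for (v <- 0 until n if v != s) bc(v) += delta(v)
    }
    bc
  }
}
```

On an undirected graph stored with both edge directions, each unordered pair of endpoints is counted twice, so the middle node of a three-node path scores 2.0 rather than 1.0.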

OptiGraph: PageRank
    val pr = untilconverged(0, threshold){ oldpr =>
      g.mapnodes{ n =>
        ((1.0 - damp) / g.numnodes) + damp * sum(n.innbrs){ w =>
          oldpr(w) / w.outdegree
        }
      }
    }{ (curpr, oldpr) => sum(abs(curpr - oldpr)) }
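The same update rule can be written against plain Scala collections. The sketch below is a sequential illustration, not OptiGraph output: `PageRankSketch`, `pageRank`, and `maxIter` are hypothetical names, the default damping factor of 0.85 is an assumption, and every node is assumed to have at least one outgoing edge so that the division by out-degree is defined.

```scala
// Sequential PageRank sketch mirroring the update rule on the slide.
object PageRankSketch {
  /** out(v) lists the nodes that v links to.
    * Assumes every node has at least one out-link. */
  def pageRank(out: IndexedSeq[Seq[Int]], damp: Double = 0.85,
               threshold: Double = 1e-10, maxIter: Int = 1000): Vector[Double] = {
    val n = out.length
    // Build in-neighbor lists once, since the update reads incoming edges.
    val in = Array.fill(n)(List.empty[Int])
    for (v <- 0 until n; w <- out(v)) in(w) = v :: in(w)

    var pr = Vector.fill(n)(0.0) // the slide's code also starts from 0
    var change = Double.MaxValue
    var iter = 0
    while (change >= threshold && iter < maxIter) {
      val next = Vector.tabulate(n) { v =>
        (1.0 - damp) / n + damp * in(v).map(w => pr(w) / out(w).length).sum
      }
      // Convergence measure: sum of absolute rank changes.
      change = next.zip(pr).map { case (a, b) => math.abs(a - b) }.sum
      pr = next
      iter += 1
    }
    pr
  }
}
```

Each iteration builds a fresh vector from the previous one, which matches the no-mutation style of the OptiGraph version and keeps the per-node updates independent.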

OptiGraph vs. GPS (Pregel): Higher Productivity
Algorithm                                OptiGraph (lines of code)   Native GPS (lines of code)
Average Teenage Follower (AvgTeen)       13                          130
PageRank                                 11                          110
Conductance (Conduct)                    12                          149
Single Source Shortest Paths (SSSP)      29                          105
Random Bipartite Matching (Bipartite)    47                          225
Approximate Betweenness Centrality       25                          Not available
The DSL compiler automatically converts OptiGraph to GPS.

OptiGraph vs. GPS (Pregel): Similar Performance
- 25 x 4 = 100 cores
- [Chart: OptiGraph execution time relative to native GPS]

TRIANGLE COUNTING
- Microsoft Research, IBM, LogicBlox, MIT, and Oracle are trying to make this application run fast
- Simple yet practical example of a multi-way join, a fundamental operation in any data analytics engine
- Serves as the building block for many graph mining applications, such as identifying cliques, graph transitivity, and clustering coefficients
    val trianglecount = g.sumovernodes{ n =>
      sum(n.nbrs){ nbr => n.commonnbrs(nbr).size }
    }
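The neighbor-intersection idea can be sketched in plain Scala with sets. This is an illustration under assumptions, not OptiGraph's implementation: `TriangleCountSketch`, `commonNeighborSum`, and `triangles` are hypothetical names, and the graph is assumed to be undirected and simple, with `nbrs(v)` holding every neighbor of `v`. Under that assumption the raw sum visits each triangle six times, so the sketch divides by 6; whether OptiGraph's `nbrs` is restricted to avoid the repeats is not stated on the slide.

```scala
// Set-based triangle counting sketch for an undirected simple graph.
object TriangleCountSketch {
  /** Sum over every node n and every neighbor nbr of |N(n) ∩ N(nbr)|. */
  def commonNeighborSum(nbrs: IndexedSeq[Set[Int]]): Long =
    nbrs.indices.map { n =>
      // toSeq avoids collapsing equal sizes, which mapping a Set would do.
      nbrs(n).toSeq.map(nbr => nbrs(n).intersect(nbrs(nbr)).size.toLong).sum
    }.sum

  /** Each triangle is seen from 3 nodes and 2 neighbors per node. */
  def triangles(nbrs: IndexedSeq[Set[Int]]): Long =
    commonNeighborSum(nbrs) / 6
}
```

The set intersection is the multi-way join the slide refers to: it joins the edge list with itself twice to find closed paths of length three.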

KEYS TO TRIANGLE COUNTING PERFORMANCE
- Data layout
  - Hash
  - Compressed sparse row
  - Bit set
  - Compressed bit set
- Dynamically switch layouts based on graph characteristics
  - E.g. mesh vs. scale-free (power law)
  - Mixed layouts are best in some cases
- Dynamic scheduling for load balance
- Performance range: 2x to 160x better than GraphLab
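As one concrete instance of the layouts listed above, the following plain-Scala sketch builds a compressed sparse row (CSR) representation and intersects two sorted neighbor ranges with a linear merge. It is a sketch under assumptions: `CsrSketch`, `Csr`, `fromUndirectedEdges`, and `commonCount` are hypothetical names, and nothing here reflects how OptiGraph chooses or switches between layouts.

```scala
// CSR layout sketch with a merge-based neighbor intersection.
object CsrSketch {
  /** Neighbors of v are cols(rowPtr(v)) until cols(rowPtr(v + 1)),
    * stored in ascending order. */
  final case class Csr(rowPtr: Array[Int], cols: Array[Int]) {
    def numNodes: Int = rowPtr.length - 1
  }

  /** Builds CSR from undirected edges over nodes 0 until n. */
  def fromUndirectedEdges(n: Int, edges: Seq[(Int, Int)]): Csr = {
    // Store both directions, drop duplicates, sort by (source, target).
    val directed =
      edges.flatMap { case (a, b) => Seq((a, b), (b, a)) }.distinct.sorted
    val rowPtr = new Array[Int](n + 1)
    for ((src, _) <- directed) rowPtr(src + 1) += 1     // per-node degree
    for (v <- 0 until n) rowPtr(v + 1) += rowPtr(v)     // prefix sum
    Csr(rowPtr, directed.map(_._2).toArray)
  }

  /** Number of common neighbors of a and b via a linear merge. */
  def commonCount(g: Csr, a: Int, b: Int): Int = {
    var i = g.rowPtr(a)
    val iEnd = g.rowPtr(a + 1)
    var j = g.rowPtr(b)
    val jEnd = g.rowPtr(b + 1)
    var count = 0
    while (i < iEnd && j < jEnd) {
      if (g.cols(i) < g.cols(j)) i += 1
      else if (g.cols(i) > g.cols(j)) j += 1
      else { count += 1; i += 1; j += 1 }
    }
    count
  }
}
```

The merge costs time proportional to the two degrees combined, which suits graphs with small, even degrees such as meshes; for the very high-degree nodes of a scale-free graph, a hash or bit set lookup can be cheaper, which is one plausible reading of why the slide recommends switching layouts.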