Architecture Support for Big Data Analytics



Similar documents
Datacenters and Cloud Computing. Jia Rao Assistant Professor in CS

HP ProLiant BL660c Gen9 and Microsoft SQL Server 2014 technical brief

VP/GM, Data Center Processing Group. Copyright 2014 Cavium Inc.

Performance Tuning and Optimizing SQL Databases 2016

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics

Comparison of Windows IaaS Environments

GraySort on Apache Spark by Databricks

Understanding the Behavior of In-Memory Computing Workloads

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra

MapReduce on GPUs. Amit Sabne, Ahmad Mujahid Mohammed Razip, Kun Xu

PART IV Performance oriented design, Performance testing, Performance tuning & Performance solutions. Outline. Performance oriented design

Capacity Management for Oracle Database Machine Exadata v2

Evaluating Task Scheduling in Hadoop-based Cloud Systems

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

Infrastructure Matters: POWER8 vs. Xeon x86

Symmetric Multiprocessing

Delivering Quality in Software Performance and Scalability Testing

Introducing EEMBC Cloud and Big Data Server Benchmarks

IOS110. Virtualization 5/27/2014 1

SQL Server 2008 Performance and Scale

A SURVEY ON MAPREDUCE IN CLOUD COMPUTING

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Benchmarking Hadoop & HBase on Violin

Virtualization Performance on SGI UV 2000 using Red Hat Enterprise Linux 6.3 KVM

Technical Paper. Moving SAS Applications from a Physical to a Virtual VMware Environment

Performance Management for Cloudbased STC 2012

JBoss Data Grid Performance Study Comparing Java HotSpot to Azul Zing

DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE

Manjrasoft Market Oriented Cloud Computing Platform


FPGA-based Multithreading for In-Memory Hash Joins

Contents Introduction... 5 Deployment Considerations... 9 Deployment Architectures... 11

An Oracle White Paper July Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide

SQL Server 2012 Performance White Paper

Performance Comparison of Fujitsu PRIMERGY and PRIMEPOWER Servers

Bringing Big Data Modelling into the Hands of Domain Experts

A Study on the Scalability of Hybrid LS-DYNA on Multicore Architectures

Load Testing Analysis Services Gerhard Brückl

Hadoop Size does Hadoop Summit 2013

Estimate Performance and Capacity Requirements for Workflow in SharePoint Server 2010

Enabling High performance Big Data platform with RDMA

Performance Testing of Big Data Applications

SAS Business Analytics. Base SAS for SAS 9.2

Practical Performance Understanding the Performance of Your Application

Scalable Run-Time Correlation Engine for Monitoring in a Cloud Computing Environment

In-Memory Databases Algorithms and Data Structures on Modern Hardware. Martin Faust David Schwalb Jens Krüger Jürgen Müller

SAP BusinessObjects BI4 Sizing What You Need to Know

Hadoop Cluster Applications

Advances in Virtualization In Support of In-Memory Big Data Applications

Performance brief for IBM WebSphere Application Server 7.0 with VMware ESX 4.0 on HP ProLiant DL380 G6 server

Energy Efficient MapReduce

Application of Predictive Analytics for Better Alignment of Business and IT

MID-TIER DEPLOYMENT KB

Performance Analysis of Web based Applications on Single and Multi Core Servers

TBR. IBM x86 Servers in the Cloud: Serving the Cloud. February 2012

DEPLOYING VIRTUALIZED MICROSOFT DYNAMICS AX 2012 R2

Duke University

HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW

Big Data Performance Growth on the Rise

BPOE Research Highlights

Resource Aware Scheduler for Storm. Software Design Document. Date: 09/18/2015

Big Fast Data Hadoop acceleration with Flash. June 2013

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012

Performance Analysis: Benchmarking Public Clouds

Tuning Your GlassFish Performance Tips. Deep Singh Enterprise Java Performance Team Sun Microsystems, Inc.

GeoGrid Project and Experiences with Hadoop

Oracle Database Scalability in VMware ESX VMware ESX 3.5

Agility Database Scalability Testing

ioscale: The Holy Grail for Hyperscale

Garbage Collection in the Java HotSpot Virtual Machine

MySQL performance in a cloud. Mark Callaghan

Lecture 4. Parallel Programming II. Homework & Reading. Page 1. Projects handout On Friday Form teams, groups of two

Performance Best Practices for Oracle Enterprise Service Bus and Advanced Queueing

Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware

Presto/Blockus: Towards Scalable R Data Analysis

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

Hybrid Software Architectures for Big

Multi-core Programming System Overview

SQL Server Performance Tuning and Optimization

Tuning WebSphere Application Server ND 7.0. Royal Cyber Inc.

LeiWang, Jianfeng Zhan, ZhenJia, RuiHan

White Paper. Cloud Native Advantage: Multi-Tenant, Shared Container PaaS. Version 1.1 (June 19, 2012)

Graph Database Proof of Concept Report

Making Multicore Work and Measuring its Benefits. Markus Levy, president EEMBC and Multicore Association

Hadoop & Spark Using Amazon EMR

Transcription:

Architecture Support for Big Data Analytics Ahsan Javed Awan EMJD-DC (KTH-UPC) (http://uk.linkedin.com/in/ahsanjavedawan/) Supervisors: Mats Brorsson(KTH), Eduard Ayguade(UPC), Vladimir Vlassov(KTH) 1

Why should we care? Motivation 2 *Source: Babak Falsafi slides

Cont... Motivation A mismatch between the characteristics of emerging workloads and the underlying hardware. M. Ferdman et-al, Clearing the clouds: A study of emerging scale-out workloads on modern hardware, in ASPLOS 2012. Z. Jia, et-al Characterizing data analysis workloads in data centers, in IISWC 2013. C. Zheng et-al, Characterizing os behavior of scale-out data center workloads. in WIVOSCA 2013. M. Dimitrov et al, Memory system characterization of big data workloads, in BigData Conference, 2013. Z. Jia et-al, Characterizing and subsetting big data workloads, in IISWC 2014 A. Yasin et-al, Deep-dive analysis of the data analytics workload in cloudsuite, in IISWC 2014. L. Wang et-al, Bigdatabench: A big data benchmark suite from internet services, in HPCA, 2014. T. Jiang, et-al, Understanding the behavior of in-memory computing workloads, in IISWC 2014 3

Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server Motivation 4 *Source: SGI

Cont... Motivation Hadoop, Spark, Flink, etc.. Phoenix ++, Metis, Ostrich, etc.. Our OurFocus Focus Improve the single node performance in scale-out configuration 5 *Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/

Which Scale-out Framework? Progress Meeting 12-12-14 [Picture Courtesy: Amir H. Payberah] 6

Methodology Our Approach A three fold analysis method at Application, Thread and Microarchitectural level Tuning of Spark internal Parameters Tuning of JVM Parameters (Heap size etc..) Concurrency Analysis General Architectural Exploration 7

Benchmarks Our Approach 3GB of Wikipedia raw datasets, Amazon Movies Reviews and numerical records have been used 8

Machine Details Our Hardware Configuration Hyper Threading and Turbo Boost is disabled Hyper Threading and Turbo-boost are disabled 9

System Configuration Our Approach 10

Application Level Performance Multicore Scalability of Spark Spark scales poorly in Scale-up configuration 11

Stage Level Performance Multicore Scalability of Spark Shuffle Map Stages don't scale beyond 12 threads across different workloads No of concurrent files open in Map-side shuffling is C*R where C is no of threads in executor pool and R is no of reduce tasks 12

Task Level Performance Multicore Scalability of Spark Percentage increase in Area Under the Curve compared to 1-thread 13

Is there thread level load imbalance?? 14

CPU Utilization is not scaling with performance 15

Is there any Work Time Inflation?? 16

How does Micro-architecture contribute to Work time inflation?? 17

Cont... 18

Cont... 19

Is Memory Bandwidth a bottleneck?? 20

Key Findings More than 12 threads in an executor pool does not yield significant performance Spark runtime system need to be improved to provide better load balancing and avoid work-time inflation. Work time inflation and load imbalance on the threads are the scalability bottlenecks. Removing the bottlenecks in the front-end of the processor would not remove more than 20% of stalls. Effort should be focused on removing the memory bound stalls since they account for up to 72% of stalls in the pipeline slots. Memory bandwidth of current processors is sufficient for inmemory data analytics 21

How Data Volume Affects Spark Based Data Analytics on a Scale-up Server Motivation 22

Do Spark based data analytics benefit from using larger scale-up servers Motivation 23

Is GC detrimental to scalability of Spark applications? Motivation 24

How does performance scale with data volume? Motivation 25

Does GC time scale linearly with Data Volume?? Motivation 26

How does CPU utilization scale with data volume? Motivation 27

Is File I/O detrimental to performance? Motivation 28

How does data size affects micro-architectural performance? Motivation 29

How Data Volume Affects Spark Based Data Analytics on a Scale-up Server Motivation 30

Cont.. Motivation 31

Cont.. Motivation 32

Key Findings Spark workloads do not benefit significantly from executors with more than 12 cores. The performance of Spark workloads degrades with large volumes of data due to substantial increase in garbage collection and file I/O time. With out any tuning, Parallel Scavenge garbage collection scheme outperforms Concurrent Mark Sweep and G1 garbage collectors for Spark workloads. Spark workloads exhibit improved instruction retirement due to lower L1 cache misses and better utilization of functional units inside cores at large volumes of data. Memory bandwidth utilization of Spark benchmarks decreases with large volumes of data and is 3x lower than the available offchip bandwidth on our test machine 33

What are the major bottlenecks?? Our Approach A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, Performance charaterization of in-memory data analytics on a mordern cloud server, in 5th International IEEE Conference on Big Data and Cloud Computing, 2015. A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, How Data Volume Affects Spark Based Data Analytics on a Scale-up Server, in 6th International Workshop on Big Data Benchmarking, Performance Optimization and Emerging Hardware (BpoE) held in conjunction with VLDB, 2015. 34

Future Directions Motivation NUMA Aware Task Scheduling Cache Aware Transformations Exploiting Processing In Memory Architectures HW/SW Data Prefectching Rethinking Memory Architectures 35