Data-Flow Awareness in Parallel Data Processing




Data-Flow Awareness in Parallel Data Processing
D. Bednárek, J. Dokulil*, J. Yaghob, F. Zavoral
Charles University Prague, Czech Republic
* University of Vienna, Austria
6th International Symposium on Intelligent Distributed Computing
September 24-26, 2012, Calabria

Problem
- Processing large data collections (GB, TB)
  - unstructured data, Big Data
  - data streams
- Cloud computing
  - Hadoop
  - Map-Reduce paradigm
  - not sufficient for nonlinear computations: matrix inversion, fluid dynamics equations, data joins

Problem
- Parallel processing
  - Threading Building Blocks: shared-memory parallel processing
  - OpenMP: mathematical computations
  - MPI: distributed computing
- Significant knowledge required
  - synchronization, cache hierarchy, technical details

Bobox
- Framework for parallel programming
- Main goal: simplify writing parallel programs
- Targeted at data-intensive applications
- Technical details handled by the framework
- Data connection
- Box: computing unit

Data-flow awareness
- Data-flow awareness in parallel data processing
  - explicit specification of data flow is part of the application algorithm
  - gives the task scheduler more precise information
  - improves the quality of task scheduling
  - better performance

Scheduler
- Scheduler: selection of the next task
- TBB, OpenMP, or MPI
  - no information on data flow
  - difficult to optimize their strategy
- Bobox
  - the model (an abstraction of data flow) is known to the scheduler
  - decisions are data-flow aware
- box: basic computational component with zero or more inputs and outputs in the pipeline
- via: one link in the pipeline; connects one output to one or more inputs of other boxes
- envelope: the smallest unit of data
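
The box / via / envelope terminology can be illustrated with a minimal sketch. This is not the Bobox API (Bobox itself is not shown in these slides); the class names, fields, and the `receive`/`send` methods are hypothetical and only mirror the three definitions above.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Envelope:
    """Smallest unit of data passed through the pipeline."""
    payload: Any


@dataclass
class Box:
    """Basic computational component with zero or more inputs and outputs."""
    name: str
    inbox: List[Envelope] = field(default_factory=list)
    outputs: List["Via"] = field(default_factory=list)

    def receive(self, envelope: Envelope) -> None:
        # Hypothetical helper: store an incoming envelope for later processing.
        self.inbox.append(envelope)


@dataclass
class Via:
    """One link: connects one output to one or more inputs of other boxes."""
    targets: List[Box] = field(default_factory=list)

    def send(self, envelope: Envelope) -> None:
        # Every connected box receives its own envelope with the same payload.
        for box in self.targets:
            box.receive(Envelope(envelope.payload))


# A non-linear pipeline: one source box feeding two downstream boxes.
source = Box("source")
left, right = Box("left"), Box("right")
source.outputs.append(Via(targets=[left, right]))

source.outputs[0].send(Envelope([1, 2, 3]))
print(len(left.inbox), len(right.inbox))
```

Because the links are explicit objects, a scheduler that can see them knows which boxes become runnable when an envelope is sent, which is the information the slides say TBB, OpenMP, and MPI schedulers lack.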

Scheduling in Bobox
- Fixed number of threads
- Separate scheduler for each thread
  - one central task scheduler would create a bottleneck
- immediate queue: tasks to be scheduled immediately; data are hot in cache
- stealing queue: tasks that are not tightly bound to the thread
- Scheduling algorithm
  1. execute the first task in the immediate queue
  2. if empty, execute the head of the stealing queue
  3. if empty, steal a task from another scheduler
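
The three-step selection rule can be sketched as follows. This is a single-threaded illustration, not the Bobox implementation: `ThreadScheduler` and `next_task` are hypothetical names, and the slides do not say which end of a victim's stealing queue a thief takes from, so taking from the tail is an assumption.

```python
from collections import deque
from typing import Deque, List, Optional


class ThreadScheduler:
    """Per-thread scheduler with an immediate queue and a stealing queue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.immediate: Deque[str] = deque()
        self.stealing: Deque[str] = deque()

    def next_task(self, schedulers: List["ThreadScheduler"]) -> Optional[str]:
        # 1. First task in the immediate queue (its data should be hot in cache).
        if self.immediate:
            return self.immediate.popleft()
        # 2. Otherwise the head of this thread's own stealing queue.
        if self.stealing:
            return self.stealing.popleft()
        # 3. Otherwise steal from another scheduler.
        #    Assumption: the thief takes the tail of the victim's stealing queue.
        for victim in schedulers:
            if victim is not self and victim.stealing:
                return victim.stealing.pop()
        return None  # nothing runnable anywhere


t0, t1 = ThreadScheduler("t0"), ThreadScheduler("t1")
t0.immediate.append("hot")
t0.stealing.extend(["a", "b"])

both = [t0, t1]
order = [t0.next_task(both), t1.next_task(both),
         t0.next_task(both), t1.next_task(both)]
print(order)
```

Only the stealing queue is visible to other schedulers here, so a task in the immediate queue always runs on the thread that enqueued it.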

Task enqueueing
- immediate task: placed at the head of the immediate queue
- relaxed task: placed at the end of the stealing queue
- the choice is hard-wired into the box and via framework
- Main reason: cache-awareness of the framework
  - the head of the immediate queue should be scheduled immediately by the same thread on the same CPU
  - the data bound to the task are hot in the cache
  - the number of memory accesses is minimized
  - a relaxed task is placed at the tail since its data do not need to be hot
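
A small sketch of the two enqueueing rules for a single thread. The function names `enqueue_immediate`, `enqueue_relaxed`, and `drain` are hypothetical; the example only shows how head insertion and tail insertion affect the order in which the owning thread runs its tasks.

```python
from collections import deque

immediate_queue = deque()
stealing_queue = deque()


def enqueue_immediate(task):
    # Immediate task: goes to the head of the immediate queue,
    # so the most recently produced (cache-hot) work runs next.
    immediate_queue.appendleft(task)


def enqueue_relaxed(task):
    # Relaxed task: goes to the end of the stealing queue,
    # since its data do not need to be hot.
    stealing_queue.append(task)


def drain():
    # Order in which the owning thread would run its own tasks
    # if no other thread stole anything in the meantime.
    order = []
    while immediate_queue:
        order.append(immediate_queue.popleft())
    while stealing_queue:
        order.append(stealing_queue.popleft())
    return order


enqueue_relaxed("r1")
enqueue_immediate("i1")
enqueue_relaxed("r2")
enqueue_immediate("i2")

execution_order = drain()
print(execution_order)
```

The immediate queue behaves like a stack (latest first), while relaxed tasks are served in arrival order.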

Scheduling example

1. Start: Bobox model (pipeline), 4 threads with task queues; the initial box is enqueued.

2. Initialization finished: the box produces data, sends it to the first via, and enqueues the via.

3. Boxes scheduled: the via duplicates the data, sends it to the boxes, and enqueues them.

4. Task stolen: thread 2 steals one task; both tasks are invoked and the boxes produce their results.

5. Boxes finished: newly created tasks are enqueued with the same thread.

6. Another task stolen: thread 3 steals one task.

Performance
- DFA and non-DFA (reference) scheduler
- Performance is affected by envelope (data) size
  - data larger than cache: performance decreases
  - several levels of caches: several drops
- Number of threads
  - CPU: 4 cores with hyperthreading
  - 2 to 4 threads: significantly better than single-threaded
  - best results: 7 threads

[Figure: Performance / envelope size. x-axis: envelope size; y-axis: time (ms)]

[Figure: Performance / number of threads. x-axis: number of threads; y-axis: time (ms)]

Related projects
- Bobox is used in several related projects
  - SPARQL compiler
  - model visualization (graph drawing)
  - semantic processing
  - query optimization
  - data stream processing (astro-informatics)
- Participating universities
  - Charles University Prague (cz)
  - Comenius University Bratislava (sk)
  - University of Vienna (at)

Conclusions
- We have demonstrated the impact of supplying data-flow information to a scheduler
- Advantage of the Bobox architecture
  - data-flow information is a part of the application code
- Data-awareness: significant performance improvement
  - the DFA scheduler required approx. 30% less time

The End
Thank you for your attention

Bobox runtime architecture

Performance / envelope size (time in ms)

envelope size      2K    4K    8K   16K   32K   64K  128K  256K
DFA              1686  1267  1046   947   911   977  1420  2935
ref              1927  1423  1313  1297  1751  3002  3007  4044
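
As a worked check on the measurements above, the relative time saved by the DFA scheduler can be computed per envelope size. The sketch assumes the values are run times in milliseconds, as the axis label of the corresponding plot suggests; `reduction_percent` is a helper introduced here for illustration.

```python
sizes = ["2K", "4K", "8K", "16K", "32K", "64K", "128K", "256K"]
dfa_ms = [1686, 1267, 1046, 947, 911, 977, 1420, 2935]
ref_ms = [1927, 1423, 1313, 1297, 1751, 3002, 3007, 4044]


def reduction_percent(dfa, ref):
    """Percentage of time saved by the DFA scheduler relative to the reference."""
    return 100.0 * (ref - dfa) / ref


# Per-size reduction, keyed by envelope size.
reductions = {s: reduction_percent(d, r)
              for s, d, r in zip(sizes, dfa_ms, ref_ms)}

# Reduction over the summed run times of all eight envelope sizes.
total_reduction = reduction_percent(sum(dfa_ms), sum(ref_ms))

for size in sizes:
    print(f"{size:>5}: {reductions[size]:5.1f}% less time")
print(f"total: {total_reduction:5.1f}% less time")
```

The per-size reduction ranges from about 11% (4K) to about 67% (64K), and summed over all sizes the DFA scheduler needs about 37% less time. The slides summarize the gain as approximately 30%; how that figure was aggregated is not stated.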

Bobox
- Box: computation unit
- Basic paradigms
  - task parallelism, non-linear pipeline
- High-performance messaging
  - communication, synchronization
- Technical details handled by the framework
  - NUMA, cache hierarchy, CPU architecture