Parallel Computing
Benson Muite
benson.muite@ut.ee
http://math.ut.ee/~benson
https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage
3 November 2014

Hadoop, Review

Hadoop: Hadoop History, Hadoop Framework, Fault Tolerance, Example Applications

Hadoop History: Motivated by Dean and Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Proc. OSDI 2004; also Communications of the ACM, 51(1), 107-113 (2008). The paper describes Google's original implementation in C++. Based on this, Hadoop was developed by Doug Cutting (at Yahoo) and Mike Cafarella (then a graduate student at the University of Washington, now faculty at the University of Michigan). Hadoop was originally used to support the Nutch search engine project, and was made open source because a large number of developers was needed. It is currently an Apache Foundation supported project, used by many companies for data analysis. Hadoop uses Java as its underlying language (there are efficiency concerns compared to C++, but a wider pool of possible users).

Hadoop Framework: Primarily aimed at data intensive computing: large scale databases on commodity hardware, with a slow network (typically Ethernet) but reasonably fast commodity processors. It includes a distributed file system (the Hadoop Distributed File System, HDFS). Hadoop 2 is the current release and has some backwards compatibility with Hadoop 1. The main feature is a map phase, in which local work is done without any communication, followed by a reduce phase, in which communication takes place. The main idea is that data is stored on local disks, so it does not need to be moved over a slow network; only the results of the map-phase query are moved. For many problems the query results are much smaller than the data, so interactive response is possible for large data sets even over a low bandwidth, high latency network.
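
As a small illustration of the two phases, below is a minimal sketch following the standard WordCount example from the Hadoop MapReduce documentation (this code is not from the lecture; the job driver that sets input and output paths is omitted):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

      // Map phase: runs locally on each block of input, no communication needed.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);   // emit (word, 1)
          }
        }
      }

      // Reduce phase: counts for each word are gathered over the network and summed.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);   // emit (word, total count)
        }
      }
    }

Only the small per-word counts cross the network between the two phases; the input text itself stays on the local disks where it is stored.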

Fault Tolerance: Data is typically replicated three times: on the primary disk the data is replicated twice to minimize access times, and a third replica is kept on a separate disk. The system has a task scheduler (in Hadoop 1 this is not replicated, but in Hadoop 2 it is replicated to prevent system crashes) and work coordinators which report back to the task scheduler. Should part of the system fail, the work is rescheduled. There is also a lookup table, or a set of lookup tables. The scheduling policy needs to allocate work so that all jobs in an organization run efficiently; this is a tough problem to solve exactly, so heuristics are typically used.
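
The replication factor is a per-file HDFS setting; as a minimal sketch (not from the lecture, with a hypothetical file path), it can be changed through the HDFS Java API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");   // default replication for new files
        FileSystem fs = FileSystem.get(conf);

        // Raise the replication factor of a (hypothetical) existing file to 4,
        // for example a data set that is read very frequently.
        fs.setReplication(new Path("/data/example.txt"), (short) 4);
        fs.close();
      }
    }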

Example Applications: search engines, examining computer internet access logs, online store purchase recommendations, credit reports, airline flight scheduling, consumer spending pattern analysis, machine translation, machine learning.

Ways to Get Involved: Hadoop is part of an open source ecosystem with many developers from around the world, and many extensions are being proposed using the same model. It is targeted towards commodity hardware, and has some good redundancy ideas that may be useful for future exascale compute models.

Review

Supercomputer Rankings: Top 500, Green 500, Graph 500, Green Graph 500.

Parallel Programming APIs: OpenMP, MPI, OpenCL, CAF, UPC. Many others exist, but the ones listed are mainstream at the moment.

Performance measurement: How long does your code take to run? How well does your code utilize the hardware it is running on? Amdahl's law, Gustafson's law.
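
In their standard form (not spelled out on the slide), with N processors and P the fraction of the work that can be parallelized, the two laws read:

    S_{\mathrm{Amdahl}}(N) = \frac{1}{(1-P) + P/N},
    \qquad
    S_{\mathrm{Gustafson}}(N) = N - \alpha (N-1), \quad \alpha = 1 - P.

Amdahl's law bounds the speedup of a fixed-size problem, while Gustafson's law describes the scaled speedup when the problem size grows with the number of processors.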

Roofline Model: An upper bound for the performance of your code, or of important kernels in your code. It determines whether an algorithm is RAM bandwidth bound or compute bound at the node level. The ideas can be extended to the system level by using interprocessor bandwidth rather than bandwidth from RAM.
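
The standard roofline bound, with I the arithmetic intensity of the kernel (floating point operations per byte moved from RAM), P_peak the peak floating point rate, and B_peak the peak memory bandwidth, is:

    P_{\mathrm{attainable}} = \min\left( P_{\mathrm{peak}},\; I \times B_{\mathrm{peak}} \right).

Kernels with low arithmetic intensity sit on the sloped (bandwidth) part of the roof, kernels with high intensity on the flat (compute) part.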

Measures of Efficiency: Speedup, Parallel Efficiency.
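
With T(1) the runtime on one process and T(p) the runtime on p processes, the usual definitions are:

    S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}.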

Computer Chip Architecture: Memory hierarchy (registers, L1 cache, L2 cache, L3 cache, RAM, disk); floating point units, integer units, instruction units, vector units; fused multiply add, prefetching, cores, NUMA; clock speed.

Computer Interconnect Topologies: bus, ring, mesh, all-to-all, hypercube, 2D/3D/4D/5D/6D tori, fat tree, Clos, dragonfly.

Algorithm Operation Counts: dot product, matrix multiplication.
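
For example, a dot product of two length-n vectors needs n multiplications and n-1 additions, and the standard algorithm for multiplying two n-by-n matrices repeats this for each of the n^2 output entries:

    x \cdot y = \sum_{i=1}^{n} x_i y_i : \; 2n - 1 \approx 2n \text{ operations},
    \qquad
    C = AB, \; c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} : \; n^2 (2n - 1) \approx 2n^3 \text{ operations}.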

Algorithm Runtime Performance Estimation: For data processing, estimate the floating point operation count and divide by operations per cycle times cycles per second. For data movement, theory covers reductions and all-to-all exchanges on different simple network topologies, as well as switches; in practice, empirical models are needed due to the complexity of current networks.
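
A simple model along these lines (a sketch using the standard latency-bandwidth approach, not taken from the slides) estimates compute time from the operation count and the achievable rate, and communication time from per-message latency and bandwidth; for a reduction over p processes on a tree-like network:

    T_{\mathrm{compute}} \approx \frac{\text{operation count}}{\text{operations per cycle} \times \text{cycles per second}},
    \qquad
    T_{\mathrm{reduce}}(p) \approx \lceil \log_2 p \rceil \left( \lambda + \frac{m}{\beta} \right),

where \lambda is the per-message latency, m the message size, and \beta the network bandwidth. As noted above, real networks usually need empirical corrections to such models.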

Single Core Optimization: In many cases where parallelization is being used to speed up time to solution, first check whether re-optimizing your code will help. Many people who re-program for the Xeon Phi make their codes faster on regular Xeon chips as well, since the restructuring makes it easier for the chip to do in-order execution.

Accelerators: Good for floating point computations, with high energy efficiency. Typically very many small processing units which do only a few things very efficiently (e.g. floating point operations), with very simple instruction scheduling. Examples: Xeon Phi, GPUs (Graphics Processing Units), MPPAs (Massively Parallel Processor Arrays), FPGAs (Field Programmable Gate Arrays).

Vectorization: The lowest level of parallelization; single instruction, multiple data. Heavily used in efficient floating point architectures such as the Xeon Phi. Requires regular memory accesses.

Loop Parallelization: Can be coarse grain or fine grain. For many tasks, fine grain parallelism needs to be carefully implemented to avoid excessive overhead costs. Usually the easiest form of parallelism to add to a code.
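
As a small illustration (not from the slides), a coarse-grained parallel loop in Java using parallel streams; the array contents and loop body are made up for the example:

    import java.util.stream.IntStream;

    public class ParallelLoop {
      public static void main(String[] args) {
        double[] a = new double[1_000_000];

        // Each iteration is independent, so the index range is split and
        // processed concurrently on the common fork/join pool.
        IntStream.range(0, a.length)
                 .parallel()
                 .forEach(i -> a[i] = Math.sin(i) * Math.cos(i));
      }
    }

If the work per iteration is very small, the scheduling overhead can outweigh the gain, which is the fine-grain overhead issue mentioned above.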

Task Parallelization: Often used in information processing applications. Need to schedule appropriately to ensure available resources are used, and need to ensure all data is available.

DAG scheduling: A method to expose parallelism by determining task dependencies. Heavily used in information processing applications, and being applied to massively concurrent architectures in high performance computing. Need to be careful in the implementation to avoid high memory costs from generating the graph; just-in-time dependency generation is one way of reducing memory requirements.

Load Balancing: Need to be careful in how work is partitioned; want all processes to be kept busy during parallel execution. Static load balancing can be done easily at startup time if the program allows for this. Dynamic load balancing is more difficult, and typically arises in the adaptive solution of models of physical processes.

Sorting: There are many ways to do this in parallel; examples include quick sort and merge sort. Sorting typically does not require many floating point operations, instead doing comparisons. Used in bioinformatics, particle based algorithms, etc.
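
For example (not from the slides), the Java standard library ships a parallel sort based on merging sorted sub-arrays; the data here is randomly generated just for illustration:

    import java.util.Arrays;
    import java.util.Random;

    public class ParallelSortExample {
      public static void main(String[] args) {
        long[] data = new Random(42).longs(10_000_000).toArray();

        // Arrays.parallelSort splits the array, sorts the pieces on the
        // fork/join pool and merges them; comparisons, not flops, dominate.
        Arrays.parallelSort(data);
        System.out.println("first = " + data[0] + ", last = " + data[data.length - 1]);
      }
    }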

I/O: Usually the slowest operation, and file systems are complicated. For many scientific computing applications, try to minimize I/O; for many data applications, it is the most important part of the problem.

Hadoop and MapReduce: A parallel programming model primarily aimed at data processing. A few commands allow easy buildup of parallel programs.

Other Topics: pipelining, vector processors (NEC SX-ACE), non-traditional uses of supercomputing (text analysis, economic planning).

References:
http://hadoop.apache.org/docs/current/
https://en.wikipedia.org/wiki/Apache_Hadoop
Herodotou, H. Automatic Tuning of Data-Intensive Analytical Workloads. PhD Thesis, Duke University (2012), http://www.cs.duke.edu/~hero/research.html
Dean, J. and Ghemawat, S. MapReduce: Simplified Data Processing on Large Clusters. Proc. OSDI '04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, USA, December 2004; also Communications of the ACM, 51(1), 107-113 (2008)

References:
http://www.drdobbs.com/database/hadoop-tutorial-series/240155055
Wadkar, S., Siddalingaiah, M. and Venner, J. Pro Apache Hadoop, 2nd Ed. Apress (2014)
Shenoy, A. Hadoop Explained. Packt Publishing (2014)
Nielsen, L. Hadoop for Laymen. Newstreet Communications (2014)
Lockwood, G.K. Tutorials in Data-Intensive Computing, http://www.glennklockwood.com/di/index.php (2014)

Acknowledgements: Myroslava Stavnycha