Available online at www.ijiere.com
International Journal of Innovative and Emerging Research in Engineering
e-ISSN: 2394-3343, p-ISSN: 2394-5494

Parallel Design Patterns for Multi-core Using Java

Jayshree Chaudhari
Student of ME (Computer Engineering), Shah and Anchor Kutchhi Engineering College, Mumbai, India.

ABSTRACT: Multi-core processors have become the norm, but the software deployed on such systems is often not designed to exploit the processing power at its disposal. This paper presents how parallel programming design patterns can be adapted for performance benefits on multi-core CPUs. Java 7 provides only the Fork and Join parallel design pattern, which is efficient for recursive programming; for other scenarios, such as simple loops, dynamic load balancing and scientific calculations, other parallel design patterns prove to be handy. I have implemented design patterns not available in Java 7 as APIs and compared their core utilization and time taken against a corresponding sequential implementation of a Monte Carlo algorithm.

Keywords: Parallel Design Pattern, Single Program Multiple Data, Master Worker, Loop Parallelism, Shared Queue, Distributed Array.

I. INTRODUCTION

Up to the early 2000s, hardware manufacturers were successful in doubling the clock frequency of microprocessors every 18 months, in line with Moore's law. Microprocessor manufacturers then turned to multi-core processors because of the increasing gap between processor and memory speed, the limits of instruction-level parallelism, and high power consumption [1, 10]. A multi-core microprocessor delivers high performance through parallel processing across multiple cores. Running a job as a sequential program gives only the performance of a single core, leaving the other cores on the chip wasted [10]. To make optimal use of a multi-core processor, the program needs to be rewritten using parallel programming, the methodology of dividing a large problem into smaller ones and solving the smaller problems concurrently [11]. The initial response to multi-core was parallel programming with multi-threading; however, writing parallel and concurrent programs in the multi-threading model is difficult. Parallel design patterns rescue the programming community from the intricacies of writing a parallel program and let it focus on business logic.

Until Java 5, writing a parallel program meant writing a multi-threaded program, which was painful because the development community had to deal with thread synchronization and the other pitfalls of parallel programming. Java 5 and Java 6 provided the java.util.concurrent packages, a set of APIs that gave the development community powerful concurrency building blocks. Java 7 further enhanced support for parallel programming by providing the Fork and Join design pattern API (a minimal usage sketch is given below). Other powerful parallel design patterns such as SPMD, Master Worker, Loop Parallelism, Distributed Array and Shared Queue are still not present [2]. I have implemented these design patterns as APIs and compared their time taken and core utilization against a corresponding sequential implementation of a Monte Carlo algorithm.
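To make the Java 7 baseline concrete, the listing below sketches how the standard java.util.concurrent Fork and Join API (ForkJoinPool and RecursiveTask) can drive a Monte Carlo estimate of pi. It is an illustrative sketch only: the class names, the sample count and the split threshold are assumptions made for this example and are not taken from the implementation evaluated in this paper.

Listing 1 (illustrative): Monte Carlo estimation of pi with Java 7 Fork and Join

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical Fork/Join workload: count random points falling inside the unit circle.
class MonteCarloTask extends RecursiveTask<Long> {
    private static final long THRESHOLD = 1_000_000; // compute directly below this size
    private final long samples;

    MonteCarloTask(long samples) { this.samples = samples; }

    @Override
    protected Long compute() {
        if (samples <= THRESHOLD) {
            long hits = 0;
            ThreadLocalRandom rnd = ThreadLocalRandom.current();
            for (long i = 0; i < samples; i++) {
                double x = rnd.nextDouble(), y = rnd.nextDouble();
                if (x * x + y * y <= 1.0) hits++;
            }
            return hits;
        }
        // Fork: split the sample count in half and join the partial results.
        MonteCarloTask left = new MonteCarloTask(samples / 2);
        MonteCarloTask right = new MonteCarloTask(samples - samples / 2);
        left.fork();
        return right.compute() + left.join();
    }
}

public class ForkJoinPiDemo {
    public static void main(String[] args) {
        long samples = 100_000_000L;
        long hits = new ForkJoinPool().invoke(new MonteCarloTask(samples));
        System.out.println("pi = " + (4.0 * hits / samples));
    }
}

The recursive split-and-join structure is what makes Fork and Join well suited to divide-and-conquer work; it is the baseline against which the patterns implemented in this paper are compared.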
II. PARALLEL DESIGN PATTERN

Design patterns are recurring solutions to standard problems. Programming design patterns are modeled on successful past programming experiences; once modeled, these patterns can be reused in other programs. Typically, hand-coding a multi-threaded program from scratch gives better execution-time performance, but it may consume an immense amount of time and introduce errors [9]. Design patterns for parallel programming make the programmer's life easier by addressing recurring problems such as concurrent access to data, communication between parallel programs, synchronization and deadlocks. The parallel design patterns considered here are Single Program Multiple Data (SPMD), Master Worker, Loop Parallelism, Shared Queue and Distributed Array.

A. SPMD & Distributed Array

In the SPMD design pattern, all Units of Execution (UEs) execute a single program in parallel, each having its own set of data and a unique identifier. The SPMD pattern is mostly used in scientific applications with complex, compute-intensive algorithms that need to be executed in parallel without changing the program. The data might differ between UEs, or slightly different computations might be needed on a subset of UEs, but for the most part each UE carries out similar computations. The Distributed Array pattern decomposes the data, either vertically or horizontally, into a number of sets [1, 3]. Figure 1 shows a high-level diagram of SPMD combined with Distributed Array: the master process decomposes and distributes the data across a predefined number of UEs (4 UEs in Figure 1), assigning each UE a unique identifier to differentiate its behavior. The data is decomposed among the four UEs using the Distributed Array pattern, and the results of the UEs are combined by the master process. A minimal sketch is given below.

Figure 1: Single Program Multiple Data & Distributed Array pattern [1]
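The listing below is a minimal sketch of the SPMD and Distributed Array ideas using plain java.util.concurrent primitives; it is not the API developed in this paper, and the class and variable names are illustrative assumptions. Each UE runs the same code on its own horizontal slice of an array, identified by a unique UE id, and the master combines the partial results.

Listing 2 (illustrative): SPMD with a Distributed Array over four UEs

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical SPMD example: every UE runs the same summing code on its own data slice.
public class SpmdSumDemo {
    public static void main(String[] args) throws Exception {
        double[] data = new double[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i * 0.5;

        int numUEs = 4;                                   // predefined number of UEs
        int chunk = (data.length + numUEs - 1) / numUEs;  // Distributed Array: horizontal split
        ExecutorService pool = Executors.newFixedThreadPool(numUEs);
        List<Future<Double>> partials = new ArrayList<>();

        for (int ue = 0; ue < numUEs; ue++) {
            final int id = ue;                            // unique identifier of this UE
            partials.add(pool.submit(() -> {
                int from = id * chunk;
                int to = Math.min(from + chunk, data.length);
                double sum = 0.0;
                for (int i = from; i < to; i++) sum += data[i]; // same program, different data
                return sum;
            }));
        }

        double total = 0.0;
        for (Future<Double> f : partials) total += f.get();     // master combines the results
        pool.shutdown();
        System.out.println("total = " + total);
    }
}

Because every UE reads only its own slice, the workers need no synchronization; the only coordination point is the master's final combination of the partial sums.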

B. Master Worker & Shared Queue

A single master process sets up a pool of worker processes and a bag of tasks. The workers execute concurrently, repeatedly removing a task from the bag of tasks and processing it. The Shared Queue pattern shares data efficiently between concurrently executing UEs, and the Master Worker pattern dynamically balances the work of a set of tasks among the UEs [1]. Figure 2 shows a high-level diagram of Master Worker with Shared Queue: the master process creates a bag of tasks using a Shared Queue and creates a predefined number of UEs (4 UEs are shown in Figure 2) that act as workers [6]. These UEs/workers concurrently and repeatedly remove tasks from the Shared Queue and process them until the queue is empty or a termination condition is reached. The master then combines the results. A minimal sketch is given below.

Figure 2: Master Worker & Shared Queue [1]
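The listing below is a minimal sketch of Master Worker with a Shared Queue built on java.util.concurrent.BlockingQueue; again, it is not the API developed in this paper, and the task type and poison-pill termination value are illustrative assumptions. The master fills the queue with tasks, the workers drain it concurrently, and one poison-pill value per worker signals termination.

Listing 3 (illustrative): Master Worker with a Shared Queue of tasks

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical Master Worker example: workers pull task ids from a shared queue until
// they see the poison-pill value, and the master combines the accumulated result.
public class MasterWorkerDemo {
    private static final int POISON = -1;

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> bagOfTasks = new LinkedBlockingQueue<>(); // Shared Queue
        AtomicLong result = new AtomicLong();
        int numWorkers = 4;

        Thread[] workers = new Thread[numWorkers];
        for (int w = 0; w < numWorkers; w++) {
            workers[w] = new Thread(() -> {
                try {
                    while (true) {
                        int task = bagOfTasks.take();         // blocks until a task is available
                        if (task == POISON) break;            // termination condition
                        result.addAndGet((long) task * task); // "process" the task
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[w].start();
        }

        for (int task = 1; task <= 10_000; task++) bagOfTasks.put(task); // master fills the bag
        for (int w = 0; w < numWorkers; w++) bagOfTasks.put(POISON);     // one pill per worker
        for (Thread t : workers) t.join();                               // master waits, then combines
        System.out.println("sum of squares = " + result.get());
    }
}

Because workers pull a new task as soon as they finish the previous one, faster workers automatically take on more tasks, which is how the pattern achieves dynamic load balancing.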

C. Loop Parallelism

Loop Parallelism transforms the compute-intensive loops of a sequential program into a parallel program in which different iterations of the loop are executed in parallel. In most languages, loops execute sequentially; when the number of iterations is high and each iteration is compute intensive, the time taken is long and core utilization is poor. In such cases it makes sense to execute the iterations in parallel threads on multiple UEs [1, 5]. Figure 3 shows the high-level diagram of Loop Parallelism: the iterations of a for loop are divided equally among three UEs, with the number of UEs predefined before execution starts. A minimal sketch is given below.

Figure 3: Loop Parallelism design [1]
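The listing below is a minimal sketch of Loop Parallelism, assuming an ExecutorService with one thread per UE and an equal split of the iteration range; it is not the API developed in this paper, and the loop body is an arbitrary compute-intensive placeholder.

Listing 4 (illustrative): a for loop split equally across three UEs

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical Loop Parallelism example: iterations of one loop are split
// into equal ranges, and each range is executed by its own UE.
public class LoopParallelismDemo {
    public static void main(String[] args) throws InterruptedException {
        int iterations = 12_000_000;
        double[] results = new double[iterations];
        int numUEs = 3;                                  // predefined number of UEs
        int range = (iterations + numUEs - 1) / numUEs;

        ExecutorService pool = Executors.newFixedThreadPool(numUEs);
        for (int ue = 0; ue < numUEs; ue++) {
            final int from = ue * range;
            final int to = Math.min(from + range, iterations);
            pool.execute(() -> {
                for (int i = from; i < to; i++) {
                    results[i] = Math.sin(i) * Math.cos(i); // compute-intensive iteration body
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);      // wait for all iterations to finish
        System.out.println("last value = " + results[iterations - 1]);
    }
}

Each UE writes to a disjoint slice of the result array, so no synchronization is needed inside the loop body.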

III. RELATED WORK

In the book Patterns for Parallel Programming, Timothy G. Mattson, Beverly A. Sanders and Berna L. Massingill introduced a complete, highly accessible pattern language built around design spaces that helps experienced developers "think parallel" and write effective parallel code. Design spaces give the development community a medium for writing complex parallel programs from parallel design patterns; the supporting-structures design space defines patterns such as Master Worker, SPMD, Loop Parallelism, Distributed Array and Shared Queue/Data [1]. The Microsoft .NET Framework 4 included the Parallel Patterns Library (PPL), the Task Parallel Library (TPL) and Parallel Language-Integrated Query (PLINQ), which are commonly used with C++ and C#. On top of these libraries, .NET 4 added design patterns such as Parallel Loop, Fork and Join, Shared Data, and other patterns such as MapReduce and Producer Consumer [7]. Java added the Fork and Join design pattern, built on top of the executor framework, in version 7 [8].

IV. METHODOLOGY

A waterfall development model was used to develop the Java APIs for the Master Worker, SPMD and Loop Parallelism design patterns [4]. Development was done on top of the existing Java concurrency API using the Eclipse development environment. For testing, test classes were created for each API along with a corresponding sequential implementation, and CPU core utilization was monitored with the Windows task manager. The following environment was used for testing the patterns:

Processor: Intel Core i5-4310U CPU @ 2.00 GHz
Number of cores: 4
RAM: 4 GB
OS: Windows 7 Service Pack 1, 64-bit
JVM: JDK 7
CPU utilization: Windows 7 Resource Monitor
Algorithm: Monte Carlo

The test can also be carried out on any other operating system that can host a Java Virtual Machine (JVM) and has a multi-core CPU.

V. OBSERVATION OF DIFFERENT DESIGN PATTERNS

TABLE 1: AVERAGE TIME TAKEN BY DESIGN PATTERNS (SECONDS)

Number of UEs | Fork and Join | SPMD & Distributed Array | Master Worker & Shared Queue | Loop Parallelism
Sequential    | 20.122        | 22.241                   | 21.540                       | 22.307
2             | 10.414        | 9.719                    | 9.836                        | 10.371
3             | 9.353         | 9.001                    | 9.088                        | 9.254
4             | 8.836         | 8.814                    | 8.884                        | 9.046
5             | 9.146         | 9.095                    | 9.291                        | 9.228
6             | 9.45          | 9.110                    | 9.6                          | 9.552

Figure 4: Core utilization, sequential
Figure 5: Core utilization, Fork & Join (4 UEs)
Figure 6: Core utilization, SPMD (4 UEs)
Figure 7: Core utilization, Master Worker (4 UEs)

Figure 8: Core utilization, Loop Parallelism (4 UEs)

The following observations are based on the test results:
- A sequential program mostly uses one core of the CPU, as shown in Figure 4.
- As the number of UEs increases, core utilization improves and the time taken reduces, as observed in Table 1.
- The design patterns achieve approximately a 60% reduction in time taken compared with the corresponding sequential implementation, as shown in Table 1.
- The time taken by the design patterns drops sharply up to 4 UEs and thereafter starts to increase slowly.
- The Master Worker & Shared Queue, SPMD & Distributed Array and Loop Parallelism APIs developed here show time reductions and core utilization comparable to the existing Fork and Join on the same machine, as observed in Table 1 and Figures 5, 6, 7 and 8.
- The execution time differs slightly on each run, both for the sequential programs and for the pattern APIs developed.

VI. CONCLUSIONS

Sequential programs are not able to take advantage of advances in multi-core hardware architecture. To exploit multiple cores, programs need to be written in parallel as multi-threaded programs, which brings its own challenges such as concurrent access to data, communication between parallel programs, synchronization and deadlocks. Design patterns address the challenges associated with writing multi-threaded programs. The observations show that all of the APIs achieve approximately a 60% reduction in time taken and 100% core utilization compared with the equivalent sequential program for the Monte Carlo algorithm. As the number of UEs increases, core utilization improves until the number of UEs equals the number of cores; beyond that point the time taken increases because of the overhead of swapping UEs between cores. In the future these patterns can be extended to other programming languages, and they can also be applied to distributed parallel programming.

REFERENCES
[1] Timothy G. Mattson, Beverly A. Sanders and Berna L. Massingill, Patterns for Parallel Programming, ISBN 0-321-22811-1.
[2] Cristian Mateos, Alejandro Zunino and Matías Hirsch, "Parallelism as a concern in Java through fork-join synchronization patterns," Computational Science and Its Applications (ICCSA), 2012 12th International Conference on, 18-21 June 2012, pp. 49-50.
[3] Ronal Muresano, Dolores Rexachs and Emilio Luque, "Methodology for efficient execution of SPMD applications on multicore environments," Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on, 17-20 May 2010.
[4] Kurt Keutzer and Tim Mattson, "A design pattern language for engineering (parallel) software," The Berkeley Par Lab: Progress in the Parallel Computing Landscape, Intel Technology Journal 13, pp. 213, 219-221, 2010.
[5] Meikang Qiu, Jian-Wei Niu, Laurence T. Yang, Xiao Qin, Senlin Zhang and Bin Wang, "Energy-aware loop parallelism maximization for multi-core DSP architectures," Green Computing and Communications (GreenCom), 2010 IEEE/ACM Int'l Conference on & Int'l Conference on Cyber, Physical and Social Computing (CPSCom), 18-20 Dec. 2010.

[6] R. Rugina and M. C. Rinard, "Automatic parallelization of divide and conquer algorithms," in PPoPP '99, pp. 72-83.
[7] Stephen Toub, Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4, Parallel Computing Platform, Microsoft Corporation, July 2010.
[8] The Java Tutorials, http://docs.oracle.com/javase/tutorial.
[9] Narjit Chadha, A Java Implemented Design-Pattern-Based System for Parallel Programming, http://mspace.lib.umanitoba.ca/bitstream/1993/19835/1/chadha_a_java.pdf, October 2002, p. 3.
[10] Peiyi Tang, "Multi-core parallel programming in Go," in Proceedings of the First International Conference on Advanced Computing and Communications (ACC'10), September 2010, p. 64.
[11] G. S. Almasi and A. Gottlieb, Highly Parallel Computing, Benjamin-Cummings Publishing Co., Inc., Redwood City, CA, USA, 1989, p. 431.