# Martlet: A Scientific Work-Flow Language for Abstracted Parallisation

Save this PDF as:

Size: px
Start display at page:

## Transcription

4 // Declare URI abbreviations in order to improve the script readability define uri1 = basefunction:system:http://cpdn.net:8080/martlet; proc(a,b) // Declare the required local variables for the computation. Y and Z // are used to represent the two sets of values Yi and Zi in the // example equations. ZTotal will hold the sum of all the Zi s. Y = new dismatrix(a); Z = new disinteger(a); ZTotal = new integer(b); // The base case where each Yi and Zi is calculated, and recorded in // Y and Z respectively. The map construct results in each Zi and Yi // being calculated independently and in parallel. map matrixsum:uri1(a,y); matrixcardinality:uri1(a,z); // The inductive case, where we sum together the distributed Yi s // and Zi s into B and ZTotal respectively. tree((yl,yr)\y -> B, (ZL,ZR)\Z -> ZTotal) matrixsumtovector:uri1(yl,yr,b); IntegerSum:uri1(ZL,ZR,ZTotal); // Finally we divide through B with ZTotal to finish computing the // average of A storing the result in B. matrixdivide:uri1(b,ztotal,b); Figure 1: Function for computing the average of a matrix A split across an unknown number of servers. The syntax and semantics of this function is explained in Section 5.

5 of statements were run concurrently by an asynchronous composition. An example of this is shown in Figure 2 Asynchronous Composition is marked by the keyword async and encompasses a set of statements. When this is executed each statement in the set is started concurrently. The asynchronous statement only terminates when all the sub-statements have returned. In order to prevent race conditions it is necessary that no process uses a variable concurrently with a process that writes to the variable. This is enforced by the middleware at runtime. if-else & while are represented and behave the same as they would in any other procedural language. There is a test and then a list of statements. can be created by state- Temporary Variables ments that look like identifier = new type(identifier); The identifier on the left hand side of the equality is the name of the new variable. The type on the right is the type of the variable, and the identifier on the right is a currently existing data structure used to determine the level of parallelisation required for the new variable. For example if the statement was A = new DisMatrix(B); this will create a distributed matrix A that is split into the same number of pieces as B. The type field is required as there is no constraint that the type of A is the same as the type of B. This freedom is required as there is no guarantee that a distributed data structure of the right type is going to appear at this stage in the procedure, as was the case in the average calculation example in Figure 1. Process calls fall into one of two categories. They can either be statically named in the function or passed in as a reference at runtime. Both appear as an identifier and a list of arguments. 5.2 Expandable Statements The are four expandable statements, map, foldr, foldl and tree. Each of these has a functional programming equivalent. Expandable statements don t propagate the call to expand to their children and must have been expanded before the function can be computed. This means that on any given path between the root and a leaf there must be at most one expandable statement. async function1(a,b,c); function2(a,b); function3(b,c); function4(d,e); function1(d,e,f); function5(e,f); Figure 2: used to run two uential sets of operations asynchronously. map is equivalent to map in functional programming where it takes a function f and a list, and applies this function to every element in the list. This is shown below in Haskell: map f [] = [] map f (x:xs) = (f x):(map f xs) Map in Martlet encompasses a list of statements as shown in the example below. Here function calls f1 and f2 are implicitly joined in a uential composition to create the function represented by f in the Haskell definition. The list is created by distributed values A and B. While in its unexpanded abstract form, this example maps onto the abstract syntax tree also shown below. map f1(a); f2(a,b); map f 1 (A) f 2 (A, B) When this is expanded, it looks at the distributed data structures it has been passed and creates a copy of these statements to run independently on each piece of the distributed data structure as shown in Figure 3. Due to the use of an asynchronous statement in this transformation, no local value that is passed into the map statement can be written to. However local values created within the map node can be written to.

6 f 1 (A 1 ) f 1 (A 3, C) f 2 (A 1, B 1 ) f 2 (B 3, C) f 1 (A 2 ) f 1 (A 2, C) async f 2 (A 2, B 2 ) f 2 (B 2, C) f 1 (A 3 ) f 1 (A 1, C) f 2 (A 3, B 3 ) Figure 3: The abstract syntax tree for the example map statement after expand has been called setting A = [A 1, A 2, A 3 ] and B = [B 1, B 2, B 3 ]. f 2 (B 1, C) Figure 4: The abstract syntax tree for the example foldr statement after expand has been called setting A = [A 1, A 2, A 3 ] and B = [B 1, B 2, B 3 ]. foldr is a way of applying a function and an accumulator value to each element of a list. This is defined in Haskell as: foldr f e [] = e foldr f e (x: xs) = f x (foldr f e xs) This means that the elements of a list xs = [1,2,3,4,5] can be summed by the statement; foldr (+) 0 xs which evaluates to 1+(2+(3+(4+(5+0)))) Foldr statements are constructed from the foldr keyword followed by a list of one or more statements which represent f. An example is shown below with its corresponding abstract syntax tree. foldr f1(a,c); f2(b,c); foldr f 1 (A, C) f 2 (B, C) When this function is expanded this is replaced by a uential statement that keeps any non-distributed arguments constant and calls f repeatedly on each piece of the distributed arguments as shown in Figure 4. foldl is the mirror image of foldr so the Haskell example would now evaluate to ((((0+1)+2)+3)+4)+5 The syntax tree in Martlet is expanded in almost exactly the same way as foldr. The only difference is the function calls from the uential statement are in reverse order. The only time that there is any reason to choose between foldl and foldr is when f is not commutative. tree is a more complex statement type. It constructs a binary tree with a part of the distributed data structure at each leaf, and the function f at each node. When executed this is able to take advantage of the potential for parallel computation. A Haskell equivalent is: tree f [x] = x tree f (x:y:ys) = f (tree f xs ) (tree f ys ) where (xs,ys ) = split (x:y:ys) split is not defined here since the shape of the tree is not part of the specification. It will however always split the list so that neither is empty. Unlike the other expandable statements, each node in a tree takes 2n inputs from n distributed data structures, and produce n outputs. As there is insufficient information in the structure to construct the mappings of values between nodes within the tree, the syntax requires the arguments that the statements use to be declared in brackets above the function in

7 such a way that the additional information is provided. Non-distributed constants and processes used in f are simply denoted as a variable name. The relationship between distributed inputs and the outputs of f are encoded as (ALeft,ARight)\A->B, where ALeft and ARight are two arguments drawn from the distributed input A that f will use as input. The output will then be placed in B and can be used as an input from A at the next level in the tree. Lets consider a function that uses a method sum passed into the statement as s, a distributed argument X as input and outputs the result to the nondistributed argument A. This could be written as: tree((xl,xr)\x -> A) s(a,xl,xr); tree s(a, XL, XR) When this is expanded, it uses uential, asynchronous and temporary variables in order to construct the tree as shown in Figure 5. Because of the use of asynchronous statements any value that is written to must be passed in as either an input or an output. 5.3 Example If the Martlet program to calculate averages from the example in Figure 1 where submitted it would produce the abstract syntax tree shown in Figure 6. This could then be expanded using the techniques show here to produce a concrete functions for different concrete datasets. 6 Conclusions In this paper we have introduced a language and programming model that use functional constructs and two classes of data structure. Using these constructs it is able to abstract from users the complexity of creating parallel processes over distributed data and computing resources. This allows the user simply to think about the functions they want to perform and does not require them to worry about the implementation details. Using this language, it has been possible to describe a wide range of algorithms, including algorithms for performing Singular Value Decomposition, North Atlantic Oscillation and Least Squares. To allow the evaluation of this language and programming model, a supporting middleware has been constructed [10] using web services supported by Apache Axis [3] and Jakarta Tomcat [4]. As we have found no projects with a similar approach aimed at a similar style of environment, a direct comparison with other projects has not been possible. This work is, however, currently being tested with data from the ClimatePrediction.net project with favorable results and will hopefully be deployed on all our servers over the course of the next year allowing testing on a huge data set. At runtime, when concrete values have been provided, it is possible to convert abstract functions into concrete functions. The concrete functions then contain no operations that are not supported by a range of other languages. As such, it is envisaged that the middleware will in time be cut back to a layer that can sit on top of existing work-flow engines, providing extended functionality to a wide range of distributed computing applications. This capability will be provided through the construction of a set of JIT compilers for different work-flow languages. Such compliers need only take a standard XML output produced at runtime and performing a transformation to produce the language of choice. This would then allow a layer supporting the construction of distributed data structures and the submission of abstract functions to be placed on top of a wide range of existing resources with minimal effort, extending their use without affecting their existing functionality. Such a middleware would dramatically increase the number of projects that Martlet is applicable to. Hopefully the ideas in Martlet will then be absorbed into the next generation of work-flow languages. This will allow both existing and future languages to deal with a type of problem that thus far has not been addressed, but will become ever more common as we generate ever-larger data sets. References [1] David P. Anderson, Jeff Cobb, Eric Korpela, Matt Lebofsky, and Dan Werthimer. an experiment in public-resource computing. Commun. ACM, 45(11):56 61, [2] Tony Andrews, Francisco Curbera, Hitesh Doholakia, Yaron Goland, Johannes Kiein, Frank Leymann, Kevin Liu, Dieter Roller, Doug Smitth, Satish Thatte, Ivana Trickovic, and Sanjiva Weerwarana. BPEL4WS. Technical report, BEA Systems, IBM, Microsoft, SAP AG and Siebel Systems, [3] Apache Software Foundation. Apache Axis, URL:

8 s(a 2, X 1, X 2 ) async create A 2, A 3 s(a 3, X 3, X 4 ) create A 1 s(a 1, A 2, A 3 ) s(a, A 1, X 5 ) Figure 5: When the tree function on page 7 is expanded with X = [X 1, X 2, X 3, X 4, X 5 ], this is one of the possible trees that could be generated. MatrixSum(A,Y) map MatrixCardinality(A,Z) create Y, Z, ZTotal MatrixSumToVector(YL,YR,B) tree IntegerSum(ZL,ZR,ZTotal) MatrixDivide(B,ZTotal,B) Figure 6: The abstract syntax tree representing the generic work-flow to compute the an average introduced in Figure 1. [4] Apache Software Foundation. The Apache Jakarta Project, URL: [5] Richard Bird. Introduction to Functional Programming using Haskell. Prentice Hall, second edition, [6] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. Technical report, Google Inc, December [7] E. Deelman, J. Blythe, Y. Gil, and C. Kesselman. Pegasus: Planning for execution in grids. Technical report, Information Sciences Institute, [8] Sanjay Ghemawat, Howard Gobioff, and Shun- Tak Leung. The google file system. In SOSP 03: Proceedings of the nineteenth ACM symposium on Operating systems principles, pages 29 43, New York, NY, USA, ACM Press. [9] Daniel Goodman and Andrew Martin. Grid style web services for climateprediction.net. In Steven Newhouse and Savas Parastatidis, editors, GGF workshop on building Service- Based Grids. Global Grid Forum, [10] Daniel Goodman and Andrew Martin. Scientific middleware for abstracted parallelisation. Technical Report RR-05-07, Oxford University Computing Lab, November [11] Tom Oinn, Matthew Addis, Justin Ferris, Darren Marvin, Martin Senger, Mark Greenwood, Tim Carver, Kevin Glover, Matthew R. Pocock, Anil Wipat, and Peter Li. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics, 20(17): , [12] David Stainforth, Jamie Kettleborough, Andrew Martin, Andrew Simpson, Richard Gillis, Ali Akkas, Richard Gault, Mat Collins, David Gavaghan, and Myles Allen. Climateprediction.net: Design principles for public-resource modeling research. In 14th IASTED International Conference Parallel and Distributed Computing and Systems, Nov 2002.

### What is Analytic Infrastructure and Why Should You Care?

What is Analytic Infrastructure and Why Should You Care? Robert L Grossman University of Illinois at Chicago and Open Data Group grossman@uic.edu ABSTRACT We define analytic infrastructure to be the services,

### Six Strategies for Building High Performance SOA Applications

Six Strategies for Building High Performance SOA Applications Uwe Breitenbücher, Oliver Kopp, Frank Leymann, Michael Reiter, Dieter Roller, and Tobias Unger University of Stuttgart, Institute of Architecture

### A Study on Data Analysis Process Management System in MapReduce using BPM

A Study on Data Analysis Process Management System in MapReduce using BPM Yoon-Sik Yoo 1, Jaehak Yu 1, Hyo-Chan Bang 1, Cheong Hee Park 1 Electronics and Telecommunications Research Institute, 138 Gajeongno,

Introduction to Hadoop 1 What is Hadoop? the big data revolution extracting value from data cloud computing 2 Understanding MapReduce the word count problem more examples MCS 572 Lecture 24 Introduction

### UNIVERSITY OF LONDON (University College London) M.Sc. DEGREE 1998 COMPUTER SCIENCE D16: FUNCTIONAL PROGRAMMING. Answer THREE Questions.

UNIVERSITY OF LONDON (University College London) M.Sc. DEGREE 1998 COMPUTER SCIENCE D16: FUNCTIONAL PROGRAMMING Answer THREE Questions. The Use of Electronic Calculators: is NOT Permitted. -1- Answer Question

1 What is Hadoop? Introduction to Hadoop We are living in an era where large volumes of data are available and the problem is to extract meaning from the data avalanche. The goal of the software tools

### irods and Metadata survey Version 0.1 Date March Abhijeet Kodgire akodgire@indiana.edu 25th

irods and Metadata survey Version 0.1 Date 25th March Purpose Survey of Status Complete Author Abhijeet Kodgire akodgire@indiana.edu Table of Contents 1 Abstract... 3 2 Categories and Subject Descriptors...

### A Study on Workload Imbalance Issues in Data Intensive Distributed Computing

A Study on Workload Imbalance Issues in Data Intensive Distributed Computing Sven Groot 1, Kazuo Goda 1, and Masaru Kitsuregawa 1 University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan Abstract.

### Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

### 16.1 MAPREDUCE. For personal use only, not for distribution. 333

For personal use only, not for distribution. 333 16.1 MAPREDUCE Initially designed by the Google labs and used internally by Google, the MAPREDUCE distributed programming model is now promoted by several

### Introduction to Service Oriented Architectures (SOA)

Introduction to Service Oriented Architectures (SOA) Responsible Institutions: ETHZ (Concept) ETHZ (Overall) ETHZ (Revision) http://www.eu-orchestra.org - Version from: 26.10.2007 1 Content 1. Introduction

### Decomposition into Parts. Software Engineering, Lecture 4. Data and Function Cohesion. Allocation of Functions and Data. Component Interfaces

Software Engineering, Lecture 4 Decomposition into suitable parts Cross cutting concerns Design patterns I will also give an example scenario that you are supposed to analyse and make synthesis from The

### Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

### MAPREDUCE Programming Model

CS 2510 COMPUTER OPERATING SYSTEMS Cloud Computing MAPREDUCE Dr. Taieb Znati Computer Science Department University of Pittsburgh MAPREDUCE Programming Model Scaling Data Intensive Application MapReduce

### Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework

Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework Vidya Dhondiba Jadhav, Harshada Jayant Nazirkar, Sneha Manik Idekar Dept. of Information Technology, JSPM s BSIOTR (W),

### The Service Revolution software engineering without programming languages

The Service Revolution software engineering without programming languages Gustavo Alonso Institute for Pervasive Computing Department of Computer Science Swiss Federal Institute of Technology (ETH Zurich)

### Introduction to Parallel Programming and MapReduce

Introduction to Parallel Programming and MapReduce Audience and Pre-Requisites This tutorial covers the basics of parallel programming and the MapReduce programming model. The pre-requisites are significant

### Jeffrey D. Ullman slides. MapReduce for data intensive computing

Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very

### Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU

Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview

### Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project

Integrating NLTK with the Hadoop Map Reduce Framework 433-460 Human Language Technology Project Paul Bone pbone@csse.unimelb.edu.au June 2008 Contents 1 Introduction 1 2 Method 2 2.1 Hadoop and Python.........................

### R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5

Distributed data processing in heterogeneous cloud environments R.K.Uskenbayeva 1, А.А. Kuandykov 2, Zh.B.Kalpeyeva 3, D.K.Kozhamzharova 4, N.K.Mukhazhanov 5 1 uskenbaevar@gmail.com, 2 abu.kuandykov@gmail.com,

### Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique

Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique Mahesh Maurya a, Sunita Mahajan b * a Research Scholar, JJT University, MPSTME, Mumbai, India,maheshkmaurya@yahoo.co.in

### A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)

### A standards-based approach to application integration

A standards-based approach to application integration An introduction to IBM s WebSphere ESB product Jim MacNair Senior Consulting IT Specialist Macnair@us.ibm.com Copyright IBM Corporation 2005. All rights

### BPM Architecture Design Based on Cloud Computing

Intelligent Information Management, 2010, 2, 329-333 doi:10.4236/iim.2010.25039 Published Online May 2010 (http://www.scirp.org/journal/iim) BPM Architecture Design Based on Cloud Computing Abstract Zhenyu

### Report for the seminar Algorithms for Database Systems F1: A Distributed SQL Database That Scales

Report for the seminar Algorithms for Database Systems F1: A Distributed SQL Database That Scales Bogdan Aurel Vancea May 2014 1 Introduction F1 [1] is a distributed relational database developed by Google

### MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

MapReduce MapReduce and SQL Injections CS 3200 Final Lecture Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04: Sixth Symposium on Operating System Design

### Challenges and Opportunities for formal specifications in Service Oriented Architectures

ACSD ATPN Xi an China June 2008 Challenges and Opportunities for formal specifications in Service Oriented Architectures Gustavo Alonso Systems Group Department of Computer Science Swiss Federal Institute

With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

### A Study of Data Management Technology for Handling Big Data

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 9, September 2014,

### Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

Objectives Distributed Databases and Client/Server Architecture IT354 @ Peter Lo 2005 1 Understand the advantages and disadvantages of distributed databases Know the design issues involved in distributed

### http://www.paper.edu.cn

5 10 15 20 25 30 35 A platform for massive railway information data storage # SHAN Xu 1, WANG Genying 1, LIU Lin 2** (1. Key Laboratory of Communication and Information Systems, Beijing Municipal Commission

### Data-Aware Service Choreographies through Transparent Data Exchange

Institute of Architecture of Application Systems Data-Aware Service Choreographies through Transparent Data Exchange Michael Hahn, Dimka Karastoyanova, and Frank Leymann Institute of Architecture of Application

### Large-Scale Data Sets Clustering Based on MapReduce and Hadoop

Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE

### Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2

Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data

### Combining Services and Semantics on the Web

Combining Services and Semantics on the Web Katia Sycara, Massimo Paolucci and Naveen Srinivasan Software Agents Lab Carnegie Mellon University Pittsburgh, PA Mark Burstein Human-Centered Systems Group

### Establishing a Standard Business Process Execution Architecture for Integrating Web Services

Establishing a Standard Business Process Execution Architecture for Integrating Web Services Thilina Gunasinghe Virtusa (Pvt) Ltd. 69, Alexandra Place Colombo 7, Sri Lanka tgunasinghe@virusa.com Tim Kelly

### Workflow Partitioning and Deployment on the Cloud using Orchestra

Workflow Partitioning and Deployment on the Cloud using Orchestra Ward Jaradat, Alan Dearle, and Adam Barker School of Computer Science, University of St Andrews, North Haugh, St Andrews, Fife, KY16 9SX,

### Analysing Large Web Log Files in a Hadoop Distributed Cluster Environment

Analysing Large Files in a Hadoop Distributed Cluster Environment S Saravanan, B Uma Maheswari Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham,

### Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

### From GWS to MapReduce: Google s Cloud Technology in the Early Days

Large-Scale Distributed Systems From GWS to MapReduce: Google s Cloud Technology in the Early Days Part II: MapReduce in a Datacenter COMP6511A Spring 2014 HKUST Lin Gu lingu@ieee.org MapReduce/Hadoop

### SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA

SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. ravirajesh.j.2013.mecse@rajalakshmi.edu.in Mrs.

### Chapter 7 Application Protocol Reference Architecture

Application Protocol Reference Architecture Chapter 7 Application Protocol Reference Architecture This chapter proposes an alternative reference architecture for application protocols. The proposed reference

### Mobile Storage and Search Engine of Information Oriented to Food Cloud

Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:

### Big Data Rethink Algos and Architecture. Scott Marsh Manager R&D Personal Lines Auto Pricing

Big Data Rethink Algos and Architecture Scott Marsh Manager R&D Personal Lines Auto Pricing Agenda History Map Reduce Algorithms History Google talks about their solutions to their problems Map Reduce:

Repository for Business Processes and Arbitrary Associated Metadata Jussi Vanhatalo 12, Jana Koehler 1, and Frank Leymann 2 1 IBM Research GmbH, Zurich Research Laboratory, Säumerstrasse 4, 8803 Rüschlikon,

### Classification On The Clouds Using MapReduce

Classification On The Clouds Using MapReduce Simão Martins Instituto Superior Técnico Lisbon, Portugal simao.martins@tecnico.ulisboa.pt Cláudia Antunes Instituto Superior Técnico Lisbon, Portugal claudia.antunes@tecnico.ulisboa.pt

### MPJ Express Meets YARN: Towards Java HPC on Hadoop Systems

Procedia Computer Science Volume 51, 2015, Pages 2678 2682 ICCS 2015 International Conference On Computational Science MPJ Express Meets YARN: Towards Java HPC on Hadoop Systems Hamza Zafar 1, Farrukh

### Keywords: Big Data, HDFS, Map Reduce, Hadoop

Volume 5, Issue 7, July 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Configuration Tuning

### A System for Interactive Authorization for Business Processes for Web Services

A System for Interactive Authorization for Business Processes for Web Services Hristo Koshutanski and Fabio Massacci Dip. di Informatica e Telecomunicazioni - Univ. di Trento via Sommarive 14-38050 Povo

### BSC vision on Big Data and extreme scale computing

BSC vision on Big Data and extreme scale computing Jesus Labarta, Eduard Ayguade,, Fabrizio Gagliardi, Rosa M. Badia, Toni Cortes, Jordi Torres, Adrian Cristal, Osman Unsal, David Carrera, Yolanda Becerra,

### Analyses on functional capabilities of BizTalk Server, Oracle BPEL Process Manger and WebSphere Process Server for applications in Grid middleware

Analyses on functional capabilities of BizTalk Server, Oracle BPEL Process Manger and WebSphere Process Server for applications in Grid middleware R. Goranova University of Sofia St. Kliment Ohridski,

### CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

### A Simplified Approach to Web Service Development

A Simplified Approach to Web Service Development Peter M. Kelly, Paul D. Coddington, and Andrew L. Wendelborn School of Computer Science University of Adelaide South Australia 5005, Australia {pmk,paulc,andrew}@cs.adelaide.edu.au

### MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context

MapReduce Jeffrey Dean and Sanjay Ghemawat Background context BIG DATA!! o Large-scale services generate huge volumes of data: logs, crawls, user databases, web site content, etc. o Very useful to be able

### Modular Communication Infrastructure Design with Quality of Service

Modular Communication Infrastructure Design with Quality of Service Pawel Wojciechowski and Péter Urbán Distributed Systems Laboratory School of Computer and Communication Sciences Swiss Federal Institute

### Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

, 22-24 October, 2014, San Francisco, USA Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis Teng Zhao, Kai Qian, Dan Lo, Minzhe Guo, Prabir Bhattacharya, Wei Chen, and Ying

### Ontology construction on a cloud computing platform

Ontology construction on a cloud computing platform Exposé for a Bachelor's thesis in Computer science - Knowledge management in bioinformatics Tobias Heintz 1 Motivation 1.1 Introduction PhenomicDB is

### Managing Variability in Software Architectures 1 Felix Bachmann*

Managing Variability in Software Architectures Felix Bachmann* Carnegie Bosch Institute Carnegie Mellon University Pittsburgh, Pa 523, USA fb@sei.cmu.edu Len Bass Software Engineering Institute Carnegie

### Resource Allocation Schemes for Gang Scheduling

Resource Allocation Schemes for Gang Scheduling B. B. Zhou School of Computing and Mathematics Deakin University Geelong, VIC 327, Australia D. Walsh R. P. Brent Department of Computer Science Australian

### A computational model for MapReduce job flow

A computational model for MapReduce job flow Tommaso Di Noia, Marina Mongiello, Eugenio Di Sciascio Dipartimento di Ingegneria Elettrica e Dell informazione Politecnico di Bari Via E. Orabona, 4 70125

### Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications

Enhancing Dataset Processing in Hadoop YARN Performance for Big Data Applications Ahmed Abdulhakim Al-Absi, Dae-Ki Kang and Myong-Jong Kim Abstract In Hadoop MapReduce distributed file system, as the input

### Distributed R for Big Data

Distributed R for Big Data Indrajit Roy HP Vertica Development Team Abstract Distributed R simplifies large-scale analysis. It extends R. R is a single-threaded environment which limits its utility for

### Image Search by MapReduce

Image Search by MapReduce COEN 241 Cloud Computing Term Project Final Report Team #5 Submitted by: Lu Yu Zhe Xu Chengcheng Huang Submitted to: Prof. Ming Hwa Wang 09/01/2015 Preface Currently, there s

### Big Systems, Big Data

Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,

### MODEL DRIVEN DEVELOPMENT OF BUSINESS PROCESS MONITORING AND CONTROL SYSTEMS

MODEL DRIVEN DEVELOPMENT OF BUSINESS PROCESS MONITORING AND CONTROL SYSTEMS Tao Yu Department of Computer Science, University of California at Irvine, USA Email: tyu1@uci.edu Jun-Jang Jeng IBM T.J. Watson

### Distributed Systems and Recent Innovations: Challenges and Benefits

Distributed Systems and Recent Innovations: Challenges and Benefits 1. Introduction Krishna Nadiminti, Marcos Dias de Assunção, and Rajkumar Buyya Grid Computing and Distributed Systems Laboratory Department

### Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

### Cloud Computing based on the Hadoop Platform

Cloud Computing based on the Hadoop Platform Harshita Pandey 1 UG, Department of Information Technology RKGITW, Ghaziabad ABSTRACT In the recent years,cloud computing has come forth as the new IT paradigm.

### Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques

Enforcing Data Quality Rules for a Synchronized VM Log Audit Environment Using Transformation Mapping Techniques Sean Thorpe 1, Indrajit Ray 2, and Tyrone Grandison 3 1 Faculty of Engineering and Computing,

### Kepler + Hadoop : A General Architecture Facilitating Data-Intensive Applications in Scientific Workflow Systems

Kepler + Hadoop : A General Architecture Facilitating Data-Intensive Applications in Scientific Workflow Systems Jianwu Wang, Daniel Crawl, Ilkay Altintas San Diego Supercomputer Center, University of

### Distributed Framework for Data Mining As a Service on Private Cloud

RESEARCH ARTICLE OPEN ACCESS Distributed Framework for Data Mining As a Service on Private Cloud Shraddha Masih *, Sanjay Tanwani** *Research Scholar & Associate Professor, School of Computer Science &

### Computer Science at Kent

Computer Science at Kent Transformation in HaRe Chau Nguyen-Viet Technical Report No. 21-04 December 2004 Copyright 2004 University of Kent Published by the Computing Laboratory, University of Kent, Canterbury,

### UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications

UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications Gaël de Chalendar CEA LIST F-92265 Fontenay aux Roses Gael.de-Chalendar@cea.fr 1 Introduction The main data sources

### http://www.wordle.net/

Hadoop & MapReduce http://www.wordle.net/ http://www.wordle.net/ Hadoop is an open-source software framework (or platform) for Reliable + Scalable + Distributed Storage/Computational unit Failures completely

### The Architectural Design of FRUIT: A Family of Retargetable User Interface Tools

The Architectural Design of : A Family of Retargetable User Interface Tools Yi Liu Computer Science University of Mississippi University, MS 38677 H. Conrad Cunningham Computer Science University of Mississippi

### Software Life-Cycle Management

Ingo Arnold Department Computer Science University of Basel Theory Software Life-Cycle Management Architecture Styles Overview An Architecture Style expresses a fundamental structural organization schema

### Data processing goes big

Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

### Towards Management of SLA-Aware Business Processes Based on Key Performance Indicators

Towards Management of SLA-Aware Business Processes Based on Key Performance Indicators Branimir Wetzstein, Dimka Karastoyanova, Frank Leymann Institute of Architecture of Application Systems, University

### The countdown problem

JFP 12 (6): 609 616, November 2002. c 2002 Cambridge University Press DOI: 10.1017/S0956796801004300 Printed in the United Kingdom 609 F U N C T I O N A L P E A R L The countdown problem GRAHAM HUTTON

### [Refer Slide Time: 05:10]

Principles of Programming Languages Prof: S. Arun Kumar Department of Computer Science and Engineering Indian Institute of Technology Delhi Lecture no 7 Lecture Title: Syntactic Classes Welcome to lecture

The Hadoop Framework Nils Braden University of Applied Sciences Gießen-Friedberg Wiesenstraße 14 35390 Gießen nils.braden@mni.fh-giessen.de Abstract. The Hadoop Framework offers an approach to large-scale

### Log Mining Based on Hadoop s Map and Reduce Technique

Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, anujapandit25@gmail.com Amruta Deshpande Department of Computer Science, amrutadeshpande1991@gmail.com

### Map Reduce / Hadoop / HDFS

Chapter 3: Map Reduce / Hadoop / HDFS 97 Overview Outline Distributed File Systems (re-visited) Motivation Programming Model Example Applications Big Data in Apache Hadoop HDFS in Hadoop YARN 98 Overview

### Big Data and Apache Hadoop s MapReduce

Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23

### Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team

Software tools for Complex Networks Analysis Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team MOTIVATION Why do we need tools? Source : nature.com Visualization Properties extraction

### Service-Oriented Architectures

Architectures Computing & 2009-11-06 Architectures Computing & SERVICE-ORIENTED COMPUTING (SOC) A new computing paradigm revolving around the concept of software as a service Assumes that entire systems

### Semantic Workflow Management for E-Social Science

Semantic Workflow Management for E-Social Science E. Pignotti 1, P. Edwards 1, A. Preece 1, G. Polhill 2, N. Gotts 2 1 Dept. of Computing Science, University of Aberdeen. 2 The Macaulay Institute, Craigiebuckler,

### A Multi-layered Domain-specific Language for Stencil Computations

A Multi-layered Domain-specific Language for Stencil Computations Christian Schmitt, Frank Hannig, Jürgen Teich Hardware/Software Co-Design, University of Erlangen-Nuremberg Workshop ExaStencils 2014,

### Soaplab - a unified Sesame door to analysis tools

Soaplab - a unified Sesame door to analysis tools Martin Senger, Peter Rice, Tom Oinn European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, UK http://industry.ebi.ac.uk/soaplab Abstract

### ASAP D7.1 Integration Prototype ASAP System Prototype v.1

FP7 Project ASAP Adaptable Scalable Analytics Platform Integration Prototype ASAP System Prototype v.1 WP 7 Integration of the ASAP System Nature: Report Dissemination: Public Version History Version Date

### CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

### Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging

### Oracle Service Bus Examples and Tutorials

March 2011 Contents 1 Oracle Service Bus Examples... 2 2 Introduction to the Oracle Service Bus Tutorials... 5 3 Getting Started with the Oracle Service Bus Tutorials... 12 4 Tutorial 1. Routing a Loan

### Lehrstuhl für Informatik 4 Kommunikation und verteilte Systeme. Middleware. Chapter 8: Middleware

Middleware 1 Middleware Lehrstuhl für Informatik 4 Middleware: Realisation of distributed accesses by suitable software infrastructure Hiding the complexity of the distributed system from the programmer

### LDIF - Linked Data Integration Framework

LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany a.schultz@fu-berlin.de,