MBrace: Cloud Computing with Monads

Similar documents
An Introduction and Developer s Guide to Cloud Computing with MBrace

CIEL A universal execution engine for distributed data-flow computing

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi,

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Big Data on Microsoft Platform

The Inside Scoop on Hadoop

Apache Flink Next-gen data analysis. Kostas

Performance Evaluation for BlobSeer and Hadoop using Machine Learning Algorithms

Unified Big Data Processing with Apache Spark. Matei

CSE-E5430 Scalable Cloud Computing Lecture 2

A Cost-Evaluation of MapReduce Applications in the Cloud

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

BIG DATA TRENDS AND TECHNOLOGIES

Moving From Hadoop to Spark

The Stratosphere Big Data Analytics Platform

Hadoop Parallel Data Processing

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Hadoop on Windows Azure: Hive vs. JavaScript for Processing Big Data

Map-Reduce for Machine Learning on Multicore

CPET 581 Cloud Computing: Technologies and Enterprise IT Strategies

Hadoop IST 734 SS CHUNG

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015

MapReduce, Hadoop and Amazon AWS

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges

Apache Hama Design Document v0.6

Fast Analytics on Big Data with H20

Introduction to Cloud Computing

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

Systems Engineering II. Pramod Bhatotia TU Dresden dresden.de

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Big Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island

Azure Data Lake Analytics

Stateful Distributed Dataflow Graphs: Imperative Big Data Programming for the Masses

BIG DATA What it is and how to use?

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Convex Optimization for Big Data: Lecture 2: Frameworks for Big Data Analytics

Jeffrey D. Ullman slides. MapReduce for data intensive computing

Investigating Hadoop for Large Spatiotemporal Processing Tasks

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Apache Hadoop. Alexandru Costan

A Brief Introduction to Apache Tez

Big Data Too Big To Ignore

Disco: Beyond MapReduce

Daniel J. Adabi. Workshop presentation by Lukas Probst

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

BIG DATA - HADOOP PROFESSIONAL amron

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012

Introduction to Hadoop

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 10 - Functional programming: Hadoop and MapReduce

A programming model in Cloud: MapReduce

What We Can Do in the Cloud (2) -Tutorial for Cloud Computing Course- Mikael Fernandus Simalango WISE Research Lab Ajou University, South Korea

GraySort on Apache Spark by Databricks

Spark and the Big Data Library

HiBench Installation. Sunil Raiyani, Jayam Modi

Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud

Map Reduce / Hadoop / HDFS

MapReduce (in the cloud)

Schema Design Patterns for a Peta-Scale World. Aaron Kimball Chief Architect, WibiData

Challenges for Data Driven Systems

CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University

BlobSeer: Towards efficient data storage management on large-scale, distributed systems

HDFS Cluster Installation Automation for TupleWare

HPC ABDS: The Case for an Integrating Apache Big Data Stack

Real Time Big Data Processing

Spark: Cluster Computing with Working Sets

A Survey on Cloud Storage Systems

GraySort and MinuteSort at Yahoo on Hadoop 0.23

Distributed DataFrame on Spark: Simplifying Big Data For The Rest Of Us

Microsoft SQL Server Connector for Apache Hadoop Version 1.0. User Guide

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

Big Data and Industrial Internet

Hadoop Operations Management for Big Data Clusters in Telecommunication Industry

HYBRID CLOUD SUPPORT FOR LARGE SCALE ANALYTICS AND WEB PROCESSING. Navraj Chohan, Anand Gupta, Chris Bunch, Kowshik Prakasam, and Chandra Krintz

Leveraging the Power of SOLR with SPARK. Johannes Weigend QAware GmbH Germany pache Big Data Europe September 2015

BIG DATA USING HADOOP

Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park

Hadoop & Spark Using Amazon EMR

BIG DATA SOLUTION DATA SHEET

Spark ΕΡΓΑΣΤΗΡΙΟ 10. Prepared by George Nikolaides 4/19/2015 1

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

Optimize the execution of local physics analysis workflows using Hadoop

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Brave New World: Hadoop vs. Spark

Managing Hybrid deployments using Cloud Foundry on Azure

CLOUD COMPUTING USING HADOOP TECHNOLOGY

ITG Software Engineering

Transcription:

MBrace: Cloud Computing with Monads Jan Dzik Nick Palladinos Kostas Rontogiannis Eirik Tsarpalis Nikolaos Vathis Nessos Information Technologies, SA 7th Workshop on Programming Languages and Operating Systems November 3, 2013 Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 1 / 29

Introduction Motivation Motivation Distributed Computation is Challenging. Key to success: choose the right distribution framework. Each framework tied to particular programming abstraction. Map-Reduce, Actor model, Dataflow model, etc. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 2 / 29

Introduction Motivation Established distributed frameworks Restrict to specific distribution patterns. Not expressive enough for certain classes of algorithms. Difficult to influence task granularity. Time consuming to deploy, manage and debug. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 3 / 29

Introduction What is MBrace? What is MBrace? 1 A new programming model for the cloud. 2 An elastic, fault tolerant, multitasking cluster infrastructure. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 4 / 29

Introduction In This Talk In This Talk Concentrate on the programming model. Distributed Computation. Distributed Data. Benchmarks. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 5 / 29

The MBrace Programming Model The Cloud Monad The MBrace Programming Model A monad for composing distribution workflows. Essentially a continuation monad that admits distribution. Based on F# computation expressions. Inspired by the successful F# asynchronous workflows. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 6 / 29

The MBrace Programming Model The Cloud Monad A Basic cloud workflow let download (url : string) = cloud { let client = new System.Net.WebClient() let content = client.downloadstring(url) return content } : Cloud<string> Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 7 / 29

The MBrace Programming Model The Cloud Monad Composing cloud workflows let downloadsequential () = cloud { let! c1 = download "http://m-brace.net/" let! c2 = download "http://nessos.gr/" let c = c1 + c2 } return c Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 8 / 29

The MBrace Programming Model Distribution Combinators Parallel Composition let downloadparallel () = cloud { let! c1,c2 = download "http://m-brace.net/" < > download "http://nessos.gr/" } return c1 + c2 Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 9 / 29

The MBrace Programming Model Distribution Combinators Distribution Primitives: an overview Binary parallel operator: < > : Cloud<'T> -> Cloud<'U> -> Cloud<'T * 'U> Variadic parallel combinator: Cloud.Parallel : Cloud<'T> [] -> Cloud<'T []> Non-deterministic parallel combinator: Cloud.Choice : Cloud<'T option> [] -> Cloud<'T option> Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 10 / 29

The MBrace Programming Model Additional Constructs Cloud Monad: additional constructs Monadic for loops. Monadic while loops. Monadic exception handling. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 11 / 29

The MBrace Programming Model Additional Constructs Example: Inverse squares let inversesquares (inputs : int []) = cloud { let jobs : Cloud<float> [] = [ for i in inputs -> cloud { return 1.0 / float (i * i) } ] try let! results = Cloud.Parallel jobs return Array.sum results } with :? DivideByZeroException -> return -1.0 Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 12 / 29

The MBrace Programming Model Evaluation in the Cloud How is it all executed? Scheduler/worker cluster organization. Symbolic execution stack (free monad/trampolines). Scheduler interprets monadic skeleton. Native leaf expressions dispatched to workers. Symbolic stack winds across multiple machines. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 13 / 29

The MBrace Programming Model Map-Reduce A Map-Reduce implementation let rec mapreduce (map : 'T -> Cloud<'R>) (reduce : 'R -> 'R -> Cloud<'R>) (identity : 'R) (input : 'T list) = cloud { match input with [] -> return identity [value] -> return! map value _ -> let left, right = List.split input let! r1, r2 = (mapreduce map reduce identity left) < > (mapreduce map reduce identity right) } return! reduce r1 r2 Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 14 / 29

The Distributed Data Programming Model Introduction What about Data Distribution? MBrace does NOT include a storage service (for now). Relies on third-party storage services. Storage Provider plugin architecture. Out-of-the-box support for FileSystem, SQL and Azure. Future support for HDFS and Amazon S3. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 15 / 29

The Distributed Data Programming Model The MBrace Data Programming Model The MBrace Data Programming Model Storage services interfaced through data primitives. Data primitives act as references to distributed resources. Initialized or updated through the monad. Come in immutable or mutable flavors. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 16 / 29

The Distributed Data Programming Model Cloud Ref Cloud Ref Simplest distributed data primitive of MBrace. Generic reference to a stored value. Conceptually similar to ML ref cells. Immutable by design. Cached in worker nodes for performance. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 17 / 29

The Distributed Data Programming Model Cloud Ref Cloud Ref: Example let createref (inputs : int []) = cloud { let! ref = CloudRef.New inputs } return ref : CloudRef<int []> let deref (ref : CloudRef<int []>) = cloud { let content = ref.value } return content : int [] Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 18 / 29

The Distributed Data Programming Model Cloud Ref Application: Data Sharding type DistribTree<'T> = Leaf of 'T Branch of CloudRef<DistribTree<'T>> * CloudRef<DistribTree<'T>> let rec map (f : 'T -> 'S) (tree : DistribTree<'T>) = cloud { match tree with Leaf t -> return! CloudRef.New (Leaf (f t)) Branch(l,r) -> let! l', r' = map f l.value < > map f r.value return! CloudRef.New (Branch(l',r')) } Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 19 / 29

The Distributed Data Programming Model Cloud File Cloud File References files in the distributed store. Untyped, immutable, binary blobs. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 20 / 29

The Distributed Data Programming Model Cloud File Cloud File : Example let getsize (file : CloudFile) = cloud { let! bytes = CloudFile.ReadAllBytes file return bytes.length / 1024 } cloud { let! files = CloudDir.GetFiles "/path/to/files" let jobs = Array.map getsize files let! sizes = Cloud.Parallel jobs return Array.sum sizes } Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 21 / 29

The MBrace Framework Performance Performance We tested MBrace against Hadoop. Both frameworks were run on Windows Azure. Clusters consisted of 4, 8, 16 and 32 quad-core nodes. Two algorithms were tested, grep and k-means. Source code available on github. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 22 / 29

The MBrace Framework Performance Distributed Grep (Windows Azure) Count occurrences of given pattern from input files. Straightforward Map-Reduce algorithm. Input data was 32, 64, 128 and 256 GB of text. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 23 / 29

The MBrace Framework Performance Distributed Grep (Windows Azure) 400 300 Time (sec) 200 100 0 MBrace Hadoop 20 40 60 80 100 120 worker cores Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 24 / 29

The MBrace Framework Performance k-means Clustering (Windows Azure) Centroid computation out of a set of vectors. Iterative algorithm. Not naturally definable with Map-Reduce workflows. Hadoop implementation from Apache Mahout library. Input was 10 6, randomly generated, 100-dimensional points. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 25 / 29

The MBrace Framework Performance k-means Clustering (Windows Azure) 1,500 MBrace Hadoop Time (sec) 1,000 500 0 20 40 60 80 100 120 worker cores Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 26 / 29

Conclusions & Future Work Conclusions A big data platform for the.net framework. Language-integrated cloud workflows. User-specifiable parallelism patterns and task granularity. Distributed exception handling. Pluggable storage services. Data API integrated with programming model. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 27 / 29

Conclusions & Future Work Future Work Improved C# support. A rich library of combinators and parallelism patterns. A LINQ provider for data parallelism. Support for the Mono framework and Linux. Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 28 / 29

Conclusions & Future Work Thank You! Questions? http://m-brace.net Eirik Tsarpalis (Nessos IT) MBrace: Cloud Computing with Monads PLOS 13 29 / 29