Cloud Haskell. Distributed Programming with. Andres Löh. 14 June 2013, Big Tech Day 6 Copyright 2013 Well-Typed LLP. .Well-Typed



Similar documents
Towards Haskell in the Cloud

Functional programming for the data centre

Thesis Proposal: Improving the Performance of Synchronization in Concurrent Haskell

HiDb: A Haskell In-Memory Relational Database

Chapter 2: Remote Procedure Call (RPC)

Datatype-generic data migrations

Introduction CORBA Distributed COM. Sections 9.1 & 9.2. Corba & DCOM. John P. Daigle. Department of Computer Science Georgia State University

The ConTract Model. Helmut Wächter, Andreas Reuter. November 9, 1999

Resource Utilization of Middleware Components in Embedded Systems

Timber: A Programming Language for Real-Time Embedded Systems

Practical Generic Programming with OCaml

C#5.0 IN A NUTSHELL. Joseph O'REILLY. Albahari and Ben Albahari. Fifth Edition. Tokyo. Sebastopol. Beijing. Cambridge. Koln.

Data Management in the Cloud

Scala Actors Library. Robert Hilbrich

Getting Started with the Internet Communications Engine

Aneka Dynamic Provisioning

Mimer SQL Real-Time Edition White Paper

Functional Programming in C++11

File S1: Supplementary Information of CloudDOE

The BSN Hardware and Software Platform: Enabling Easy Development of Body Sensor Network Applications

High Availability Solutions for the MariaDB and MySQL Database

Real Time Programming: Concepts

CONTROL MICROSYSTEMS DNP3. User and Reference Manual

Driving force. What future software needs. Potential research topics

Network Licensing. White Paper 0-15Apr014ks(WP02_Network) Network Licensing with the CRYPTO-BOX. White Paper

7CS-A(CD_lab) Actual date of covering the. Reason for not covering the topic in due time. Month in which the topic will be covered

Lesson 4 Web Service Interface Definition (Part I)

Application Note: AN00121 Using XMOS TCP/IP Library for UDP-based Networking

3.5. cmsg Developer s Guide. Data Acquisition Group JEFFERSON LAB. Version

To Java SE 8, and Beyond (Plan B)

C# Cookbook. Stephen Teilhet andjay Hilyard. O'REILLY 8 Beijing Cambridge Farnham Köln Paris Sebastopol Taipei Tokyo '"J""'

PROFESSIONAL. Node.js BUILDING JAVASCRIPT-BASED SCALABLE SOFTWARE. Pedro Teixeira WILEY. John Wiley & Sons, Inc.

Programming and Reasoning with Side-Effects in IDRIS

Library ModbusRTUlib Modbus RTU master communication. TXV rd Issue February 2010 All rights reserved

Realizing Enterprise Integration Patterns in WebSphere

Chapter 6, The Operating System Machine Level

Introducing PgOpenCL A New PostgreSQL Procedural Language Unlocking the Power of the GPU! By Tim Child

- An Essential Building Block for Stable and Reliable Compute Clusters

The High Performance Internet of Things: using GVirtuS for gluing cloud computing and ubiquitous connected devices

Cloud9 Parallel Symbolic Execution for Automated Real-World Software Testing

Computer Organization & Architecture Lecture #19

Top Weblogic Tasks You can Automate Now

The Advantages and Disadvantages of Network Computing Nodes

Programming and Reasoning with Algebraic Effects and Dependent Types

Eloquence Training What s new in Eloquence B.08.00

16.1 MAPREDUCE. For personal use only, not for distribution. 333

Crash Course in Java

POLYTYPIC PROGRAMMING OR: Programming Language Theory is Helpful

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

Advanced Functional Programming (9) Domain Specific Embedded Languages

Fast Analytics on Big Data with H20

GTask Developing asynchronous applications for multi-core efficiency

MBrace: Cloud Computing with Monads

Simplest Scalable Architecture

Comparing SQL and NOSQL databases

Developing a Web Server Platform with SAPI Support for AJAX RPC using JSON

Fun with Phantom Types

pyownet Documentation

Partner Portal DOCUMENT. Ticketing User Guide. NTT Communications

Object Oriented Database Management System for Decision Support System.

Caml Virtual Machine File & data formats Document version: 1.4

Introduction to Parallel Programming and MapReduce

Replication on Virtual Machines

Modular Communication Infrastructure Design with Quality of Service

Windows Azure Storage Essential Cloud Storage Services

Appendix. Web Command Error Codes. Web Command Error Codes

Scaling Analysis Services in the Cloud

C Programming. for Embedded Microcontrollers. Warwick A. Smith. Postbus 11. Elektor International Media BV. 6114ZG Susteren The Netherlands

Monday, April 8, 13. Creating Successful Magento ERP Integrations

Objectives. Distributed Databases and Client/Server Architecture. Distributed Database. Data Fragmentation

XtreemFS Extreme cloud file system?! Udo Seidel

Using the CoreSight ITM for debug and testing in RTX applications

Gluing things together with Haskell. Neil Mitchell

Middleware Lou Somers

OpenACC 2.0 and the PGI Accelerator Compilers

Amazon Web Services. Elastic Compute Cloud (EC2) and more...

SMock A Test Platform for the Evaluation of Monitoring Tools

Detecting Pattern-Match Failures in Haskell. Neil Mitchell and Colin Runciman York University

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

map/reduce connected components

Lightweight DNS for Multipurpose and Multifunctional Devices

Embedded Software development Process and Tools: Lesson-4 Linking and Locating Software

Object Oriented Software Design

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems

Paillier Threshold Encryption Toolbox

MATLAB in Business Critical Applications Arvind Hosagrahara Principal Technical Consultant

Agenda. Distributed System Structures. Why Distributed Systems? Motivation

Client/Server Computing Distributed Processing, Client/Server, and Clusters

: provid.ir

Building Scalable Messaging Systems with Qpid. Lessons Learned from PayPal

Transcription:

Distributed Programming with Cloud Haskell Andres Löh 14 June 2013, Big Tech Day 6 Copyright 2013 Well-Typed LLP Well-Typed The Haskell Consultants

Overview Introduction Haskell Cloud Haskell Communication Going distributed Towards Map-Reduce Well-Typed

Introduction

What is Cloud Haskell? Framework (a number of related packages) for Haskell Message-passing distributed concurrency (Erlang, actors) All in libraries; no (specific) compiler support required Well-Typed

Features Global view on a distributed program Single program runs in potentially many places Processes and nodes are first class entities Communication via (typed) messages Functions can be sent Programmable serialization Easy to monitor processes (and recover from failure) (Draft of) formal semantics Well-Typed

Multicore in Haskell Many approaches Different problems have different requirements / cost models Well-Typed

Multicore in Haskell Many approaches Different problems have different requirements / cost models Concurrency threads and locks ( MVar s) aynchronous computations ( Async s) software transactional memory Well-Typed

Multicore in Haskell Many approaches Different problems have different requirements / cost models (Deterministic) Parallelism evaluation strategies dataflow-based task parallelism flat and nested data parallelism Well-Typed

Multicore in Haskell Many approaches Different problems have different requirements / cost models Distributed Concurrency Cloud Haskell Well-Typed

Freedom of choice Haskell is great for embedded domain-specific languages GHC has a very capable run-time system You can pick whatever suits the needs of your task All the approaches can be combined! Well-Typed

Freedom of choice Haskell is great for embedded domain-specific languages GHC has a very capable run-time system You can pick whatever suits the needs of your task All the approaches can be combined! Lesson Rather than picking a language based on the model you want, pick a library based on the problem you have Well-Typed

Cloud Haskell Example server :: Process () server = forever $ do () expect liftio $ putstrln "ping" client :: ProcessId Process () client serverpid = forever $ do send serverpid () liftio $ threaddelay (1 10ˆ6) main :: IO () main = do Right t createtransport "127001" "201306" defaulttcpparameters node newlocalnode t initremotetable runprocess node $ do serverpid getselfpid spawnlocal $ client serverpid server Well-Typed

Haskell

Pure Functions dist :: Floating a a a a dist x y = sqrt (x x + y y) Well-Typed

Pure Functions dist :: Floating a a a a dist x y = sqrt (x x + y y) data Tree a = Leaf a Node (Tree a) (Tree a) size :: Tree a Int size (Leaf n) = 1 size (Node l r) = size l + size r Well-Typed

Pure Functions dist :: Floating a a a a dist x y = sqrt (x x + y y) data Tree a = Leaf a Node (Tree a) (Tree a) size :: Tree a Int size (Leaf n) = 1 size (Node l r) = size l + size r search :: Eq a Tree a a Bool search (Leaf n) x = n = = x search (Node l r) x = search l x search r x Well-Typed

Type signatures dist :: Floating a a a a size :: Tree a Int search :: Eq a Tree a a Bool Well-Typed

Function calls dist :: Floating a a a a dist x y dist 2 3 dist (2 + x) (3 + x) Well-Typed

IO conversation :: IO () conversation = do putstrln "Who are you?" name getline putstrln $ "Hi " + name + " Where are you from?" loc getline putstrln $ if loc = = "Munich" then "Oh, I love Munich!" else "Sorry, where is " + loc + "?" Well-Typed

IO conversation :: IO () conversation = do putstrln "Who are you?" name getline putstrln $ "Hi " + name + " Where are you from?" loc getline putstrln $ if loc = = "Munich" then "Oh, I love Munich!" else "Sorry, where is " + loc + "?" readnlines :: Int IO [String] readnlines n = replicatem n getline Well-Typed

Monads Maybe a -- possibly failing State s a -- state-maintaining Random a -- depending on a PRNG Signal a -- time-changing Par a -- annotated for parallelism IO a -- arbitrary side effects STM a -- logged transactions Process a -- Cloud Haskell processes Well-Typed

Monads Maybe a -- possibly failing State s a -- state-maintaining Random a -- depending on a PRNG Signal a -- time-changing Par a -- annotated for parallelism IO a -- arbitrary side effects STM a -- logged transactions Process a -- Cloud Haskell processes Semicolon is overloaded You can define your own monads You can decide what the semantics of sequencing in your application should be Well-Typed

Concurrency forkio :: IO () IO ThreadId Well-Typed

Concurrency forkio :: IO () IO ThreadId threaddelay :: Int IO () forever :: Monad m m a m b -- here: IO a IO b Well-Typed

Concurrency forkio :: IO () IO ThreadId threaddelay :: Int IO () forever :: Monad m m a m b -- here: IO a IO b printforever :: String IO () printforever msg = forever $ do putstrln msg threaddelay (1 10ˆ6) main :: IO () main = do forkio $ printforever "child 1" forkio $ printforever "child 2" printforever "parent" Well-Typed

Cloud Haskell

Cloud Haskell example revisited server :: Process () server = forever $ do () expect liftio $ putstrln "ping" client :: ProcessId Process () client serverpid = forever $ do send serverpid () liftio $ threaddelay (1 10ˆ6) main :: IO () main = do Right t createtransport "127001" "201306" defaulttcpparameters node newlocalnode t initremotetable runprocess node $ do serverpid getselfpid spawnlocal $ client serverpid server Well-Typed

Layered architecture Over-simplified: User application Higher-level libraries Distributed process core library Backend (simplelocalnet, Azure, EC2, ) Transport (TCP, in-memory, SSH, ZeroMQ, ) System libraries Well-Typed

Nodes, Processes, Communication Well-Typed

Nodes, Processes, Communication Backend responsible for nodes Processes and communication are backend-agnostic Well-Typed

Spawning and running processes spawnlocal :: Process () Process ProcessId spawn :: NodeId Closure (Process ()) Process ProcessId For the main process: runprocess :: LocalNode Process () IO () Well-Typed

Sending and receiving messages Ad-hoc: send :: Serializable a ProcessId a Process () expect :: Serializable a Process a expecttimeout :: Serializable a Int Process (Maybe a) Sending is asynchronous Receiving blocks Well-Typed

Sending and receiving messages Ad-hoc: send :: Serializable a ProcessId a Process () expect :: Serializable a Process a expecttimeout :: Serializable a Int Process (Maybe a) Sending is asynchronous Receiving blocks Typed channels: newchan :: Serializable a Process (SendPort a, ReceivePort a) sendchan :: Serializable a SendPort a a Process () receivechan :: Serializable a ReceivePort a Process a Well-Typed

Serializable Serializable a = (Typeable a, Binary a) Well-Typed

Serializable Serializable a = (Typeable a, Binary a) Typeable a Binary a -- has a run-time type representation -- has a binary representation Well-Typed

Static and dynamic typing Haskell s typing discipline Haskell is a statically typed language, but can be dynamically typed locally, on demand Well-Typed

Static and dynamic typing Haskell s typing discipline Haskell is a statically typed language, but can be dynamically typed locally, on demand typeof :: Typeable a a TypeRep todyn :: Typeable a a Dynamic fromdynamic :: Typeable a Dynamic Maybe a GHC can derive an instance of Typeable for any datatype automatically Well-Typed

Binary representation encode :: Binary a a ByteString decode :: Binary a ByteString a Haskell has no built-in serialization Automatic generation of sane Binary instances for many datatypes possible via datatype-generic or meta-programming Programmer has control instances can deviate from simply serializing the in-memory representation Well-Typed

Communication

How to reply Idea Messages can include process ids and channel send ports Well-Typed

How to reply Idea Messages can include process ids and channel send ports Old server: server :: Process () server = forever $ do () expect liftio $ putstrln "ping" Well-Typed

How to reply Idea Messages can include process ids and channel send ports New server: server :: Process () server = forever $ do clientpid expect liftio $ putstrln $ "ping " + show clientpid send clientpid () Well-Typed

Adapting the client Old client: client :: ProcessId Process () client serverpid = forever $ do send serverpid () liftio $ threaddelay (1 10ˆ6) Well-Typed

Adapting the client Old client: client :: ProcessId Process () client serverpid = forever $ do send serverpid () liftio $ threaddelay (1 10ˆ6) Well-Typed

Adapting the client New client: client :: ProcessId Process () client serverpid = do clientpid getselfpid forever $ do send serverpid clientpid () expect liftio $ putstrln "pong" liftio $ threaddelay (1 10ˆ6) Well-Typed

More about replying We can send ids of other processes Forwarding, redirection, broadcasting Well-Typed

More about replying We can send ids of other processes Forwarding, redirection, broadcasting For typed channels: We can serialize SendPort But we cannot serialize ReceivePort Well-Typed

Conversations Some rules about exchanging messages: only one mailbox per process; we can expect a particular type; we can receivewait for specific messages; typed channels are separate; sane ordering of messages; messages may remain undelivered Well-Typed

Going distributed

Distributed ping-pong No changes to server and client are needed Old main : main :: IO () main = do Right t createtransport "127001" "201306" defaulttcpparameters node newlocalnode t initremotetable runprocess node $ do serverpid getselfpid spawnlocal $ client serverpid server Well-Typed

Distributed ping-pong No changes to server and client are needed New main (using distributed-process-simplelocalnet): main :: IO () main = do args getargs let rtbl = remotetable initremotetable case args of ["master", port] do backend initializebackend "127001" port rtbl startmaster backend master ["slave", port] do backend initializebackend "127001" port rtbl startslave backend Well-Typed

Automatic detection of slaves startslave :: Backend IO () -- does nothing startmaster :: Backend ([NodeId] Process ()) IO () Well-Typed

Automatic detection of slaves startslave :: Backend IO () -- does nothing startmaster :: Backend ([NodeId] Process ()) IO () Master gets node ids of all slaves Well-Typed

Spawning functions remotely master :: [NodeId] Process () master slaves = do serverpid getselfpid form_ slaves $ λnid spawn nid ($(mkclosure client) serverpid) server Well-Typed

Spawning functions remotely master :: [NodeId] Process () master slaves = do serverpid getselfpid form_ slaves $ λnid spawn nid ($(mkclosure client) serverpid) server Spawns a function call on a remote node Well-Typed

Serializing functions Single program assumption Top-level functions are easy (Partially) applied functions are turned into closures Well-Typed

Serializing functions Single program assumption Top-level functions are easy (Partially) applied functions are turned into closures Currently based on a bit of meta-programming In the future perhaps using a (small) compiler extension Well-Typed

Towards Map-Reduce

Distributing actual work master :: [Input] [NodeId] Process () master inputs workers = do masterpid getselfpid workerpids form workers $ λnid spawn nid ($(mkclosure worker) masterpid) form_ (zip inputs (cycle workerpids)) $ λ(input, workerpid) send workerpid input r collectresults (length inputs) liftio $ print r Well-Typed

Distributing actual work master :: [Input] [NodeId] Process () master inputs workers = do masterpid getselfpid workerpids form workers $ λnid spawn nid ($(mkclosure worker) masterpid) form_ (zip inputs (cycle workerpids)) $ λ(input, workerpid) send workerpid input r collectresults (length inputs) liftio $ print r Well-Typed

Workers workerpids form workers $ λnid spawn nid ($(mkclosure worker) masterpid) worker :: ProcessId Process () worker serverpid = forever $ do x expect -- obtain function input send serverpid (expensivefunction x) The expensivefunction is mapped over all inputs Well-Typed

Collecting results r collectresults (length inputs) liftio $ print r collectresults :: Int Process Result collectresults = go emptyresult where go :: Result Int Process Result go!acc 0 = return acc go!acc n = do r expect -- obtain one result go (combineresults acc r) (n 1) In go we reduce the results Well-Typed

Abstraction and variation Abstracting from expensivefunction, emptyresult, combineresults (and inputs ) yields a simple map-reduce function Can easily use other ways to distribute work, for example work-stealing rather than work-pushing Can use a hierarchy of distribution and reduction processes Well-Typed

Conclusions Aspects we hardly talked about: User-defined message types Matching of messages Embrace failure! (Linking and monitoring) Combination with other multicore frameworks Well-Typed

Conclusions Aspects we hardly talked about: User-defined message types Matching of messages Embrace failure! (Linking and monitoring) Combination with other multicore frameworks Remember: Cloud Haskell is a library (easy to change, extend, adapt) Cloud Haskell is ongoing work All of Haskell plus distributed programming Watch for exciting new backends and higher-level libraries Well-Typed

Want to try it? http://haskell-distributedgithubio/ Mini-tutorial blog series by Duncan Coutts and Edsko de Vries: http://wwwwell-typedcom/blog/70 Well-Typed