Akka Streams. Dr. Roland Kuhn @rolandkuhn Typesafe

Why Streams?
- processing big data with finite memory
- real-time data processing (CEP)
- serving numerous clients simultaneously with bounded resources (IoT, streaming HTTP APIs)

What is a Stream?
- ephemeral, time-dependent sequence of elements
- possibly unbounded in length
- therefore focusing on transformations

«You cannot step twice into the same stream. For as you are stepping in, other waters are ever flowing on to you.» Heraclitus

Declaring a Stream Topology
[diagram built up over seven consecutive slides; not captured in the transcript]

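Declaring and Running a Stream

    import scala.concurrent.duration._   // provides 1.second

    // Written against the pre-release akka-stream-experimental API.
    // Tick is assumed to be a marker object defined elsewhere.
    val upper = Source(Iterator from 0).take(10)
    val lower = Source(1.second, 1.second, () => Tick)

    // Zip the two sources into a single source of pairs.
    val source = Source[(Int, Tick)]() { implicit b =>
      val zip = Zip[Int, Tick]
      val out = UndefinedSink[(Int, Tick)]
      upper ~> zip.left ~> out
      lower ~> zip.right
      out
    }

    val flow = Flow[(Int, Tick)].map { case (x, _) => s"tick $x" }
    val sink = Sink.foreach(println)

    // Nothing runs until the blueprint is materialized here.
    val future = source.connect(flow).runWith(sink)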

Materialization
- Akka Streams separate the what from the how
  - declarative Source/Flow/Sink DSL to create blueprint
  - FlowMaterializer turns this into running Actors
- this allows alternative materialization strategies
  - optimization
  - verification / validation
  - cluster deployment
- only Akka Actors for now, but more to come!
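
The split between blueprint and running stream can be illustrated without Akka. The sketch below is a hypothetical, single-threaded model in plain Java: `Blueprint`, `from`, `map` and `runForeach` are names invented for this illustration and are not the Akka Streams API. It only shows that building the description performs no work, and that each run starts from fresh state.

```java
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical illustration of "what" versus "how"; not the Akka Streams API. */
final class Blueprint<T> {
    // The description: how to open a fresh iterator when a run is started.
    private final Supplier<Iterator<T>> open;

    private Blueprint(Supplier<Iterator<T>> open) { this.open = open; }

    /** Describes a source; nothing is iterated yet. */
    static <T> Blueprint<T> from(Iterable<T> elements) {
        return new Blueprint<>(elements::iterator);
    }

    /** Describes a transformation; f is not called until a run pulls elements. */
    <R> Blueprint<R> map(Function<? super T, ? extends R> f) {
        Supplier<Iterator<T>> upstream = open;
        return new Blueprint<>(() -> {
            Iterator<T> it = upstream.get();
            return new Iterator<R>() {
                @Override public boolean hasNext() { return it.hasNext(); }
                @Override public R next() { return f.apply(it.next()); }
            };
        });
    }

    /** "Materializes" the description: opens a fresh run and feeds the sink. */
    void runForeach(Consumer<? super T> sink) {
        open.get().forEachRemaining(sink);
    }
}
```

Calling `runForeach` twice on the same `Blueprint` replays the whole pipeline twice, which is the property that lets one description be handed to different materialization strategies.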

Stream Sources
- org.reactivestreams.Publisher[T]
- org.reactivestreams.Subscriber[T]
- Iterator[T] / Iterable[T]
- Code block (function that produces Option[T])
- scala.concurrent.Future[T]
- TickSource
- ActorPublisher
- singleton / empty / failed
- plus write your own (fully extensible)

Stream Sinks
- org.reactivestreams.Publisher[T]
- org.reactivestreams.Subscriber[T]
- ActorSubscriber
- scala.concurrent.Future[T]
- blackhole / foreach / fold / onComplete
- or create your own

Linear Stream Transformations
- Deterministic (like for collections): map, filter, collect, grouped, drop, take, groupBy, ...
- Time-Based: takeWithin, dropWithin, groupedWithin, ...
- Rate-Detached: expand, conflate, buffer, ...
- Asynchronous: mapAsync, mapAsyncUnordered, flatten, ...
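
Of the groups above, the rate-detached operators are the least familiar from collections. As a rough sketch of the idea behind conflate (a fast upstream is folded into a summary while the downstream is not yet ready), here is a hypothetical single-threaded model in plain Java. `ConflateSketch`, `offer` and `poll` are invented names; the real operator is asynchronous and driven by demand signals rather than method calls.

```java
import java.util.Optional;
import java.util.function.BinaryOperator;

/** Hypothetical model of a conflating stage; not the Akka Streams operator itself. */
final class ConflateSketch<T> {
    private final BinaryOperator<T> combine;
    private T pending;   // null means nothing is buffered (null elements are not supported)

    ConflateSketch(BinaryOperator<T> combine) { this.combine = combine; }

    /** Upstream delivers an element; it is folded into whatever is still waiting. */
    void offer(T elem) {
        pending = (pending == null) ? elem : combine.apply(pending, elem);
    }

    /** Downstream asks for one element and gets the summary of everything since its last request. */
    Optional<T> poll() {
        Optional<T> out = Optional.ofNullable(pending);
        pending = null;
        return out;
    }
}
```

With `Integer::sum` as the combiner, three offers of 1, 2 and 3 followed by one `poll` yield a single element 6: the upstream was never blocked, and the buffer never held more than one value.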

Nonlinear Stream Transformations
- Fan-In: merge, concat, zip, ...
- Fan-Out: broadcast, route, balance, unzip, ...

Why does this work?

    val upper = Source(Iterator from 0) // infinitely fast
    val lower = Source(1.second, 1.second, () => Tick)

    val source = Source[(Int, Tick)]() { implicit b =>
      val zip = Zip[Int, Tick]
      val out = UndefinedSink[(Int, Tick)]
      upper ~> zip.left ~> out
      lower ~> zip.right
      out
    }

    val flow = Flow[(Int, Tick)].map { case (x, _) => s"tick $x" }
    val sink = Sink.foreach(println)
    val future = source.connect(flow).runWith(sink)
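
The short answer is back-pressure: the unbounded source only produces an element when the zip stage asks for one, and the zip stage only asks when a tick has arrived. The sketch below models that with a synchronous pull loop in plain Java. It is a simplification under stated assumptions: `ZipDemandSketch`, `Counter` and `zipWithTicks` are invented names, and ticks are represented by a loop count rather than a timer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ZipDemandSketch {
    /** Unbounded source that records how many elements it was actually asked for. */
    static final class Counter implements Iterator<Integer> {
        int produced = 0;
        @Override public boolean hasNext() { return true; }      // never runs dry
        @Override public Integer next() { return produced++; }
    }

    /** Zip-like stage: for every tick it requests exactly one number and emits one line. */
    static List<String> zipWithTicks(Iterator<Integer> numbers, int ticks) {
        List<String> out = new ArrayList<>();
        for (int t = 0; t < ticks; t++) {
            // A tick arrived: only now is demand for one element signalled upstream.
            out.add("tick " + numbers.next());
        }
        return out;
    }

    public static void main(String[] args) {
        Counter numbers = new Counter();
        System.out.println(zipWithTicks(numbers, 3));   // [tick 0, tick 1, tick 2]
        System.out.println(numbers.produced);           // 3
    }
}
```

Although the source could produce without limit, it is advanced exactly three times for three ticks, so memory use stays bounded.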

Reactive Traits

Back-Pressure: the Reactive Streams Initiative

Participants
- Engineers from
  - Netflix
  - Oracle
  - Pivotal
  - Red Hat
  - Twitter
  - Typesafe
- Individuals like Doug Lea and Todd Montgomery

The Motivation
- all participants had the same basic problem
- all are building tools for their community
- a common solution benefits everybody
- interoperability to make best use of efforts

Recipe for Success
- minimal interfaces
- rigorous specification of semantics
- full TCK for verification of implementation
- complete freedom for many idiomatic APIs

The Meat

    trait Publisher[T] {
      def subscribe(sub: Subscriber[T]): Unit
    }

    trait Subscription {
      def request(n: Long): Unit
      def cancel(): Unit
    }

    trait Subscriber[T] {
      def onSubscribe(s: Subscription): Unit
      def onNext(elem: T): Unit
      def onError(thr: Throwable): Unit
      def onComplete(): Unit
    }
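
To make the protocol concrete, here is a minimal sketch in Java. It uses the JDK's `java.util.concurrent.Flow` interfaces (JDK 9 and later), which have the same shape as the traits above, instead of the `org.reactivestreams` package, so that it runs without extra dependencies. `RangePublisher` and `CollectingSubscriber` are invented names. The publisher is synchronous and single-threaded and skips several rules of the specification (for example, an empty range only completes after the first request), so treat it as an illustration rather than a compliant implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

/** Sketch: publishes 0 until count-1 and emits strictly on demand. */
final class RangePublisher implements Flow.Publisher<Integer> {
    private final int count;
    RangePublisher(int count) { this.count = count; }

    @Override
    public void subscribe(Flow.Subscriber<? super Integer> sub) {
        sub.onSubscribe(new Flow.Subscription() {
            private int next = 0;
            private long demand = 0;
            private boolean emitting = false;
            private boolean done = false;

            @Override
            public void request(long n) {
                if (done) return;
                if (n <= 0) {
                    done = true;
                    sub.onError(new IllegalArgumentException("n must be positive"));
                    return;
                }
                demand += n;             // accumulate demand (overflow ignored in this sketch)
                if (emitting) return;    // re-entrant call from onNext: the outer loop continues
                emitting = true;
                while (!done && demand > 0 && next < count) {
                    demand--;
                    sub.onNext(next++);  // one element per unit of demand
                }
                emitting = false;
                if (!done && next == count) {
                    done = true;
                    sub.onComplete();
                }
            }

            @Override
            public void cancel() { done = true; }
        });
    }
}

/** Sketch: requests elements in fixed-size batches and records what arrives. */
final class CollectingSubscriber implements Flow.Subscriber<Integer> {
    final List<Integer> received = new ArrayList<>();
    boolean completed = false;
    private final long batch;
    private long outstanding = 0;
    private Flow.Subscription subscription;

    CollectingSubscriber(long batch) { this.batch = batch; }

    @Override public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        outstanding = batch;
        s.request(batch);                    // initial demand
    }
    @Override public void onNext(Integer elem) {
        received.add(elem);
        if (--outstanding == 0) {            // batch used up: signal more demand
            outstanding = batch;
            subscription.request(batch);
        }
    }
    @Override public void onError(Throwable t) { throw new RuntimeException(t); }
    @Override public void onComplete() { completed = true; }
}
```

Subscribing a `CollectingSubscriber` with batch size 2 to a `RangePublisher` of 5 elements delivers 0 to 4 in order and then completes; at no point are more than two elements outstanding.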

Supply and Demand
- data items flow downstream
- demand flows upstream
- data items flow only when there is demand
  - recipient is in control of incoming data rate
  - data in flight is bounded by signaled demand
[diagram: Publisher sends data to Subscriber; Subscriber sends demand to Publisher]

Dynamic Push-Pull
- push behavior when consumer is faster
- pull behavior when producer is faster
- switches automatically between these
- batching demand allows batching data
[diagram: Publisher sends data to Subscriber; Subscriber sends demand to Publisher]

Explicit Demand: Tailored Flow Control
- splitting the data means merging the demand
[diagram: fan-out, labelled with data and demand]

Explicit Demand: Tailored Flow Control
- merging the data means splitting the demand
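
The two statements can be read as simple bookkeeping on demand counters. The sketch below shows one possible policy for each direction; the slides do not specify these policies, and `DemandRouting`, `broadcastUpstreamDemand` and `splitDemand` are hypothetical names. For a lossless broadcast, upstream may only be asked for as many elements as every output can accept. For a merge, the downstream demand is divided among the inputs so that the total requested never exceeds it.

```java
final class DemandRouting {
    /**
     * Fan-out (broadcast): every output receives every element, so the
     * merged upstream demand is the minimum of the outstanding demands.
     */
    static long broadcastUpstreamDemand(long... outstanding) {
        if (outstanding.length == 0) return 0;
        long min = Long.MAX_VALUE;
        for (long d : outstanding) {
            min = Math.min(min, d);
        }
        return min;
    }

    /**
     * Fan-in (merge): split a downstream demand of n across the inputs
     * as evenly as possible; the shares always add up to exactly n.
     */
    static long[] splitDemand(long n, int inputs) {
        long[] shares = new long[inputs];
        for (int i = 0; i < inputs; i++) {
            shares[i] = n / inputs + (i < n % inputs ? 1 : 0);
        }
        return shares;
    }
}
```

For example, outputs with outstanding demands 4, 1 and 7 allow a request of 1 upstream, and a downstream demand of 5 over two inputs is split as 3 and 2. Other stages would merge differently; a balancing fan-out, for instance, could add the demands instead of taking the minimum.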

Reactive Streams
- asynchronous non-blocking data flow
- asynchronous non-blocking demand flow
- minimal coordination and contention
- message passing allows for distribution
  - across applications
  - across nodes
  - across CPUs
  - across threads
  - across actors

Interoperability is King

A fully working example

    ActorSystem system = ActorSystem.create("InteropTest");
    FlowMaterializer mat = FlowMaterializer.create(system);

    RxRatpack.initialize();

    EmbeddedApp.fromHandler(ctx -> {
      Integer[] ints = new Integer[10];
      for (int i = 0; i < ints.length; ++i) {
        ints[i] = i;
      }
      // RxJava Observable
      Observable<Integer> intObs = Observable.from(ints);
      // Reactive Streams Publisher
      Publisher<Integer> intPub = RxReactiveStreams.toPublisher(intObs);
      // Akka Streams Source
      Source<String> stringSource = Source.from(intPub).map(Object::toString);
      // Reactive Streams Publisher
      Publisher<String> stringPub = stringSource.runWith(Sink.<String>fanoutPublisher(1, 1), mat);
      // Reactor Stream
      Stream<String> linesStream = Streams.create(stringPub).map(i -> i + "\n");
      // and now render the HTTP response using Ratpack
      ctx.render(ResponseChunks.stringChunks(linesStream));
    });

https://github.com/rkuhn/reactivestreamsinterop

When can we have it?
- Sample used pre-release versions:
  - reactive-streams 0.4.0
  - RxJava 1.0.0-rc.8 with rxjava-reactive-streams 0.3.0
  - reactor-core 2.0.0.M1
  - ratpack-core 0.9.10
  - akka-stream-experimental 0.10-M1
- stable versions expected within the next few months
- Reactive Streams 1.0 some weeks away

Outlook
- Akka HTTP (successor of Spray.io)
  - fully stream-based
  - Java and Scala DSLs
  - client and server
- more stream-based APIs
  - file I/O (on JRE 7 and higher)
  - database drivers (community developed)
  - Akka Persistence with streams of events

Advertisement: Berlin Scala User Group Hack Sequel, Nov 14-16, 2014. There will be T-shirts, catering and a prize!

Typesafe 2014 All Rights Reserved