Future Internet Technologies: Big (?) Data Processing
Dr. Dennis Pfisterer, Institut für Telematik, Universität zu Lübeck
http://www.itm.uni-luebeck.de/people/pfisterer

FIT Until Now (topics covered so far): Client/Server, SPDY, Big Data Processing, HTTP, REST, CoAP, P2P Networking, Design Principles, SCTP, IPv4, LISP, IPv6, 6LoWPAN, Frame Relay, ATM, GSM, HIP, DTN, Mobile IP, MPLS
Traditional Application-Level Architectures
- Client/Server: Remote Procedure Call (RPC), REST (HTTP, CoAP, SPDY), replication and load balancing, distributed applications, n-tier architectures, Service-Oriented Architecture (SOA)
- Peer-to-Peer: Distributed Hash Tables (DHT), ...

Client/Server Interaction Model
- No server-initiated data transfer, often due to the statelessness of the server
- Results in unnecessary polling to transfer data from server to client
- Causes bandwidth, processing, and memory overhead
[Figure: client sends a request, server returns a response]
3- and 4-Tier Architectures: 3-Tier
- Tier 1: Presentation
- Tier 2: Business Logic (server)
- Tier 3: Data (DB servers)

3- and 4-Tier Architectures: 4-Tier
- Tier 1: Presentation
- Tier 2: Web Server
- Tier 3: Application Server (e.g., App Server 1, App Server 2)
- Tier 4: Data (DB servers)
Replication and Load Balancing
- Application and data are replicated on many servers
- A load balancer distributes load across, e.g., LAMP hosts (Linux, Apache, MySQL, PHP)
- Bottleneck: the database
  - May be clustered (i.e., replicated and/or partitioned) to increase performance
  - Problem: consistency requirements (ACID) of SQL databases limit scalability
[Figure: load balancer in front of application instances on servers #1..#n, all sharing one state store]

Distributed Applications
- Distribute the application (including its state) amongst different servers, in contrast to running multiple instances on multiple servers
- Typically, the application state is partitioned (e.g., Google BigTable)
- Better scalability for a certain class of applications
- More scalable if less strict consistency requirements are imposed (NoSQL)
- E.g., Facebook, Twitter, ...
Locality
- For Internet-scale applications, state is distributed globally
- Data are typically moved to where usage is expected
- E.g., German videos are hosted in Germany

Distributed Applications
- Requirement on today's applications: perform efficiently on an Internet scale
- Scalability without changing the application
Exemplary Application
[Figure: interconnected services for ad delivery, users, user behavior mining, a web GUI, a friendship graph, application log analysis, and statistics]

Internet-Scale Applications
- It's not about servers anymore: data and data-processing services are what matter
- Issue: efficient distribution, processing, and storage of data
- Move away from fixed (n-tier) layering models
- Dynamic composition of processing services (at runtime, continuous deployment)
- Communication using messages
- Required: concepts for loosely coupled distributed applications
Message Queuing & Publish/Subscribe

Message Queuing
- Concept of message queues: data-processing services communicate asynchronously using message queues
- Decouples data producers and consumers
[Figure: producer puts messages M1, M2 into a message queue; a consumer takes them out]
Message Queuing: Mode of Operation (see the sketch after the next slide)
- Producers dispatch messages into a queue
- Consumers take messages from a queue
- Multiple queues for different purposes can be created
  - E.g., one for dispatching computational tasks, one for logging, ...
- Producers, consumers, and queues may run on different systems

Message Queuing: Properties
- Distributed realization of the producer/consumer problem (cf. thread synchronization)
- Producer and consumer are only aware of the queue
  - The association can be changed (dynamically) without changing the application
- Producer and consumer don't need to be available at the same time
  - The queue provides time decoupling and allows asynchronous operation
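The following is a minimal, JVM-local sketch of this producer/consumer decoupling, using java.util.concurrent.BlockingQueue as a stand-in for a real (possibly distributed) message queue; class and message names are illustrative only.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueSketch {
    public static void main(String[] args) {
        // The queue is the only thing producer and consumer share.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 5; i++) {
                    queue.put("M" + i); // dispatch; blocks only if the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 1; i <= 5; i++) {
                    System.out.println("consumed " + queue.take()); // blocks if empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}

Note that the producer never references the consumer (and vice versa): swapping either side, or adding further consumers, requires no change to the other.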
Message Queuing: Properties (cont.)
- Queues can have multiple consumers
  - E.g., to distribute messages (e.g., tasks T1..T4) to a set of consumers
  - Sometimes called work queues
  - Different message-dispatching strategies are possible (e.g., round robin, fairness, load-based, ...)

Delivery Semantics
- Best-effort: messages are lost if no consumer is listening
- Guaranteed (acknowledged) delivery: on the application layer (cf. DTN custody transfer)
- Persistent: messages are persisted until consumed
- Transient (not persistent): messages are lost if the queue crashes
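To make persistent, acknowledged delivery concrete, here is a hedged sketch using the RabbitMQ Java client (com.rabbitmq.client); the broker host ("localhost") and queue name ("tasks") are assumptions for the example, not part of the slides.

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;
import com.rabbitmq.client.MessageProperties;

public class AckedDelivery {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker location
        Connection conn = factory.newConnection();
        Channel ch = conn.createChannel();

        // durable = true: the queue definition survives a broker restart
        ch.queueDeclare("tasks", true, false, false, null);

        // PERSISTENT_TEXT_PLAIN: the message itself is written to disk
        ch.basicPublish("", "tasks",
                MessageProperties.PERSISTENT_TEXT_PLAIN, "T1".getBytes("UTF-8"));

        // autoAck = false: the broker keeps (and redelivers) the message
        // until the consumer explicitly acknowledges it
        ch.basicConsume("tasks", false, new DefaultConsumer(ch) {
            @Override
            public void handleDelivery(String consumerTag, Envelope envelope,
                    AMQP.BasicProperties properties, byte[] body)
                    throws java.io.IOException {
                System.out.println("consumed " + new String(body, "UTF-8"));
                ch.basicAck(envelope.getDeliveryTag(), false);
            }
        });
        // connection left open on purpose: a real worker runs until stopped
    }
}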
Message-Based RPC
- RPC mapped to request and response messages
- Correlation of request and response is based on IDs
- Inherits its delivery semantics from message queuing
[Figure: client sends a request (id = 123) via a request queue to the server; the server replies with a response (id = 123) via a response queue]
(A correlation sketch follows after the next slide.)

Publish/Subscribe Pattern (Pub/Sub)
- Builds on top of individual message queues
- Publishers publish messages to an exchange
- Consumers subscribe to an exchange
- Exchanges route messages to queues based on subscriptions
- A broker manages a set of exchanges and queues
- Distribution of published messages to subscribers: broadcast, topic-based, or content-based (next slides)
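Returning to message-based RPC: a minimal, transport-agnostic sketch of the ID-based correlation, with plain BlockingQueues standing in for the request and response queues (all names are illustrative; requires Java 16+ for records).

import java.util.Map;
import java.util.concurrent.*;

public class MessageRpc {
    record Message(long id, String payload) {}

    static BlockingQueue<Message> requestQueue = new LinkedBlockingQueue<>();
    static BlockingQueue<Message> responseQueue = new LinkedBlockingQueue<>();
    static Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        // Server: processes requests, copying the correlation ID.
        Thread server = new Thread(() -> {
            try {
                while (true) {
                    Message req = requestQueue.take();
                    responseQueue.put(new Message(req.id(), "echo: " + req.payload()));
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        server.setDaemon(true);
        server.start();

        // Client-side dispatcher: matches responses to pending requests by ID.
        Thread dispatcher = new Thread(() -> {
            try {
                while (true) {
                    Message resp = responseQueue.take();
                    pending.remove(resp.id()).complete(resp.payload());
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        dispatcher.setDaemon(true);
        dispatcher.start();

        // Issue a request with ID 123 and wait for the matching response.
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(123L, future);
        requestQueue.put(new Message(123L, "hello"));
        System.out.println(future.get()); // prints "echo: hello"
    }
}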
Topic-Based Publish/Subscribe
- Consumers subscribe to an exchange with a certain topic
- Messages are annotated with a topic; topics are arbitrary strings
- The exchange matches subscriptions against message topics and routes messages to the corresponding queues
- Example: M1 has topic news.germany, M2 has topic news.greece; a consumer subscribed to news.germany receives only M1

Content-Based Publish/Subscribe
- Consumers subscribe to an exchange with content-based filters
- The exchange routes messages based on a message's contents
  - Typically structured data (e.g., JSON, XML)
- Most production systems use topic-based pub/sub
- Example: M1 = { company: "apple", price: 123$ }; a consumer with filter company = "apple" && price > 120 receives M1, while one with company = "apple" && 110 < price < 120 does not
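A toy, in-process topic exchange to make the routing concrete. The "*" wildcard matching exactly one topic segment is an assumption borrowed from AMQP/RabbitMQ-style topic exchanges; everything else is plain standard-library Java.

import java.util.*;

public class TopicExchange {
    private final Map<String, List<Deque<String>>> subs = new HashMap<>();

    // Subscribe a queue with a topic pattern, e.g. "news.germany" or "news.*".
    void subscribe(String pattern, Deque<String> queue) {
        subs.computeIfAbsent(pattern, p -> new ArrayList<>()).add(queue);
    }

    // Route a message to every queue whose subscription pattern matches the topic.
    void publish(String topic, String message) {
        subs.forEach((pattern, queues) -> {
            // translate pattern to regex: '.' is literal, '*' matches one segment
            String regex = pattern.replace(".", "\\.").replace("*", "[^.]+");
            if (topic.matches(regex)) {
                queues.forEach(q -> q.add(message));
            }
        });
    }

    public static void main(String[] args) {
        TopicExchange exchange = new TopicExchange();
        Deque<String> germany = new ArrayDeque<>(), allNews = new ArrayDeque<>();
        exchange.subscribe("news.germany", germany);
        exchange.subscribe("news.*", allNews);
        exchange.publish("news.germany", "M1");
        exchange.publish("news.greece", "M2");
        System.out.println(germany); // [M1]
        System.out.println(allNews); // [M1, M2]
    }
}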
Internet-Scale Applications and Pub/Sub
- Centralized brokers are a single point of failure and a bottleneck for scalable applications
- Brokers can be clustered to increase availability and throughput
  - Requires inter-broker message routing and a communication protocol
  - Creates distributed (replicated, partitioned) subscription state and all associated issues

Broker Clustering
- Producers and consumers may attach to any broker of the cluster
- How are messages routed?
- Important goal: efficiency (throughput, bandwidth, latency, consistency, processing cost)
[Figure: producers and consumers attached to different brokers within a cluster of brokers]
Topic-Based Pub/Sub and Clustering
- Brokers compute a routing tree based on upstream consumer subscriptions
- Subscription filters are aggregated along the path (upstream filter aggregation)
- Example: consumer subscriptions news.eu.germany and news.eu.greece are aggregated to news.eu.* at an upstream broker; a message published to news.eu.germany then follows the tree only towards matching consumers

Implementations & Co.
Pub/Sub implementations:
- RabbitMQ, http://www.rabbitmq.com
- Apache ActiveMQ, http://activemq.apache.org
- Amazon Simple Queue Service (Amazon SQS), http://aws.amazon.com/sqs
- Apache Qpid, http://qpid.apache.org
- IBM WebSphere MQ, http://www-01.ibm.com/software/integration/wmq
- Apache Kafka, http://incubator.apache.org/kafka
- Kestrel, http://robey.github.com/kestrel
Protocols:
- Advanced Message Queuing Protocol (AMQP), http://www.amqp.org
- ZeroMQ Message Transport Protocol (ZMTP), http://www.zeromq.org
- Streaming Text Oriented Messaging Protocol (STOMP), http://stomp.github.com
- XMPP publish-subscribe extension, http://xmpp.org/extensions/xep-0060.html
Others:
- Java Message Service (JMS), http://java.sun.com/developer/technicalarticles/ecommerce/jms
Pub/Sub: Conclusion
Advantages:
- Scalability / clustering / federation
- Loose coupling: publisher and subscriber are unaware of each other
- Flexibility and extensibility (add new components at runtime, ...)
- Disconnected operation
Disadvantages:
- Loose coupling makes it hard to give guarantees
  - What happens if a message is stuck in a queue forever, e.g., because there is no consumer?
- Brokers are a centralized bottleneck
- What about efficient and Internet-scale distributed brokers?

Map / Reduce
Map / Reduce
- Goal: distributed processing of big data sets
  - Typically too large to be processed on a single machine
  - Moving all data would be too costly (time/bandwidth)
- Map/Reduce: a framework for processing embarrassingly parallel problems on a large number of computers
- Initially developed by Google (2004)
- Well-known open-source implementation: Apache Hadoop, http://hadoop.apache.org

Parallels to Parallel Programming
- Programming paradigm: split a program into several tasks and process these tasks on different machines in parallel
- Requires that the algorithm is parallelizable
- Realization in Google's Map/Reduce: the program is split onto n machines
  - 1 master: splits the full task into subtasks, assigns subtasks to workers, receives results from workers
  - n-1 workers: receive jobs and data from the master, return results to the master
Example: Approximating π (1)
- Consider a circle of radius r inscribed in a square:
  - Area of the square: A_s = (2r)² = 4r²
  - Area of the circle: A_c = πr²
  - Hence A_s / A_c = 4 / π, i.e., π = 4 · A_c / A_s
- Approximation steps:
  1. Randomly generate k_s points in the square
  2. Count the number of points k_c located inside the circle
  3. Approximate π ≈ 4 · k_c / k_s

Example: Approximating π (2)
- Accuracy increases with larger k_s
- Step 2 (check whether a point is in the circle) is parallelizable
- Execution:
  - The master generates k_s random points in the square
  - Each of the n workers w_i (1 ≤ i ≤ n) is assigned a list of points
  - Each worker counts the number of points in its list that are located inside the circle (k_i,c for worker w_i) and returns the result to the master
  - The master computes π ≈ (4 / k_s) · Σ_{i=1..n} k_i,c
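A compact, runnable sketch of this computation in plain Java. The parallel stream plays the role of the n workers, each chunk contributing its own k_i,c; points are drawn from the unit square with the quarter circle x² + y² ≤ 1, which yields the same ratio as the full circle in the full square.

import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.LongStream;

public class MonteCarloPi {
    public static void main(String[] args) {
        long ks = 10_000_000L; // number of random points k_s
        long kc = LongStream.range(0, ks).parallel().filter(i -> {
            double x = ThreadLocalRandom.current().nextDouble();
            double y = ThreadLocalRandom.current().nextDouble();
            return x * x + y * y <= 1.0; // is the point inside the circle?
        }).count();
        System.out.println("pi ~ " + 4.0 * kc / ks);
    }
}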
Parallelizing Tasks (1)
- It is an open problem whether all algorithms for problems in the classes P or NP are parallelizable
- There are many such sequential algorithms without a known parallel equivalent
- Many scientists assume that there is no general parallelizability property
Map / Reduce Two or three steps Map Combine (optional) Reduce Dispatch Mode of operation Input data is either distributed (e.g., server log files) or split into multiple parts (e.g., single file) Map preprocesses input data to an intermediate format Framework dispatches intermediate results to reducers (see next slide) Reduce combines intermediate results to final results Figure source: http://wikis.gm.fh-koeln.de/wiki_db/uploads/datenbanken/mapreduce/mapreduce.png Security - 04 Cryptology #35 The lecturer is the best. And this is written by research assistants. Yada, yada, yada, The lecturer is the best [ (the, 1), (lecturer, 1), (is, 1), (the, 1), (best, 1) ] [ (the, [1, 1]) ] [ (the, [2]) ] [ (the, 2), (lecturer, 1), (is, 1) (best, 1) ] Partition 1 Split 1 Map Partition 2 2 Partition n Red 1 Input data Split 2 Map Partition 1 Partition 2 Red 2 Red File, Callback, Partition n Split 3 Map Partition 1 Partition 2 Red n Partition n Intermediate results (partitioned) Security - 04 Cryptology #36 18
Map / Reduce: Formal View
- Map: K × V → (L × W)*
  - K and L: sets containing keys; V and W: sets containing values
  - All elements in a set are of the same data type (e.g., String)
  - Maps a key/value pair to a list of key/value pairs
- Intermediate results
  - Partitioned into n partitions, e.g., partition = hash(key) mod n
  - Grouping transforms (L × W)* → L × W*, e.g., (the, 1), (the, 1) → (the, [1, 1])
  - Handled by the framework
- Reduce: L × W* → W*
  - E.g., (the, [1, 1]) → 2; the framework saves the result as (the, 2)

Example Map and Reduce Functions
Input document: "The lecturer is the best"

/* key: document name, value: document content */
map(String key, String value) {
  for (String w : value.split(" ")) {
    EmitIntermediate(w, 1);
  }
}

/* key: a word, values: list of intermediate results */
reduce(String key, Iterable<Integer> values) {
  int result = 0;
  for (int value : values) {
    result += value;
  }
  Emit(result);
}

Map → [(the, 1), (lecturer, 1), (is, 1), (the, 1), (best, 1)]
Intermediate → [(the, [1, 1]), (lecturer, [1]), (is, [1]), (best, [1])]
Reduce → [(the, 2), (lecturer, 1), (is, 1), (best, 1)]
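For comparison, a sketch of the same word count against Apache Hadoop's Java API (org.apache.hadoop.mapreduce); the job/driver setup is omitted, and the class names are illustrative.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    // Map: (offset, line) -> list of (word, 1)
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String w : value.toString().split("\\s+")) {
                word.set(w);
                context.write(word, ONE); // emit intermediate (word, 1)
            }
        }
    }

    // Reduce: (word, [1, 1, ...]) -> (word, count)
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get(); // combine the grouped intermediate counts
            }
            context.write(key, new IntWritable(sum)); // final (word, count)
        }
    }
}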
Distributed Stream Processing

Distributed Stream Processing
- Pub/Sub: moves messages between data-processing services efficiently
- Map/Reduce: moves data-processing services to the data
  - Limitation: batch processing, and only for sufficiently parallelizable jobs
- Distributed real-time stream processing:
  - A mixture of Pub/Sub and Map/Reduce
  - Guaranteed data processing
  - No intermediate message brokers
  - Goal: create graphs of data sources and data processors over continuous streams
Distributed Stream Processing
- Continuous real-time processing of data streams
  - Batch processing in Map/Reduce eventually emits a final result
  - Stream processing computes results continuously: it runs forever (or until stopped) and produces new results over time
- Scalability / parallelism:
  - Applications are composed of parallelizable data-processing services
  - No centralized components (brokers); components exchange data directly
  - Applications can be distributed globally (Internet-scale)

Example: Storm Framework [storm]
- Implementation of a distributed stream processing framework
- Open-source project from Twitter
- Used by many companies (e.g., Twitter, Groupon, etc.) to run their business logic
- A set of data sources and data-processing services; connections are defined by a graph in which streams are the edges
[Figure: sources feeding into a graph of processing services]
Storm: Streams and Tuples
- Stream: an unbounded sequence of tuples
- Tuple: a named list of values

Storm: Spouts
- A spout is a source of a stream
- E.g., a spout can read data and emit it as tuples from a file, a message queue, a web service, an RSS feed, ...
Figure source: http://storm-project.net/images/topology.png
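A minimal spout sketch, assuming Apache Storm's Java API (recent releases use org.apache.storm packages; the original Twitter-era releases used backtype.storm). The hard-coded sentence stands in for a real source such as a message queue.

import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class SentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private final String[] sentences = { "The lecturer is the best" };
    private int index = 0;

    @Override // Storm 2.x signature; 1.x uses a raw Map
    public void open(Map<String, Object> conf, TopologyContext context,
                     SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // called repeatedly by the framework; emit one tuple per call
        collector.emit(new Values(sentences[index]));
        index = (index + 1) % sentences.length;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence")); // tuples have one named field
    }
}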
Storm: Bolts
- Bolts process input streams and produce new streams
- Bolts can implement arbitrary logic, e.g., transformation functions, filters, aggregation functions, joins, database access
Figure source: http://storm-project.net/images/topology.png

Storm: Topology
- A topology is a network of spouts and bolts
- Spouts and bolts are implemented in a parallelizable manner, i.e., multiple instances can be spawned
- The framework executes them in parallel on a Storm cluster as tasks
- Tasks communicate using message queues
Figure source: https://github.com/nathanmarz/storm/wiki/tutorial
Storm Example: Word Count
- Count words in a continuous stream of data and emit the new word count for each word as soon as it changes
- Topology:
  - A spout reads sentences from some data source (e.g., a message queue) and emits them
  - One bolt splits sentences into individual words
  - Another bolt keeps track of the number of occurrences of individual words and emits the word and its count on every change
- Example flow: (sentence, "The lecturer is the ...") → (word, "the"), (word, "lecturer"), (word, "is"), (word, "the") → (the, 1), (lecturer, 1), (is, 1), (the, 2)
Figure source: https://github.com/nathanmarz/storm/wiki/tutorial

Storm Example: Word Count (cont.)
- The framework runs multiple instances of the spout and bolt implementations in parallel
- To guarantee unambiguous results, groupings can be defined:
  - Shuffle grouping randomly picks a task (used between the spout and the split bolt)
  - Field grouping makes sure one individual word always goes to the same task (used on "word" between the split and the count bolt)
Figure source: https://github.com/nathanmarz/storm/wiki/tutorial
(A sketch of the two bolts and the topology wiring follows below.)
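A sketch of the two bolts and the topology wiring, again assuming Apache Storm's Java API and reusing the SentenceSpout sketched earlier; the component names ("sentences", "split", "count") and parallelism hints are illustrative.

import java.util.HashMap;
import java.util.Map;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountTopology {
    // Splits every incoming sentence tuple into one tuple per word.
    public static class SplitSentence extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            for (String word : input.getStringByField("sentence").split("\\s+")) {
                collector.emit(new Values(word));
            }
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    // Keeps a running count per word; emits (word, count) on every change.
    public static class WordCount extends BaseBasicBolt {
        private final Map<String, Integer> counts = new HashMap<>();
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String word = input.getStringByField("word");
            int count = counts.merge(word, 1, Integer::sum);
            collector.emit(new Values(word, count));
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

    public static TopologyBuilder build() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new SentenceSpout(), 1);
        // shuffle grouping: any split task may process any sentence
        builder.setBolt("split", new SplitSentence(), 4)
               .shuffleGrouping("sentences");
        // fields grouping on "word": the same word always reaches the same task
        builder.setBolt("count", new WordCount(), 4)
               .fieldsGrouping("split", new Fields("word"));
        return builder;
    }
}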
Storm: Stream Groupings
- Stream groupings define to which task a tuple is sent when it is emitted
- Examples:
  - Shuffle & Field (previous slide)
  - All: replicates the tuple to all tasks
  - Global: sends to the task with the lowest ID
  - None: don't care
  - Direct: the emitter defines the receiver

Conclusion
- Humans create exponentially growing amounts of data: the big data challenge
- Expressing data-processing algorithms using functional primitives allows efficient scaling and distribution of computations on large (distributed) data sets
  - Precondition: the algorithm must be sufficiently parallelizable
- A new paradigm for designing applications
  - Not only for big but also for small data
  - Scale from start-up to global player
Literature
[mapreduce] Jeffrey Dean and Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters, 2004. http://research.google.com/archive/mapreduce.html
[storm] Storm: "a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing." http://storm-project.net
[akka] Akka: a toolkit and runtime for building highly concurrent, distributed, and fault-tolerant event-driven applications on the JVM. http://akka.io