Monitoring Hadoop with Akka. Clint Combs, Senior Software Engineer at Collective



Collective - The Audience Engine
- Ad technology company
- Heavy investment in Hadoop and other scalable infrastructure
- Need to monitor the Hadoop ecosystem
- Collect metrics with Riemann, Graphite, and Grafana

Hadoop Ecosystem
- YARN REST API: provides metrics for the cluster and jobs
- Oozie: workflow scheduler
- Celos: partial replacement for Oozie
- The ecosystem produces metrics
- Metrics are processed as messages by HadoopMetrics, which is built with Akka

Akka is a toolkit and runtime for building highly concurrent, distributed, and fault-tolerant event-driven applications on the JVM. http://akka.io/

Akka Platform
- Runs on the JVM
- Closely tied to the Scala language and runtime
- Provides a Java API, including experimental Java 8 lambda support

Akka Background
- Created by Jonas Bonér
- Typesafe co-founder with Martin Odersky, the creator of Scala and author of javac for JDK 1.3
- Based on ideas Jonas learned from Erlang
- The Actor Model originated with Carl Hewitt in the 1973 paper "A Universal Modular Actor Formalism for Artificial Intelligence"

Akka Features
- Actors: simple high-level concurrency abstraction
- Fault Tolerance: supervisor hierarchy spans 1+ JVMs
- Location Transparency: distributed asynchronous message passing
- Persistence: messages optionally persisted and replayed on actor restart

Actors
- Lightweight (~300 bytes per instance)
- Receive asynchronous messages via a mailbox
- Process messages one at a time, in the order they are received
- Maintain their own state and should not share it directly

Messages
- Sent to actors from other actors
- Sometimes sent from non-actors
- Immutable
- Built-in types
- Scala case classes are ideal

Case Classes as Messages
- Immutable by default
- Generated equals and toString with constructor parameters
- Can construct without new via companion object
- Pattern matching
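
The properties listed on this slide can be seen in a few lines of plain Scala. This is a minimal sketch; the message types `JobStarted` and `JobFinished` and their fields are hypothetical and are not taken from HadoopMetrics.

```scala
// Hypothetical message types; names and fields are illustrative only.
sealed trait MetricMessage
final case class JobStarted(jobId: String, queue: String) extends MetricMessage
final case class JobFinished(jobId: String, elapsedMs: Long) extends MetricMessage

object CaseClassDemo {
  // Pattern matching destructures a message by its constructor parameters.
  def describe(msg: MetricMessage): String = msg match {
    case JobStarted(id, queue)      => s"$id started on $queue"
    case JobFinished(id, elapsedMs) => s"$id finished in $elapsedMs ms"
  }

  def main(args: Array[String]): Unit = {
    val a = JobStarted("job_1", "default") // no `new`: the companion's apply is used
    val b = JobStarted("job_1", "default")
    println(a == b)                                 // true (structural equality)
    println(a)                                      // JobStarted(job_1,default)
    println(describe(JobFinished("job_1", 1200L)))  // job_1 finished in 1200 ms
  }
}
```

Sealing the trait lets the compiler warn when a `match` forgets a message type, which is useful as the set of messages grows.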

Creating Actors
- Don't use new by itself
- Use Props: an immutable (shareable) recipe for creating an actor
- ActorRef: used to send messages to actors
- Actor DSL

Props
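
The code from this slide was not captured in the transcript. As a stand-in, here is a minimal sketch of creating an actor through `Props`, assuming the classic Akka actor API (the `akka-actor` library, 2.4 or later, on the classpath). `MetricCounter` and its `props` factory are hypothetical names.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// Hypothetical actor that counts the metric names it receives.
class MetricCounter(prefix: String) extends Actor {
  private var seen = 0 // private state, touched only by this actor

  def receive: Receive = {
    case name: String =>
      seen += 1
      println(s"$prefix.$name (#$seen)")
  }
}

object MetricCounter {
  // A Props factory in the companion object keeps the recipe in one place
  // and avoids closing over an enclosing actor's `this`.
  def props(prefix: String): Props = Props(new MetricCounter(prefix))
}

object PropsDemo {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("props-demo")
    // actorOf returns an ActorRef; the actor instance itself is never exposed.
    val counter: ActorRef = system.actorOf(MetricCounter.props("yarn"), "metric-counter")
    counter ! "appsRunning"
    counter ! "appsPending"
    Thread.sleep(500) // crude: give the actor time to drain its mailbox in this demo
    system.terminate()
  }
}
```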

Receiving Messages
- Every actor has a receive function of type Receive
- The receive function processes messages from the mailbox

Reply to Messages
- Upon receiving a message, an actor will often reply
- sender() ! replyMsg
- tell, don't ask
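
A minimal sketch of the reply idiom, assuming the classic Akka actor API (2.4 or later). The actors and messages (`StatusActor`, `Requester`, `GetStatus`, `Status`) are hypothetical. The requester uses tell and handles the reply as an ordinary message, which is the "tell, don't ask" style the slide recommends.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// Hypothetical request and reply messages.
final case class GetStatus(jobId: String)
final case class Status(jobId: String, state: String)

class StatusActor extends Actor {
  def receive: Receive = {
    // sender() is the ActorRef of whoever sent the message being processed.
    case GetStatus(id) => sender() ! Status(id, "RUNNING")
  }
}

class Requester(target: ActorRef) extends Actor {
  // Inside an actor, `!` passes `self` as the implicit sender.
  override def preStart(): Unit = target ! GetStatus("job_1")

  def receive: Receive = {
    case Status(id, state) =>
      println(s"$id is $state")
      context.system.terminate() // end the demo once the reply arrives
  }
}

object ReplyDemo {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("reply-demo")
    val status = system.actorOf(Props(new StatusActor), "status")
    system.actorOf(Props(new Requester(status)), "requester")
  }
}
```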

Message Delivery Guarantees
- At-most-once
- Ordered per sender/receiver pair

Actor Systems
- One per application
- Manages resources and the actor hierarchy
- Provides a default dispatcher

ExecutionContext
- Equivalent of java.util.concurrent.Executor
- Manages threads which execute Runnable and Future
- Provides default dispatcher

MessageDispatcher
- An extension of ExecutionContext
- The engine that delivers messages to actors
- The default dispatcher is often replaced to scale an application

Performance
- 2.5 million actors per GB of heap (~300 bytes/actor)
- 50 million messages per second on a single machine
- Typesafe has tested a 2400-node Akka cluster on Google Compute Engine

Actor Hierarchy and Supervision
- Actors form a tree, or supervision hierarchy
- Parents are notified when a child fails, and the child is restarted
- Failure is localized to a sub-branch or propagated up
- Routers act as supervisors (default to escalate)
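
A minimal sketch of a parent deciding what happens when a child fails, assuming the classic Akka actor API (2.4 or later). The `Worker`, `Parent`, and `Policies` names, the exception-to-directive mapping, and the retry limits are all illustrative assumptions, not the policy used in the original system.

```scala
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Escalate, Restart, Resume}
import scala.concurrent.duration._

object Policies {
  // Maps a child's failure to a directive. The mapping is an illustrative choice.
  val decider: SupervisorStrategy.Decider = {
    case _: ArithmeticException   => Resume   // keep the child and its state
    case _: IllegalStateException => Restart  // replace the child with a fresh instance
    case _: Exception             => Escalate // propagate the failure up the tree
  }
}

class Worker extends Actor {
  def receive: Receive = {
    case n: Int => println(s"100 / $n = ${100 / n}") // throws ArithmeticException for 0
  }
}

class Parent extends Actor {
  // One-for-one: the directive applies only to the child that failed.
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute)(Policies.decider)

  private val worker = context.actorOf(Props(new Worker), "worker")

  def receive: Receive = { case msg => worker.forward(msg) }
}

object SupervisionDemo {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("supervision-demo")
    val parent = system.actorOf(Props(new Parent), "parent")
    parent ! 0 // the child fails; the parent resumes it
    parent ! 5 // the child is still alive and handles this message
    Thread.sleep(500) // crude wait for the demo
    system.terminate()
  }
}
```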

Event Bus
- Publish and subscribe mechanism for messages
- Sender is not preserved in the message
- Actors subscribe and handle messages normally
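
A minimal sketch of publish and subscribe using the actor system's built-in event stream, which is one implementation of Akka's event bus. It assumes the classic Akka actor API (2.4 or later); the slide does not say which bus the original system used, and `JobFailed` and `AlertListener` are hypothetical.

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical event type published on the bus.
final case class JobFailed(jobId: String)

class AlertListener extends Actor {
  def receive: Receive = {
    // Published events arrive through the mailbox like any other message.
    case JobFailed(id) =>
      println(s"alert: $id failed")
      context.system.terminate() // end the demo after the first event
  }
}

object EventStreamDemo {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("event-demo")
    val listener = system.actorOf(Props(new AlertListener), "alerts")

    // Subscribe by class: the listener receives JobFailed and its subclasses.
    system.eventStream.subscribe(listener, classOf[JobFailed])

    // The publisher does not know who is subscribed.
    system.eventStream.publish(JobFailed("job_7"))
  }
}
```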

Paths
- An actor path can be used to look up an ActorRef
- The ActorRef is then used to send messages
- Paths may select multiple actors

Messaging Actors from Non-Actors
- Inbox
- The ask pattern
- Spawns a temporary actor to handle the reply
- The reply is handled with a future
- Await
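
A minimal sketch of the ask pattern called from ordinary (non-actor) code, assuming the classic Akka actor API (2.4 or later). `CounterActor`, `GetCount`, and the 3-second timeout are hypothetical choices.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

case object GetCount

// Hypothetical actor holding a counter.
class CounterActor extends Actor {
  private var count = 0
  def receive: Receive = {
    case "inc"    => count += 1
    case GetCount => sender() ! count // here the sender is ask's temporary actor
  }
}

object AskDemo {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("ask-demo")
    val counter = system.actorOf(Props(new CounterActor), "counter")

    // ask requires a timeout; the Future fails if no reply arrives in time.
    implicit val timeout: Timeout = Timeout(3.seconds)

    counter ! "inc"
    counter ! "inc"

    // `?` returns Future[Any]; mapTo narrows it to the expected reply type.
    val reply: Future[Int] = (counter ? GetCount).mapTo[Int]

    // Blocking with Await is tolerable at the edge of the system,
    // but should be avoided inside actors.
    val count = Await.result(reply, timeout.duration)
    println(s"count = $count")

    system.terminate()
  }
}
```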

The Scheduler and Cancellable
- Schedules recurring or one-time messages to actors
- Schedules recurring or one-time execution with futures
- Returned values are Cancellable
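
A minimal sketch of a recurring scheduled message and its `Cancellable`, assuming the classic Akka actor API. `Poller`, `Poll`, and the 200 ms interval are hypothetical. The sketch uses `scheduler.schedule`, which matches Akka versions of the talk's era; Akka 2.6 deprecates it in favour of `scheduleWithFixedDelay` and `scheduleAtFixedRate`.

```scala
import akka.actor.{Actor, ActorSystem, Cancellable, Props}
import scala.concurrent.duration._

case object Poll

// Hypothetical actor that polls on a fixed interval, e.g. a metrics endpoint.
class Poller extends Actor {
  import context.dispatcher // ExecutionContext on which the scheduler fires

  private var polls = 0

  // Recurring message to self: first tick immediately, then every 200 ms.
  private val ticker: Cancellable =
    context.system.scheduler.schedule(Duration.Zero, 200.millis, self, Poll)

  // Cancel the timer when the actor stops so ticks do not pile up in dead letters.
  override def postStop(): Unit = ticker.cancel()

  def receive: Receive = {
    case Poll =>
      polls += 1
      println(s"poll #$polls")
      if (polls == 3) context.system.terminate() // end the demo
  }
}

object SchedulerDemo {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("scheduler-demo")
    system.actorOf(Props(new Poller), "poller")
  }
}
```

For a one-time message, `scheduler.scheduleOnce(delay, receiver, message)` has the same shape and also returns a `Cancellable`.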

Shutdown
- Difficult problem in async systems
- Roll-your-own
- PoisonPill
- Graceful Stop

ShutdownActor
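
The ShutdownActor code from this slide was not captured in the transcript, and the following is not a reconstruction of it. It is a minimal sketch of the two built-in options named on the previous slide, `PoisonPill` and `gracefulStop`, assuming the classic Akka actor API (2.4 or later). `Flusher` and the timeouts are hypothetical.

```scala
import akka.actor.{Actor, ActorSystem, PoisonPill, Props}
import akka.pattern.gracefulStop
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical actor that must finish its queued work before stopping.
class Flusher extends Actor {
  def receive: Receive = {
    case line: String => println(s"flushing $line")
  }
  override def postStop(): Unit = println("flusher stopped")
}

object ShutdownDemo {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("shutdown-demo")
    val flusher = system.actorOf(Props(new Flusher), "flusher")

    flusher ! "metric-1"
    flusher ! "metric-2"

    // PoisonPill is an ordinary mailbox message, so it is queued behind
    // the two messages above and they are processed first.
    // gracefulStop sends it and completes the Future once the actor has terminated.
    val stopped: Future[Boolean] = gracefulStop(flusher, 5.seconds, PoisonPill)
    Await.result(stopped, 6.seconds)

    // Only then shut down the whole actor system.
    Await.result(system.terminate(), 10.seconds)
  }
}
```

Stopping actors in a deliberate order before terminating the system is what makes shutdown "roll-your-own" in practice: the system cannot know which actors still hold work that matters.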

Other Akka Topics
- Testing framework closely integrated with ScalaTest
- Clustering
- Spray.io: fully asynchronous HTTP, becoming part of Akka, used in this system for status and control

Results
- Job workflows are tracked through their lifecycle:
- Celos: workflows and metrics are configurable
- Oozie: associated with Celos workflow slots
- Hadoop: jobs associated with Oozie workflow IDs
- Metrics are published to Grafana via Riemann

GrandCentral Export Profiles

Hadoop Cluster

Celos Client Response

Aerospike Import