1.5 Million Log Lines per Second: Building and Maintaining Flume Flows at Conversant




Big Data Everywhere, Chicago 2014
Mike Keane (mkeane@conversant.com)

SLA for Event Driven Logging with Flume
- Quicker insight into production data
- Reduce the complexity of administering/managing new servers, data centers, etc.
- Scalable
- No data loss or duplication
- Replace TSV files with Avro objects
- Able to be monitored by the Network Operations Center (NOC)
- Able to recover from downtime quickly

Simplistic Flume Overview
- A Flume flow is the series of Flume agents that data passes through from origination to final destination
- Data on a Flume flow is packaged in FlumeEvent Avro objects. A FlumeEvent is composed of:
  - Headers: a map of string key/value pairs
  - Body: a byte array
- A FlumeEvent is an atomic unit of data
- FlumeEvents are sent in batches. When a batch of FlumeEvents only partially makes it to the next Flume agent in the flow, the entire batch is resent, resulting in duplicates
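The batch semantics above are what make duplicates possible. The toy model below, written in Python purely for illustration (it is not Flume's transaction code), represents a FlumeEvent as a header map plus a byte body and shows how resending a partially delivered batch leaves duplicate events downstream. `send_batch`, `deliver_with_retry`, and the `failures` fault-injection list are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FlumeEvent:
    """Toy FlumeEvent: a map of string headers plus an opaque byte-array body."""
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def send_batch(batch, downstream, fail_after=None):
    """Append events to `downstream`; if `fail_after` is set, the link drops
    after that many events and the partial delivery is left in place."""
    for i, event in enumerate(batch):
        if fail_after is not None and i == fail_after:
            return False
        downstream.append(event)
    return True


def deliver_with_retry(batch, downstream, failures):
    """Resend the whole batch until it succeeds. `failures` is a hypothetical
    fault-injection list: one fail_after value per failed attempt."""
    attempts = 0
    for fail_after in list(failures) + [None]:
        attempts += 1
        if send_batch(batch, downstream, fail_after):
            break
    return attempts


batch = [FlumeEvent({"LogType": "bid"}, f"line {i}".encode()) for i in range(5)]
downstream = []
attempts = deliver_with_retry(batch, downstream, failures=[3])
bodies = [e.body for e in downstream]
print(attempts, len(downstream), len(bodies) - len(set(bodies)))  # 2 8 3
```

The first attempt delivers three events before failing; the retry sends all five again, so three events arrive twice.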

Simplistic Flume Overview (diagram: a Flume agent)

Simplistic Flume Overview (diagram: EmbeddedAgent -> Compressor Agent -> Landing Agent)

Overview of existing network topology
- 3 data centers divided into 12 lanes participating in the OpenRTB market:
  - 6 lanes in the East Coast data center
  - 4 lanes in the West Coast data center
  - 2 lanes in the European data center
- Each lane has approximately 75 servers handling OpenRTB operations
- 30 different logs
- Over 60,000,000,000 log lines per day
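A quick back-of-the-envelope check of what these figures imply, assuming (as a simplification) that the 60 billion lines are spread evenly over the day and that every lane has exactly 75 servers:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

lines_per_day = 60_000_000_000  # "over 60,000,000,000 log lines per day"
lanes = 12
servers_per_lane = 75           # "approximately 75 servers" per lane

avg_lines_per_sec = lines_per_day / SECONDS_PER_DAY
servers = lanes * servers_per_lane
per_server = avg_lines_per_sec / servers

print(f"average: {avg_lines_per_sec:,.0f} lines/s")  # average: 694,444 lines/s
print(f"servers: {servers}")                         # servers: 900
print(f"per server: {per_server:,.0f} lines/s")      # per server: 772 lines/s
```

The daily average works out to roughly 694,000 lines per second, or about 770 per server. The 1.5 million lines per second in the title is presumably a peak rate, a little over twice that average.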

Overview of existing network topology (diagram)

P.O.C.: Can Flume handle our log volume reliably?
- 2-server Flume flow from the East Coast (IAD) to Chicago (ORD) carrying over 250K TSV lines per second
- No data loss
- Failover
- Compression performance

P.O.C. Overview (diagram)

P.O.C. passes
- Larger batch sizes helped, but could not reach 250K lines per second
- Multiple TSV lines per FlumeEvent reached over 360K lines per second
- Failover passed, with duplicates
- Compression passed, but sinks had to be parallelized 7X
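The slide does not say how multiple TSV lines were combined into one FlumeEvent. One plausible approach, sketched here as an assumption, is to newline-join a fixed number of lines into each event body, so per-event costs (headers, serialization, channel puts and takes) are paid once per group rather than once per line. `pack_lines` and `unpack_body` are hypothetical helpers, and the scheme only works if no TSV field contains a newline.

```python
def pack_lines(lines, lines_per_event):
    """Group TSV lines into newline-joined event bodies, `lines_per_event` at a time."""
    if lines_per_event < 1:
        raise ValueError("lines_per_event must be at least 1")
    bodies = []
    for start in range(0, len(lines), lines_per_event):
        chunk = lines[start:start + lines_per_event]
        bodies.append("\n".join(chunk).encode("utf-8"))
    return bodies


def unpack_body(body):
    """Recover the individual TSV lines from one event body."""
    return body.decode("utf-8").split("\n")


lines = [f"{i}\tbid\t0.{i % 100:02d}" for i in range(250)]
bodies = pack_lines(lines, 100)
print(len(bodies))  # 3 events instead of 250
assert [l for b in bodies for l in unpack_body(b)] == lines
```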

Taking Flume to Production
- Embed the EmbeddedAgent in existing servers
- Modify EmbeddedAgent properties from the existing infrastructure
- Implement monitoring
- Create a Flume implementation of the proprietary logging interface
- Replace POJO-to-TSV with Avro-to-AvroDataFile
- Prevent duplicates rather than remove them
- Add a LogType header
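Duplicate prevention depends on every event carrying an identifier assigned at its origin. A minimal sketch of stamping headers before handing an event to the EmbeddedAgent follows; `LogType` comes from the slide, while the `UUID` key name and the `make_event` helper are assumptions. In Java the resulting map and body would presumably be wrapped with `EventBuilder.withBody(byte[], Map<String, String>)` and passed to `EmbeddedAgent.put`.

```python
import uuid


def make_event(log_type, body):
    """Return (headers, body) for one log record.

    "LogType" is the header named on the slide; "UUID" is an assumed key for
    the identifier that the custom sink later looks up in HBase.
    """
    headers = {
        "LogType": log_type,
        # Generated once at the origin: if the batch is resent, the copy
        # carries the same UUID, which is what makes duplicates detectable.
        "UUID": str(uuid.uuid4()),
    }
    return headers, body


headers, body = make_event("bid", b"<AvroDataFile bytes>")
print(headers["LogType"], len(headers["UUID"]))  # bid 36
```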

Taking Flume to Production
- Custom sink for the AvroDataFile body (based on HDFSEventSink)
- Check whether the event's UUID header is already in HBase:
  - Yes: increment the duplicate-count metric
  - No: write the AvroDataFile body to HDFS using a custom writer, then put the UUID to HBase
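The sink logic above reduces to a check-then-write per event. The sketch below models only that decision: an in-memory set stands in for the HBase lookup and a list for the custom HDFS writer, and `DedupingWriter` is a hypothetical name rather than Conversant's class.

```python
class DedupingWriter:
    """Per-event decision of the dedupe sink (stand-ins replace HBase and HDFS)."""

    def __init__(self):
        self.seen_uuids = set()   # stand-in for the HBase UUID table
        self.written = []         # stand-in for the custom HDFS writer
        self.duplicate_count = 0  # the duplicate-count metric

    def process(self, headers, body):
        """Write `body` unless its UUID was seen before; return True if written."""
        event_id = headers["UUID"]
        if event_id in self.seen_uuids:
            self.duplicate_count += 1
            return False
        self.written.append(body)      # 1. write the AvroDataFile body
        self.seen_uuids.add(event_id)  # 2. then record the UUID
        return True


writer = DedupingWriter()
for headers, body in [({"UUID": "a"}, b"rec-a"), ({"UUID": "b"}, b"rec-b"),
                      ({"UUID": "a"}, b"rec-a")]:  # resent copy of "a"
    writer.process(headers, body)
print(writer.written, writer.duplicate_count)  # [b'rec-a', b'rec-b'] 1
```

Recording the UUID only after the write succeeds follows the order on the slide: a crash between the two steps can let one duplicate through on retry, whereas recording first could mark an event as written that never reached HDFS.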

Taking Flume to Production
- Custom selector based on MultiplexingChannelSelector
- Route FlumeEvents to channels by log type or by groups of log types
- Bifurcate to multiple locations, with each log type and each location having its own percentage of data to bifurcate
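A sketch of the routing and bifurcation decision, assuming each event carries `LogType` and `UUID` headers. The routing tables, channel names, and hash-based sampling are illustrative assumptions; the real selector extends Flume's `MultiplexingChannelSelector` in Java.

```python
import uuid
import zlib


class LogTypeSelector:
    """Route by LogType, then bifurcate a percentage of each log type to extra channels.

    routes:       log type -> list of required channel names
    bifurcations: log type -> {extra channel name: percent 0..100}
    """

    def __init__(self, routes, bifurcations, default=("default",)):
        self.routes = routes
        self.bifurcations = bifurcations
        self.default = list(default)

    @staticmethod
    def _bucket(event_id):
        # Hash the event UUID into 0..99 so a resent event makes the same choice.
        return zlib.crc32(event_id.encode("utf-8")) % 100

    def select(self, headers):
        log_type = headers.get("LogType")
        channels = list(self.routes.get(log_type, self.default))
        bucket = self._bucket(headers["UUID"])
        for channel, percent in self.bifurcations.get(log_type, {}).items():
            if bucket < percent:
                channels.append(channel)
        return channels


selector = LogTypeSelector(
    routes={"bid": ["bidChannel"], "win": ["criticalChannel"]},
    bifurcations={"bid": {"regressionChannel": 100, "devClusterChannel": 10}},
)
print(selector.select({"LogType": "win", "UUID": str(uuid.uuid4())}))  # ['criticalChannel']
```

Hashing the UUID rather than drawing a random number means a resent event makes the same bifurcation choice as its original. Because all extra channels for a log type share one bucket, smaller percentages always select subsets of larger ones; independent hashes per channel would avoid that if it matters.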

Configuring Flume Flows
- Configuring Flume can be tedious; use a templating engine
- In Q2 2014 Conversant expanded from 7 lanes in 2 data centers to 12 lanes in 3 data centers (~400 more servers to configure)
- Static headers are useful for tracking flows
- 15 minutes to configure the entire Q2 expansion
Example template line:
    CompressorLane('iad6', [CompressorAgent("dtiad06flm01p"), CompressorAgent("dtiad06flm02p"), CompressorAgent("dtiad06flm03p")])
Resulting property:
    compressor.list = dtiad06flm01p, dtiad06flm02p, dtiad06flm03p
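The slide does not name the templating engine; the template line reads like Python, so here is a hypothetical Python rendering of it. The `compressor.list` line reproduces the slide's output. The interceptor lines show one way to emit the static headers mentioned above using Flume's built-in `static` interceptor (`type`, `key`, `value` properties); the agent name, source name, interceptor name, and header key are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompressorAgent:
    hostname: str


@dataclass
class CompressorLane:
    name: str
    agents: List[CompressorAgent]


def render_lane(lane, agent="agent", source="src", header_key="lane"):
    """Render properties for one lane (agent/source/header names are hypothetical)."""
    hosts = ", ".join(a.hostname for a in lane.agents)
    interceptor = f"{agent}.sources.{source}.interceptors"
    return "\n".join([
        f"compressor.list = {hosts}",
        # Flume's static interceptor adds a fixed header to every event from this source.
        f"{interceptor} = laneTag",
        f"{interceptor}.laneTag.type = static",
        f"{interceptor}.laneTag.key = {header_key}",
        f"{interceptor}.laneTag.value = {lane.name}",
    ])


print(render_lane(CompressorLane('iad6', [CompressorAgent("dtiad06flm01p"),
                                          CompressorAgent("dtiad06flm02p"),
                                          CompressorAgent("dtiad06flm03p")])))
```

Adding a lane then means adding one `CompressorLane(...)` line and re-rendering, which is consistent with the slide's claim that the whole Q2 expansion took 15 minutes to configure.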

Monitoring the Flume Flows
- Flume metrics are available via JMX or as JSON over HTTP
- Metrics to monitor:
  - ChannelFillPercentage
  - Rate of change of EventDrainSuccessCount on failover sinks
- FLUME-2307: File channel deletes fail after timeout (fixed in 1.5)
- Publishing metrics to a TSDB provides great visual insight
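A sketch of the two checks listed above, assuming metrics come from Flume's HTTP JSON reporting (enabled with `-Dflume.monitoring.type=http -Dflume.monitoring.port=<port>` and served at `/metrics`), where each component appears as `TYPE.name` with string-valued counters. The fill threshold and the failover-sink set are hypothetical inputs, and a live poller would fetch the payload instead of using literal strings.

```python
import json


def parse_metrics(payload):
    """Flatten {"TYPE.name": {"Metric": "value", ...}} into {(component, metric): float}."""
    metrics = {}
    for component, values in json.loads(payload).items():
        for name, value in values.items():
            try:
                metrics[(component, name)] = float(value)
            except (TypeError, ValueError):
                pass  # non-numeric fields such as "Type": "CHANNEL"
    return metrics


def check(prev, curr, interval_s, failover_sinks, fill_threshold=50.0):
    """Alert on full channels and on failover sinks that are draining events."""
    alerts = []
    for (component, name), value in sorted(curr.items()):
        if name == "ChannelFillPercentage" and value > fill_threshold:
            alerts.append(f"{component}: {value:.1f}% full")
        elif (name == "EventDrainSuccessCount" and component in failover_sinks
              and (component, name) in prev):
            rate = (value - prev[(component, name)]) / interval_s
            if rate > 0:  # counter rising: the failover path is carrying traffic
                alerts.append(f"{component}: draining {rate:.0f} events/s")
    return alerts


prev = parse_metrics('{"CHANNEL.critical": {"Type": "CHANNEL", "ChannelFillPercentage": "10.0"},'
                     ' "SINK.failover1": {"Type": "SINK", "EventDrainSuccessCount": "1000"}}')
curr = parse_metrics('{"CHANNEL.critical": {"Type": "CHANNEL", "ChannelFillPercentage": "72.5"},'
                     ' "SINK.failover1": {"Type": "SINK", "EventDrainSuccessCount": "4000"}}')
for alert in check(prev, curr, interval_s=60, failover_sinks={"SINK.failover1"}):
    print(alert)
```

A rising EventDrainSuccessCount on a failover sink presumably means traffic has moved off the primary path, which is why the rate of change, not the absolute count, is the signal worth alerting on.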

Monitoring the Flume Flows: ChannelFillPercentage (chart)

Monitoring the Flume Flows: rate of taking events off the Critical Logs file channel (chart)

Monitoring the Flume Flows: rate of Flume events by data center, East Coast, West Coast, Europe (chart)

Monitoring the Flume Flows: monitoring by groups (chart)

Benefits of migrating to Flume
- The business has insight into data in under 10 minutes
- Configuring an expansion is trivial
- Failover enables automatic recovery from downtime
- Bifurcation enables scaled, constant regression lane(s)
- A subset of data goes to the analytics development cluster

Benefits of migrating to Flume: 5-minute aggregations delivered to the business within 10 minutes

Gotchas
- Scaling for compression
- Auto-reloading of properties is inconsistent
- It is recommended (though not required) to use a separate disk for the File Channel checkpoint
- RAID-6 array, forced write-back
- Bad configurations are not easy to spot and are not always clear in the log file
- NetcatSource: not very useful beyond trivial usage
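Since bad configurations are hard to spot, a small offline lint can catch some of them before deployment. The sketch below, an assumption rather than anything from the talk, parses an agent's properties and flags sinks that reference undeclared channels and File Channels whose `checkpointDir` shares a disk with their `dataDirs`. The `mount_points` list is a hypothetical stand-in for the host's real disk layout, and the property parser handles only simple `key = value` lines.

```python
def parse_properties(text):
    """Minimal key = value parser (ignores blank lines and # comments)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def mount_of(path, mount_points):
    """Longest mount point containing `path`; "/" if none match."""
    matches = [m for m in mount_points
               if path == m or path.startswith(m.rstrip("/") + "/")]
    return max(matches, key=len) if matches else "/"


def lint(props, agent, mount_points):
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    for sink in props.get(f"{agent}.sinks", "").split():
        channel = props.get(f"{agent}.sinks.{sink}.channel")
        if channel not in channels:
            problems.append(f"sink {sink} uses undeclared channel {channel!r}")
    for channel in sorted(channels):
        base = f"{agent}.channels.{channel}"
        if props.get(f"{base}.type") != "file":
            continue
        checkpoint = props.get(f"{base}.checkpointDir")
        if checkpoint is None:
            problems.append(f"file channel {channel}: no explicit checkpointDir")
            continue
        data_dirs = [d.strip() for d in props.get(f"{base}.dataDirs", "").split(",") if d.strip()]
        data_disks = {mount_of(d, mount_points) for d in data_dirs}
        if mount_of(checkpoint, mount_points) in data_disks:
            problems.append(f"file channel {channel}: checkpointDir shares a disk with dataDirs")
    return problems


cfg = """
a1.channels = c1
a1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /data1/flume/checkpoint
a1.channels.c1.dataDirs = /data1/flume/data
"""
for problem in lint(parse_properties(cfg), "a1", ["/data1", "/data2"]):
    print(problem)
```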

Gotchas
- POM file edits
- JUnit tests are not deterministic
- Hadoop jars are added to the classpath by the startup script
- IDE
- Avoiding the cost of Avro schema evolution

What is next
- Upgrade to Flume 1.5
- Bifurcate to micro-batch (Storm? Spark?)
- Disable sink switch