Real-time Analytics at Facebook: Data Freeway and Puma. Zheng Shao 12/2/2011


Agenda 1 Analytics and Real-time 2 Data Freeway 3 Puma 4 Future Works

Analytics and Real-time what and why

Facebook Insights. Use cases: websites / ads / apps / pages; time series; demographic break-downs; unique counts / heavy hitters. Major challenges: scalability and latency.

Analytics based on Hadoop/Hive. Pipeline: HTTP → Scribe → NFS → (hourly Copier/Loader) → Hive/Hadoop → (daily Pipeline Jobs) → MySQL, on a 3000-node Hadoop cluster. Copier/Loader: Map-Reduce hides machine failures. Pipeline Jobs: Hive allows SQL-like syntax. Good scalability, but poor latency: 24 to 48 hours.

How to Get Lower Latency? Small-batch processing: run Map-Reduce/Hive every hour, every 15 min, every 5 min, ... but how do we reduce the per-batch overhead? Stream processing: aggregate the data as soon as it arrives, but how do we solve the reliability problem?

Decisions: stream processing wins! Data Freeway: a scalable data stream framework. Puma: a reliable stream aggregation engine.

Data Freeway scalable data stream

Scribe. Pipeline: Scribe Clients → Scribe Mid-Tier → Scribe Writers → NFS, consumed via Batch Copier into HDFS and via tail/fopen by Log Consumers. A simple push/RPC-based logging system. Open-sourced in 2008, with 100 log categories at that time. Routing driven by static configuration.

Data Freeway. Pipeline: Scribe Clients → Calligraphus Mid-tier → Calligraphus Writers → HDFS → PTail → Log Consumer; a Continuous Copier replicates to a second HDFS cluster, also tailed by PTail (in the plan); Zookeeper coordinates. 9 GB/sec at peak, 10 sec latency, 2500 log categories.

Calligraphus. RPC → File System. Each log category is represented by 1 or more FS directories; each directory is an ordered list of files. Bucketing support: application buckets are application-defined shards; infrastructure buckets allow log streams ranging from x B/s to x GB/s. Performance: latency: calls sync every 7 seconds; throughput: easily saturates a 1Gbit NIC.
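The bucketing idea above can be sketched as hashing an application-defined shard key into a fixed number of per-category directories. Everything in this sketch (the function name, the path layout, MD5 as the hash) is an illustrative assumption; the slide only says that buckets shard a category across FS directories.

```python
import hashlib

def bucket_for(category: str, shard_key: str, num_buckets: int) -> str:
    """Map a log line to one of num_buckets directories for its category.

    Hypothetical sketch: the real Calligraphus bucketing policy is not
    described on the slide beyond "application-defined shards".
    """
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % num_buckets
    return f"/calligraphus/{category}/bucket-{bucket:04d}"

# A category producing x GB/s can be spread over many buckets, while one
# producing x B/s can use a single bucket, so each directory stays
# manageable for a single writer.
path = bucket_for("ad_clicks", shard_key="user_42", num_buckets=64)
```

Hashing on the shard key (rather than round-robin) keeps all lines for one shard in one directory, which is what makes per-shard ordered file lists possible.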

Continuous Copier. File System → File System. Low latency and smooth network usage. Deployment: implemented as a long-running map-only job; can move to any simple job scheduler. Coordination: uses lock files on HDFS for now; plan to move to Zookeeper.
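The lock-file coordination can be sketched with atomic create-if-absent, the same primitive HDFS offers (creating a file fails if it already exists). This sketch uses the local filesystem and a made-up lock layout purely for illustration; it is not the copier's actual protocol.

```python
import os

def try_claim(lock_dir: str, category: str, worker_id: str) -> bool:
    """Claim a log category by atomically creating its lock file.

    Illustrative sketch of lock-file coordination: O_CREAT | O_EXCL
    makes creation fail if another worker already holds the lock, so at
    most one copier works on a category at a time.
    """
    path = os.path.join(lock_dir, category + ".lock")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another copier already owns this category
    os.write(fd, worker_id.encode("utf-8"))
    os.close(fd)
    return True
```

A real deployment also needs lock expiry for crashed workers, which is one reason the slides plan a move to Zookeeper (ephemeral nodes expire automatically).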

PTail. File System → Stream (RPC). Tails the files of one or more directories. Reliability: checkpoints are inserted into the data stream; the consumer can roll back and tail from any checkpoint, with no data loss and no duplicates.
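The checkpoint mechanism (markers woven into the stream so a consumer can resume exactly where it left off) can be sketched as a generator. This illustrates the idea on the slide, not Facebook's PTail; the tuple format and parameter names are invented.

```python
def ptail(path: str, start_offset: int = 0, checkpoint_every: int = 100):
    """Yield log lines interleaved with periodic checkpoints.

    A checkpoint is a (file, byte offset) marker inserted into the
    stream.  A consumer that persists the last marker it saw can resume
    from exactly that offset: no data loss, no duplicates.
    """
    with open(path, "rb") as f:
        f.seek(start_offset)
        n = 0
        while True:
            line = f.readline()
            if not line:  # end of file: emit a final checkpoint
                yield ("checkpoint", path, f.tell())
                return
            yield ("data", line.rstrip(b"\n"))
            n += 1
            if n % checkpoint_every == 0:
                yield ("checkpoint", path, f.tell())
```

Because the checkpoint travels inside the stream, the consumer commits data and position together, which is the property Puma relies on for exactly-once aggregation.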

Channel Comparison

             Push / RPC    Pull / FS
Latency      1-2 sec       10 sec
Loss/Dups    Few           None
Robustness   Low           High
Complexity   Low           High

Push / RPC: Scribe, Calligraphus. Pull / FS: Continuous Copier, PTail (+ ScribeSend).

Puma real-time aggregation/storage

Overview. Log Stream → Aggregations → Storage → Serving. ~1M log lines per second, but a light read load. Multiple Group-By operations per log line; the first key in each Group By is always time/date-related. Complex aggregations: unique user count, most frequent elements.
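The "multiple Group-Bys per log line, first key time-related" shape can be sketched as a set of key-extractor functions fanned out over each line. The aggregation names and fields here are invented for illustration; leading with a time key is what lets old entries be expired once their window closes.

```python
from collections import Counter

# Hypothetical aggregations: each maps a log line to a group-by key,
# and every key starts with the time component, as the slide requires.
aggregations = {
    "by_ad":  lambda line: (line["hour"], line["adid"]),
    "by_age": lambda line: (line["hour"], line["age"]),
}
tables = {name: Counter() for name in aggregations}

def ingest(line: dict) -> None:
    """Apply every group-by to one log line (simple count aggregation)."""
    for name, key_fn in aggregations.items():
        tables[name][key_fn(line)] += 1
```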

MySQL and HBase: one page

                  MySQL                       HBase
Parallel          Manual sharding             Automatic load balancing
Fail-over         Manual master/slave switch  Automatic
Read efficiency   High                        Low
Write efficiency  Medium                      High
Columnar support  No                          Yes

Puma2 Architecture. PTail → Puma2 → HBase → Serving. PTail provides parallel data streams. For each log line, Puma2 issues increment operations to HBase. Puma2 is symmetric (no sharding). HBase: a single increment operation can touch multiple columns.
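Puma2's per-line increments can be sketched against a dict standing in for HBase. The row/column schema below is hypothetical; the point it illustrates is that Puma2 itself holds no state (any node can process any line), while every line pays for a read-modify-write increment inside the store.

```python
from collections import defaultdict

# Stand-in for HBase: row key -> column -> counter.
hbase = defaultdict(lambda: defaultdict(int))

def process_line(hour: str, adid: str, userid: str) -> None:
    """Puma2-style processing: stateless, one increment batch per line.

    The tier is symmetric because all state lives in HBase, but each
    line costs an increment there, which the slides call out as the
    expensive part.
    """
    row = f"{hour}:{adid}"  # first group-by key is time-related
    cols = hbase[row]
    # one HBase increment touching multiple columns at once
    cols["impressions"] += 1
    # per-user columns: a stand-in for the "hacky implementation of
    # most frequent elements" the next slide mentions
    cols[f"user:{userid}"] += 1
```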

Puma2: Pros and Cons. Pros: Puma2 code is very simple; the service is very easy to maintain. Cons: the increment operation is expensive; it does not support complex aggregations; hacky implementation of most frequent elements; can cause small data duplicates.

Improvements in Puma2. Puma2: batching of requests; didn't work well because of the long-tail distribution. HBase: increment operation optimized by reducing locks; HBase region/HDFS file locality; short-circuited reads; reliability improvements under high load. Still not good enough!

Puma3 Architecture. PTail → Puma3 → HBase → Serving. Puma3 is sharded by aggregation key. Each shard is a hashmap in memory; each hashmap entry is a pair of an aggregation key and a user-defined aggregation. HBase serves as persistent key-value storage.

Puma3 Architecture. Write workflow: for each log line, extract the columns for key and value; look up the key in the hashmap and call the user-defined aggregation.

Puma3 Architecture. Checkpoint workflow: every 5 min, save the modified hashmap entries and the PTail checkpoint to HBase. On startup (after node failure), load from HBase. Entries are dropped from memory once their time window has passed.

Puma3 Architecture. Read workflow: read uncommitted: serve directly from the in-memory hashmap, loading from HBase on a miss. Read committed: read from HBase and serve.

Puma3 Architecture. Join: static join table in HBase; distributed hash lookup in a user-defined function (UDF). A local cache greatly improves the throughput of the UDF.
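The Puma3 workflows above (write, checkpoint, and the two read modes over a sharded in-memory hashmap) can be condensed into one sketch. Class and method names are invented, HBase is modeled as a plain dict, and the user-defined aggregation is reduced to a sum; this is an illustration of the slides' design, not Facebook's code.

```python
class Puma3Shard:
    """Minimal sketch of one Puma3 shard (sharded by aggregation key)."""

    def __init__(self, hbase: dict):
        self.memory = {}        # aggregation key -> aggregate value
        self.dirty = set()      # keys modified since the last checkpoint
        self.hbase = hbase      # stand-in for persistent HBase storage
        # On startup after a failure, a real shard would reload its
        # state and PTail position from HBase here.

    def write(self, key: str, value: int) -> None:
        # Write workflow: look up the key and apply the (user-defined)
        # aggregation; a sum stands in for the general case.
        self.memory[key] = self.memory.get(key, 0) + value
        self.dirty.add(key)

    def checkpoint(self, ptail_pos) -> None:
        # Every ~5 min: persist modified entries plus the PTail position
        # together, so recovery resumes without loss or duplicates.
        for key in self.dirty:
            self.hbase[key] = self.memory[key]
        self.hbase["__ptail__"] = ptail_pos
        self.dirty.clear()

    def read_uncommitted(self, key: str):
        # Serve from memory first; fall back to HBase on a miss.
        return self.memory.get(key, self.hbase.get(key))

    def read_committed(self, key: str):
        # Only checkpointed values: read from HBase and serve.
        return self.hbase.get(key)
```

Keeping the aggregate in memory is what removes Puma2's per-line HBase increment; HBase only sees one write per dirty key per checkpoint interval.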

Puma2 / Puma3 comparison. Puma3 is much better in write throughput: it uses 25% of the boxes to handle the same load, and HBase is really good at write throughput. Puma3 needs a lot of memory: 60 GB of memory per box for the hashmap; SSDs can scale this to 10x per box.

Puma3 Special Aggregations. Unique count calculation: adaptive sampling; Bloom filter (in the plan). Most frequent items (in the plan): lossy counting; probabilistic lossy counting.
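Of the algorithms the slide names, lossy counting is the simplest to sketch. This is the textbook Manku-Motwani version, not Puma's implementation: counts are under-estimated by at most epsilon * N, so any item with true frequency at least epsilon * N is guaranteed to survive in the table.

```python
def lossy_count(stream, epsilon: float) -> dict:
    """Lossy counting sketch for frequent items.

    Processes the stream in buckets of width 1/epsilon; at each bucket
    boundary, items whose count plus error bound falls below the bucket
    number are pruned, keeping the table small.
    """
    width = int(1 / epsilon)          # bucket width
    counts, errors = {}, {}
    bucket = 1
    for n, item in enumerate(stream, start=1):
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
            errors[item] = bucket - 1  # max count missed before entry
        if n % width == 0:             # end of bucket: prune rare items
            for k in list(counts):
                if counts[k] + errors[k] <= bucket:
                    del counts[k], errors[k]
            bucket += 1
    return counts
```

The memory bound is O((1/epsilon) * log(epsilon * N)) entries regardless of stream length, which is what makes it usable inside a per-shard hashmap.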

PQL, the Puma Query Language:

CREATE INPUT TABLE t (time, adid, userid);

CREATE VIEW v AS
SELECT *, udf.age(userid)
FROM t
WHERE udf.age(userid) > 21;

CREATE HBASE TABLE h
CREATE LOGICAL TABLE l

CREATE AGGREGATION abc
INSERT INTO l (a, b, c)
SELECT udf.hour(time), adid, age, count(1), udf.count_distinct(userid)
FROM v
GROUP BY udf.hour(time), adid, age;

Future Works challenges and opportunities

Future Works. Scheduler support: only simple scheduling is needed because the workload is continuous. Mass adoption: migrate most daily reporting queries from Hive. Open source: the biggest bottleneck is the Java Thrift dependency; components will come one by one.

Similar Systems. STREAM from Stanford. Flume from Cloudera. S4 from Yahoo. Rainbird/Storm from Twitter. Kafka from LinkedIn.

Key differences. Scalable data streams: 9 GB/sec with < 10 sec of latency; both Push/RPC-based and Pull/File-System-based; components support arbitrary combinations of channels. Reliable stream aggregations: good support for time-based Group By and table-stream lookup join; query language (Puma : Realtime-MR = Hive : MR); no support for sliding windows or stream joins.

(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0