BookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011

Similar documents
Design and Evolution of the Apache Hadoop File System(HDFS)

BookKeeper overview. Table of contents

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Hadoop Architecture. Part 1

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Hadoop: Embracing future hardware

HADOOP MOCK TEST HADOOP MOCK TEST I

Apache Hadoop FileSystem and its Usage in Facebook

Journal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

Hadoop Distributed File System. Dhruba Borthakur June, 2007

CDH AND BUSINESS CONTINUITY:

THE HADOOP DISTRIBUTED FILE SYSTEM

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

The Hadoop Distributed File System

Snapshots in Hadoop Distributed File System

CSE-E5430 Scalable Cloud Computing Lecture 2

Distributed File Systems

The Hadoop Distributed File System

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee June 3 rd, 2008

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Hadoop Scalability at Facebook. Dmytro Molkov YaC, Moscow, September 19, 2011

ZooKeeper. Table of contents

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

Hadoop IST 734 SS CHUNG

HDFS Under the Hood. Sanjay Radia. Grid Computing, Hadoop Yahoo Inc.

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Hadoop & its Usage at Facebook

Hadoop Distributed File System (HDFS) Overview

HDFS Users Guide. Table of contents

BIG DATA TRENDS AND TECHNOLOGIES

Hadoop & its Usage at Facebook

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Apache Hadoop FileSystem Internals

Hadoop Distributed File System. T Seminar On Multimedia Eero Kurkela

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. Big Data Management and Analytics

Distributed Filesystems

NoSQL Data Base Basics

Intro to Map/Reduce a.k.a. Hadoop

Big Data Technology Core Hadoop: HDFS-YARN Internals

Sujee Maniyam, ElephantScale

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

Distributed File Systems

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics

Application Development. A Paradigm Shift

Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

IJFEAT INTERNATIONAL JOURNAL FOR ENGINEERING APPLICATIONS AND TECHNOLOGY

Online data processing with S4 and Omid*

HDFS Architecture Guide

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Deploying Hadoop with Manager

Apache Hadoop. Alexandru Costan

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

Distributed Storage Systems

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014

WHITE PAPER. Reference Guide for Deploying and Configuring Apache Kafka

GraySort and MinuteSort at Yahoo on Hadoop 0.23

Sheepdog: distributed storage system for QEMU

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Introduction to Cloud Computing

Benchmarking Hadoop & HBase on Violin

Shared Parallel File System

Hadoop Distributed File System. Jordan Prosch, Matt Kipps

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Hadoop and its Usage at Facebook. Dhruba Borthakur June 22 rd, 2009

Michael Thomas, Dorian Kcira California Institute of Technology. CMS Offline & Computing Week

Apache Hadoop: Past, Present, and Future

Jeffrey D. Ullman slides. MapReduce for data intensive computing

Comparative analysis of mapreduce job by keeping data constant and varying cluster size technique

How To Scale Out Of A Nosql Database

Maxta Storage Platform Enterprise Storage Re-defined

MapReduce and Hadoop. Aaron Birkland Cornell Center for Advanced Computing. January 2012

Maginatics Cloud Storage Platform for Elastic NAS Workloads

<Insert Picture Here> Big Data

BlobSeer: Towards efficient data storage management on large-scale, distributed systems

Trends in Enterprise Backup Deduplication

Hadoop implementation of MapReduce computational model. Ján Vaňo

Accelerating and Simplifying Apache

HDFS Design Principles

MinCopysets: Derandomizing Replication In Cloud Storage

HDFS: Hadoop Distributed File System

High Availability with Windows Server 2012 Release Candidate

!"#$%&' ( )%#*'+,'-#.//"0( !"#$"%&'()*$+()',!-+.'/', 4(5,67,!-+!"89,:*$;'0+$.<.,&0$'09,&)"/=+,!()<>'0, 3, Processing LARGE data sets

Certified Big Data and Apache Hadoop Developer VS-1221

Alternatives to HIVE SQL in Hadoop File Structure

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

Transcription:

BookKeeper Flavio Junqueira Yahoo! Research, Barcelona Hadoop in China 2011

What s BookKeeper? Shared storage for writing fast sequences of byte arrays Data is replicated Writes are striped Many processes can access it 2

Motivation Recoverable systems Journal/write-ahead log Integrity and durability Efficient: sequential synchronous writes Why is writing sequentially important? To avoid random seeks 3

More motivation Examples Many databases (e.g., Postgres) Hbase region server ZooKeeper HDFS namenode Hedwig hubs 4

HDFS at a glance Main components: namenode and datanode Single name node A number of data nodes Namenode Manages FS namespace Regulates access to the FS Mapping of blocks to data nodes Datanode Stores blocks http://hadoop.apache.org/common/docs/current/hdfs_design.html Serves reads and writes 5

Namenode File system state Metadata, block map In memory Checkpoint On disk Snapshot of the service state Edit log BookKeeper kicks in! Persists changes to the file system metadata Written to disk 6

Namenode Edit log is a journal Local disk NFS server Production use Enterprise-class NFS Expensive devices E.g., Netapp Filer BookKeeper kicks in! Robust, but still a single point of failure 7

Making the namenode highly available Backup node One step ahead Receives a stream of updates Warm standby Shortcomings Cannot guarantee consistency Difficult to have multiple backups 8

Making the namenode highly available 1) primary nn1 standby nn1 standby nn k Communication among processes to coordinate 2) primary nn1 standby nn1 standby nn k Shared storage Communication with shared data store 9

Making the namenode highly available Replicate the functionality of the name node Performance penalty Not scalable Write log to external device NFS Avatarnode Replication is not transparent External high-performance logging/journaling service BookKeeper 10

BookKeeper Shared storage for logs Design goals Efficient sequential writes Fault tolerance Scalability 11

BookKeeeper architecture Bookie: Storage node Ledger: log file Ensemble: group of bookies storing a ledger Writes to quorums of Bookies Parallel writes to quorums Reads from the same quorum 12

The anatomy of a bookie Transaction log Pre-allocates, batches Return upon write/sync to disk Bookie Index Position of entry Entries Written sequentially to entry log Transaction log device synchronous sequential writes + sequential reads Ledger device asynchronous sequential writes + random reads 13

Scalability of writes Write quorums do not necessarily intersect Assuming that: 1. Each bookie performs e entries/s 2. Number of bookies: r 3. Write quorum: q bookies r " e Ideal maximum throughput: q In practice, network bandwidth or cpu limits the total capacity in bytes written per second! 14

API at a glance createledger openledger addentry Async and sync readentries Async and sync closeledger Writes the last entry id to ZooKeeper 15

Why keep last entry id? Acknowledgement Ledger closed properly Agreement Two readers don t read different sets of entries What if no last entry id has been written? 16

Recovery procedure Reader client executes a ledger recovery procedure Hints on ledger entries Procedure Request last entry hint from bookies Try to read as many entries greater than the hint Make sure entries are written to a quorum 17

How to use it Application writer Creates a ledger Add entries to the ledger Return upon confirmation from quorum Closes the ledger Application readers Open ledger Read from the ledger Application does not reopen to append 18

BookKeeper service Service Bookies in the cloud Through ZooKeeper ZooKeeper Bookies online Ledger metadata 19

Performance

Setup Cluster of identical machines 2 Quad Core Intel Xeon 2.5GHz 16GB of RAM Four SATA disks, 7,200 RPMs 1Gbit/s network interface 21

BookKeeper performance 70 60 Single writer Latency in ms 50 40 30 20 10 0 4k 2k 1k 128 0 10 20 30 40 50 60 70 Throughput (1000 x adds/s) 22

BookKeeper performance Multi-writer Aggregate throughput bytes 2Q 3Q 3E 6E 3E 6E 128 87k 116K 57k 108k Concurrent ledgers Up to 40k ledgers 1024 31k 54k 20k 38k 4096 8k 16k 5k 11k add operations/s 23

BookKeeper and the Namenode 50 40 BookKeeper Local File NFS NoPersist Latency in ms 30 20 10 0 0 1 2 3 4 5 6 7 8 9 10 Throughput (1000 x txns/s) 24

Hedwig

Hedwig Multi-region pub/sub system Guaranteed-delivery topic-based pub-sub system Extremely High Performance Elastically scalable Deployed over commodity machines Capacity can be added on-the-fly by adding machines Low Operational Complexity Tolerate failures without manual intervention Automatic load balancing Designed for multiple data-centers 26

Hedwig overview ()*' /--0$$+$,' 2)*2.,#+3-%24'5-6' 7).5'&5$8'59:$'.-%2)7$;4'$&.<''!"#$%&'!"#$%&'!"#$%&' +,-&-.-"' ()*' 1--00$$+$,' +)*"#25$;'7$229>$2' ()*' +,-&-.-"' =%&$,%$&' ($;6#>'#%2&9%.$2' #%'-&5$,';9&9'.$%&$,2' 27

Wrap up

Advanced features Opening without recovery Warm standbys Must know what you re doing Fencing Consistency despite concurrent accesses Prevents new sucessful writes once recovered 29

Status Release on the way Candidate should be out this week BookKeeper and the namenode Watch HDFS-1580 and HDFS-234 30

The team Dhruba Borthakur (Facebook) Flavio Junqueira (Yahoo!) Ivan Kelly (Yahoo!) Benjamin Reed (Yahoo!) Utkarsh Srivastava (Twitter) 31

Contributing Sign up for the lists Discuss with the community Propose improvements Bug fixes New features http://zookeeper.apache.org/bookkeeper 32

Questions? http://zookeeper.apache.org/bookkeeper