Apache HBase 0.96 and What's Next
@jmhsieh | Software Engineer at Cloudera | HBase PMC Member
October 29, 2013
Who Am I?
- Cloudera: Software Engineer, Tech Lead of the HBase team
- Apache HBase committer / PMC member
- Apache Flume founder / PMC member
- University of Washington: research in distributed systems
What is Apache HBase?
Apache HBase is an open source, distributed, consistent, non-relational database that provides low-latency, random read/write operations.
(Diagram: the application talks to HBase and MapReduce; HBase coordinates through ZooKeeper and stores its data in HDFS.)
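To make the random read/write model concrete, here is a minimal sketch using the 0.96-era Java client API. The table name "mytable" and column family "d" are illustrative assumptions, not part of the deck.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseHello {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
    // Assumes a table 'mytable' with column family 'd' already exists.
    HTable table = new HTable(conf, "mytable");
    try {
      // Random write: one cell in row "row1".
      Put put = new Put(Bytes.toBytes("row1"));
      put.add(Bytes.toBytes("d"), Bytes.toBytes("col1"), Bytes.toBytes("hello"));
      table.put(put);

      // Random read: fetch the row back.
      Result result = table.get(new Get(Bytes.toBytes("row1")));
      System.out.println(Bytes.toString(
          result.getValue(Bytes.toBytes("d"), Bytes.toBytes("col1"))));
    } finally {
      table.close();
    }
  }
}
```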
Developer Community
A vibrant, highly active community. We're growing!
Today: Apache HBase 0.96.0
- Reliable: HBase is fault tolerant, using HDFS replication for durability.
- Highly available: HBase uses ZooKeeper quorums for coordination; 0.96 adds new features for fast region server recovery.
- Predictable: HBase strives for predictable performance; 0.96 ships some experimental features, with more improvements in development.
Summary by Version

             | 0.90 (CDH3)       | 0.92/0.94 (CDH4)    | 0.96 (CDH5)                | Next
New Features | Stability         | Reliability         | Continuity                 | Multitenancy
MTTR         | Recovery in hours | Recovery in minutes | Writes recover in seconds, | Recovery in seconds
             |                   |                     | reads in 10s of seconds    | (reads + writes)
Perf         | Baseline          | Better throughput   | Optimizing performance     | Predictable performance
Usability    | HBase developer   | HBase operational   | Distributed systems        | Application developer
             | expertise         | experience          | admin experience           | experience
MTTR Recap: Mean Time to Recovery
Mean Time to Recovery (MTTR)
(Timeline: region unavailable -> detect -> repair -> notify -> recovered; the region becomes available before the client is aware, then the client is notified.)
- Machine failures happen in distributed systems.
- MTTR is the average unavailability while automatically recovering from a failure.
- It is also the recovery time for an unclean data center power cycle.
Distributed Log Splitting (0.92)
(Timeline: region unavailable -> detect -> split -> assign -> replay -> recovered, with the split, assign, and replay phases reading from HDFS.)
- Repair = split, assign, replay.
- When many servers fail, there are many logs to recover.
- Region servers split logs in parallel, instead of the master doing it alone.
- A huge win in recovery time.
Fast Notification and Detection (0.96)
(Timeline: region unavailable -> detect -> split -> assign -> replay -> recovered.)
- Proactive notification of HMaster failure (0.96)
- Proactive notification of region server failure (0.96)
- Client notified on recovery (0.96)
- Fast server failover (hardware)
Distributed Log Replay (0.96)
(Timeline: region unavailable -> detect -> assign -> split + replay -> recovered; the region accepts replay writes before it is fully available for reads and writes.)
- Previously recovery made two I/O-intensive passes: log splitting to intermediate files, then assign and log replay.
- Now just one I/O-heavy pass: assign first, then split + replay.
- Improves both read and write recovery times.
- Currently off by default*; the sketch below shows how to opt in.
*Caveat: if you override timestamps, you can see repeatable-read isolation violations (tags can fix this).
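Distributed log replay is gated by one boolean, hbase.master.distributed.log.replay. A minimal sketch of opting in; in a real deployment the property belongs in hbase-site.xml on the cluster rather than in client code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class EnableDistributedLogReplay {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Off by default in 0.96; flip it on to use the single-pass
    // assign-then-replay recovery path described above.
    conf.setBoolean("hbase.master.distributed.log.replay", true);
    // Equivalent hbase-site.xml entry:
    //   <property>
    //     <name>hbase.master.distributed.log.replay</name>
    //     <value>true</value>
    //   </property>
    System.out.println("distributed log replay: "
        + conf.getBoolean("hbase.master.distributed.log.replay", false));
  }
}
```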
Distributed Log Replay with Fast Write Recovery
(Timeline: region unavailable -> detect -> assign -> split + replay -> recovered; the region accepts all writes before it is fully available for reads and writes.)
- Writes in HBase do not incur reads.
- With distributed log replay, regions are already open for writes during recovery.
- So allow fresh writes while old logs are still replaying.*
*Same caveat: overriding timestamps can cause repeatable-read isolation violations (tags can fix this).
Fast Read Recovery
(Timeline: region unavailable -> detect -> assign -> recovered; reads resume as soon as we can guarantee there are no new edits, or that we have all edits.)
- Idea: pristine regions. If a region has no pending edits, it is consistent and can recover reads and writes immediately.
- Idea: shadow regions. A shadow region tails the WAL of the primary region; its memstore stays one HDFS block behind, so on failure it catches up and recovers reads and writes quickly.
- Some progress on this is currently in trunk.
Multi-tenancy: Many Apps and Users in a Single Cluster
Scalability: Growing HBase
(Chart: number of clusters vs. number of isolated applications; pre-0.96 deployments sit at "one giant application, multiple clusters".)
- Pre-0.96.0: scaling up HBase for single HBase applications, essentially a single user and a single app.
- Example: Facebook Messages. One application, many HBase clusters, with users sharded to different pods.
- Focused on continuity and disaster recovery features: cross-cluster replication, table snapshots, rolling upgrades.
Scalability: Growing HBase
(Chart: moving from "one giant application, multiple clusters" toward "many applications in one shared cluster", i.e. multitenancy.)
- In 0.96 we introduce primitives for supporting multitenancy: many users and many applications on one HBase cluster.
- We need some control over the interference different users cause.
- Example: managing MapReduce analytics and low-latency serving in one cluster.
Namespaces (0.96)
Namespaces provide an abstraction for multiple tenants to create and manage their own tables within a large HBase instance.
(Diagram: namespaces blue, green, and orange, each holding its own tables.)
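A minimal sketch of the 0.96 namespace admin API; the namespace "blue" and table "events" are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class NamespaceExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Each tenant gets its own namespace...
      admin.createNamespace(NamespaceDescriptor.create("blue").build());
      // ...and creates tables inside it, addressed as "blue:events".
      HTableDescriptor events =
          new HTableDescriptor(TableName.valueOf("blue", "events"));
      events.addFamily(new HColumnDescriptor("d"));
      admin.createTable(events);
    } finally {
      admin.close();
    }
  }
}
```

The shell equivalents are create_namespace 'blue' followed by create 'blue:events', 'd'.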
Multitenancy Goals
- Security (0.96): separate admin ACLs for different sets of tables.
- Quotas (in progress): max tables, max regions.
- Performance isolation (in progress): limit the impact that load on one table has on others.
- Priority (future): handle some tables before others.
Isolation with Region Server Groups
(Diagram: region assignment distribution without region server groups; regions from the blue, green, and orange namespaces are mixed across all region servers.)
Isolation with Region Server Groups
(Diagram: region assignment distribution with region server groups (RSGs); each namespace's regions are confined to its own group of region servers.)
Cell Tags
- A mechanism for attaching arbitrary metadata to cells.
- Motivation: finer-grained isolation, e.g. Accumulo-style cell-level visibility.
- A main feature for 0.98 (in development); see the sketch below.
- Other uses: sequence numbers to enable correct fast read/write recovery; potential schema tags.
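Since 0.98 was still in development at the time of this talk, the details were in flux; here is a hedged sketch of the Accumulo-style cell-visibility API that was taking shape on top of tags. The class and method names reflect the work in progress and may differ from what shipped.

```java
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.security.visibility.CellVisibility;
import org.apache.hadoop.hbase.util.Bytes;

public class VisibilitySketch {
  // Hypothetical usage; assumes the visibility-labels coprocessor is
  // installed and that an admin has already defined the labels below.
  static void writeLabeledCell(HTable table) throws Exception {
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("col1"), Bytes.toBytes("secret"));
    // A boolean expression over labels, stored as a tag on the cell
    // and evaluated per cell at read time.
    put.setCellVisibility(new CellVisibility("(admin|audit)&!probationary"));
    table.put(put);
  }
}
```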
Improving Predictability: Improving the 99th Percentile
Common Causes of Performance Variability (cause -> mitigation)
- Locality loss -> favored nodes, HDFS block affinity
- Compaction* -> exploring compactor
- GC -> off-heap cache
- Hardware hiccups -> multi-WAL, HDFS speculative reads
*See my Hadoop Summit 2013 talk.
Performance Degraded After Recovery
(Timeline: recovery -> service recovered with degraded performance -> performance recovered once compaction restores locality.)
- After recovery, reads suffer a performance hit because regions have lost locality.
- To maintain performance after failover, we need to regain locality, today by compacting the region.
- We can do better by using HDFS features.
Read Throughput: Favored Nodes (0.96)
(Timeline: recovery -> service recovered with performance sustained, because the region is assigned to a favored node.)
- Control and track where block replicas are placed.
- All files for a region are created so that their blocks go to the same set of favored nodes.
- On failover, assign the region to one of those favored nodes.
- Currently a preview feature in 0.96, disabled by default because it does not work well with the latest balancer or with splits.
- Will likely use the upcoming HDFS block affinity for better operability.
- Originally on Facebook's 0.89 branch, ported to 0.96.
Read Latency: HDFS Speculative Reads
(Diagram: a region server reads one of three HDFS replicas; when the first replica is too slow, it reads another.)
- HBase's HDFS client reads 1 of 3 replicas; if you chose the slow node, your reads are slow.
- Idea: if a read is taking too long, speculatively go to another replica that may be faster.
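This idea later shipped in HDFS as "hedged reads" (HDFS-5776), after this talk. A hedged sketch of the client-side knobs it introduced; the specific values are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HedgedReadsSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Thread pool for the extra, speculative replica reads (0 disables them).
    conf.setInt("dfs.client.hedged.read.threadpool.size", 20);
    // If the first replica hasn't answered within this long, hedge
    // by issuing the same read against a different replica.
    conf.setLong("dfs.client.hedged.read.threshold.millis", 10);
  }
}
```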
Write Latency: Multiple WALs
(Diagram: a region server writes a 3-replica pipeline; when the pipeline is too slow, it duplicates the write to another.)
- HBase's HDFS client writes 3 replicas, so minimum write latency is bounded by the slowest of the 3.
- Idea: if a write is taking too long, duplicate it on another replica set that may be faster.
HBase Extensions: An Ecosystem of Projects Built on HBase
Making HBase Easier to Use and Tune
- With great power comes great responsibility: it is difficult to see what is happening inside HBase, and easy to make poor design decisions early without realizing it.
- New developments: HTrace + Zipkin, and frameworks for schema design.
HTrace: Distributed Tracing in HBase and HDFS
- A tracing framework inspired by Google's Dapper.
- Tracks time spent in RPC calls across different machines.
- Threaded through HBase (0.96), with HDFS support coming.
(Diagram: one client RPC fanning out across the HBase client, hbase:meta, a region server, ZooKeeper, and the HDFS NameNode and DataNodes; each timed segment is a span.)
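A hedged sketch of wrapping a client call in an HTrace span. HTrace's package name changed over its history (org.cloudera.htrace, org.htrace, later org.apache.htrace), so treat the imports as illustrative of the era.

```java
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;
import org.cloudera.htrace.Sampler;
import org.cloudera.htrace.Trace;
import org.cloudera.htrace.TraceScope;

public class TraceSketch {
  static void tracedGet(HTable table) throws Exception {
    // Start a span; the trace id propagates through HBase RPCs,
    // so server-side time shows up as child spans of this one.
    TraceScope scope = Trace.startSpan("get row1", Sampler.ALWAYS);
    try {
      table.get(new Get(Bytes.toBytes("row1")));
    } finally {
      scope.close(); // reports the span to the configured receiver
    }
  }
}
```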
Zipkin: Visualizing Spans
- A UI + visualization system written by Twitter.
- Zipkin HBase storage backend; Zipkin HTrace integration.
- Shows where the time for a specific call is spent across HBase, HDFS, and ZooKeeper.
HBase Schemas
- HBase application developers must iterate to find a suitable schema, and schema design is critical for performance at scale.
- How can we make this easier? How can we reduce the expertise required?
- The old option: learn the architecture, use the API, experiment and learn.
How Should I Arrange My Data? Isomorphic Data Representations

Short fat table using column qualifiers:
  Rowkey  d:col1  d:col2  d:col3  d:col4
  bob     aaaa    bbbb    cccc    dddd
  jon     eeee    ffff    gggg    hhhh

Short fat table using column families:
  Rowkey  col1:   col2:   col3:   col4:
  bob     aaaa    bbbb    cccc    dddd
  jon     eeee    ffff    gggg    hhhh

Tall skinny table with a compound row key:
  Rowkey    d:
  bob-col1  aaaa
  bob-col2  bbbb
  bob-col3  cccc
  bob-col4  dddd
  jon-col1  eeee
  jon-col2  ffff
  jon-col3  gggg
  jon-col4  hhhh
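The same data written both ways through the client API, as a minimal sketch; the family name "d" and the "-" separator match the tables above.

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class LayoutSketch {
  static final byte[] D = Bytes.toBytes("d");

  // Short fat layout: one row per user, one qualifier per attribute.
  static Put widePut() {
    Put p = new Put(Bytes.toBytes("bob"));
    p.add(D, Bytes.toBytes("col1"), Bytes.toBytes("aaaa"));
    p.add(D, Bytes.toBytes("col2"), Bytes.toBytes("bbbb"));
    return p;
  }

  // Tall skinny layout: compound row key, a single fixed qualifier.
  static Put tallPut() {
    Put p = new Put(Bytes.toBytes("bob-col1"));
    p.add(D, Bytes.toBytes(""), Bytes.toBytes("aaaa"));
    return p;
  }
}
```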
Row Key Design Techniques
- Numeric keys and lexicographic sort: store numbers big-endian, and pad ASCII numbers with 0s. Unpadded, Row100 < Row3 < Row31 sorts wrong; padded, Row003 < Row031 < Row100 sorts correctly.
- Use reversal to put the most significant traits first. Reverse URLs: blog.cloudera.com, hbase.apache.org, strataconf.com become com.cloudera.blog, com.strataconf, org.apache.hbase. Reverse timestamps to get the most recent first: (MAX_LONG - ts) makes time monotonically decreasing.
- Use composite keys so keys distribute nicely and work well with sub-scans, e.g. User-ReverseTimestamp.
- Do not use the current timestamp as the first part of the row key!
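A minimal sketch of the padding, big-endian, and reverse-timestamp techniques; the helper names are illustrative.

```java
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeySketch {
  // Zero-padding keeps ASCII numbers in numeric order: Row003 < Row031 < Row100.
  static byte[] paddedKey(int n) {
    return Bytes.toBytes(String.format("Row%03d", n));
  }

  // Bytes.toBytes(long) is big-endian, so binary longs sort numerically too.
  // Subtracting from MAX_VALUE makes the newest timestamp sort first,
  // and leading with the user spreads keys across regions (sub-scannable).
  static byte[] userReverseTsKey(String user, long ts) {
    return Bytes.add(Bytes.toBytes(user + "-"),
                     Bytes.toBytes(Long.MAX_VALUE - ts));
  }
}
```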
Phoenix
- A SQL skin over HBase targeting low-latency queries, with a JDBC SQL interface.
- Highlights: adds types; handles compound row key encoding; secondary indexes in development; provides some pushdown aggregations (via coprocessors).
- Open sourced by Salesforce.com; work from James Taylor, Jesse Yates, et al.
- https://github.com/forcedotcom/phoenix
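A hedged sketch of querying through Phoenix's JDBC driver; the ZooKeeper quorum "localhost" and the metrics table (created beforehand with Phoenix DDL) are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixSketch {
  public static void main(String[] args) throws Exception {
    // Phoenix URL format is jdbc:phoenix:<zookeeper quorum>.
    Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
    try {
      Statement stmt = conn.createStatement();
      // The aggregation is pushed down to region servers via coprocessors.
      ResultSet rs = stmt.executeQuery(
          "SELECT host, AVG(cpu) FROM metrics GROUP BY host");
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
      }
    } finally {
      conn.close();
    }
  }
}
```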
Impala
- Scalable, low-latency SQL querying for HDFS (and HBase!), with an ODBC/JDBC driver interface.
- Highlights: uses the Hive metastore and its Hive-HBase connector configuration conventions; native code implementation that uses JIT for query execution optimization; authorization via Kerberos.
- Open sourced by Cloudera.
- https://github.com/cloudera/impala
Kiji
- APIs for building big data applications on HBase, based on Google's Bigtable usage experience.
- Highlights: provides types via Avro serialization and schema encoding; provides locality groups that map logically to HBase's physical column families; manages schema evolution; provides a framework for applying machine learning to data.
- Open sourced by WibiData.
- http://www.kiji.org/
Cloudera Development Kit (CDK)
- APIs that provide a Dataset abstraction, with a get/put/delete API over Avro objects; HBase support is in progress.
- Highlights: supports multiple components of the Hadoop distros (Flume, Morphlines, Hive, Crunch, HCatalog); provides types using the Avro and Parquet formats for encoding entities; manages schema evolution.
- Open sourced by Cloudera.
- http://cloudera.github.io/cdk/docs/current
Conclusions
Summary by Version

0.90 (CDH3):
- Major features: stability, true durability, replication, import/export/copy
- MTTR: recovery in hours; distributed log splitting*
- Perf: baseline; metrics
- Usability: requires HBase developer expertise

0.92/0.94 (CDH4):
- Major features: reliability, true consistency, master-master replication, coprocessors + security
- MTTR: recovery in minutes; distributed log splitting
- Perf: better throughput; CF + region metrics; HFile checksums; short-circuit HDFS reads; blooms and big HFiles
- Usability: requires HBase operational experience

0.96 (CDH5):
- Major features: continuity, protobufs, snapshots, namespace security
- MTTR: recovery of writes in seconds, reads in 10s of seconds; distributed log replay; fast failure notification
- Perf: optimizing performance; HTrace; prefix tree encoding; exploring compaction; stochastic load balancer
- Usability: requires distributed systems admin experience

Next:
- Major features: multitenancy, cell-level tags, namespaces, isolation, namespace quotas
- MTTR: recovery in seconds (reads + writes); pristine-region read recovery; shadow regions
- Perf: predictable performance; multi-WAL; speculative reads; favored nodes
- Usability: requires only application developer experience

*Backported. The original slide also flags some items as experimental or in progress.
Questions? @jmhsieh
More questions? Come to the Cloudera booth for an "ask the expert" session with Jon and Lars George!