Hypertable Goes Realtime at Baidu. Yang Dong yangdong01@baidu.com Sherlock Yang(http://weibo.com/u/2624357843)



Similar documents
Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

Lecture 10: HBase! Claudia Hauff (Web Information Systems)!

Bigtable is a proven design Underpins 100+ Google services:

Hypertable Architecture Overview

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

Using distributed technologies to analyze Big Data

Hadoop: Embracing future hardware

How To Scale Out Of A Nosql Database

In Memory Accelerator for MongoDB

Apache Hadoop: Past, Present, and Future

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Constructing a Data Lake: Hadoop and Oracle Database United!

Can the Elephants Handle the NoSQL Onslaught?

Comparing SQL and NOSQL databases

NoSQL Data Base Basics

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, UC Berkeley, Nov 2012

Apache HBase. Crazy dances on the elephant back

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Design and Evolution of the Apache Hadoop File System(HDFS)

Big Data With Hadoop

THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES

Big Fast Data Hadoop acceleration with Flash. June 2013

Hadoop for MySQL DBAs. Copyright 2011 Cloudera. All rights reserved. Not to be reproduced without prior written consent.

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

Hadoop Ecosystem B Y R A H I M A.

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, XLDB Conference at Stanford University, Sept 2012

A Performance Analysis of Distributed Indexing using Terrier

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Hadoop IST 734 SS CHUNG

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

CitusDB Architecture for Real-Time Big Data

How To Use Big Data For Telco (For A Telco)

HBase Schema Design. NoSQL Ma4ers, Cologne, April Lars George Director EMEA Services

Benchmarking Cassandra on Violin

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah

Oracle Big Data SQL Technical Update

Big data blue print for cloud architecture

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Real-time Analytics at Facebook: Data Freeway and Puma. Zheng Shao 12/2/2011

Hadoop & Spark Using Amazon EMR

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

Integrating Big Data into the Computing Curricula

Putting Apache Kafka to Use!

SharePoint 2010 Performance and Capacity Planning Best Practices

Hadoop & its Usage at Facebook

How To Handle Big Data With A Data Scientist

Kafka & Redis for Big Data Solutions

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

BIG DATA TRENDS AND TECHNOLOGIES

Apache Hadoop. Alexandru Costan

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

OLTP Meets Bigdata, Challenges, Options, and Future Saibabu Devabhaktuni

Benchmarking Hadoop & HBase on Violin

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Apache Hadoop FileSystem and its Usage in Facebook

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) /21/2013

Apache HBase: the Hadoop Database

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

Scaling Up 2 CSE 6242 / CX Duen Horng (Polo) Chau Georgia Tech. HBase, Hive

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee

Monitor and Manage Your MicroStrategy BI Environment Using Enterprise Manager and Health Center

Big Data with Component Based Software

Big Data and Industrial Internet

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

America s Most Wanted a metric to detect persistently faulty machines in Hadoop

Introduction to Hbase Gkavresis Giorgos 1470

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

BIG DATA What it is and how to use?

Large scale processing using Hadoop. Ján Vaňo

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

NoSQL and Hadoop Technologies On Oracle Cloud

Xiaoming Gao Hui Li Thilina Gunarathne

Media Upload and Sharing Website using HBASE

NoSQL: Going Beyond Structured Data and RDBMS

Configuring Apache Derby for Performance and Durability Olav Sandstå

A Brief Outline on Bigdata Hadoop

How To Scale Myroster With Flash Memory From Hgst On A Flash Flash Flash Memory On A Slave Server

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Trafodion Operational SQL-on-Hadoop


An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

Operations and Big Data: Hadoop, Hive and Scribe. Zheng 铮 9 12/7/2011 Velocity China 2011

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Performance and Scalability Overview

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Big Data Primer. 1 Why Big Data? Alex Sverdlov alex@theparticle.com

Oracle Aware Flash: Maximizing Performance and Availability for your Database

Performance and Scalability Overview

CS2510 Computer Operating Systems

CS2510 Computer Operating Systems

EOFS Workshop Paris Sept, Lustre at exascale. Eric Barton. CTO Whamcloud, Inc Whamcloud, Inc.

Java DB Performance. Olav Sandstå Sun Microsystems, Trondheim, Norway Submission ID: 860

Zynga Analytics Leveraging Big Data to Make Games More Fun and Social

Real World Big Data Architecture - Splunk, Hadoop, RDBMS

Transcription:

Hypertable Goes Realtime at Baidu Yang Dong yangdong01@baidu.com Sherlock Yang(http://weibo.com/u/2624357843)

Agenda Motivation Related Work Model Design Evaluation Conclusion 2

Agenda Motivation Related Work Model Design Evaluation Conclusion 3

Motivation FAILURE Interval : 60 days Interval: 1 second 4

What is Noah Noah System Operational Data Store System metrics (CPU, Memory, IO, Network) Application metrics (Web, DB, Caches) Baidu metrics (Usage, Revenue) Easily visualize data over time Supports complex aggregation, transformations, etc. Component Monitoring sub-system Collection sub-system...... 5

System Requirement Storage Capacity 10TB~100TB 100~1000 billion Records Automatic Sharding Irregular data growth patterns Heavy Writes 10000~30000 inserts/s Fast Reads of Recent Data Recently written data should be available quickly Table Scans e.g. The entire dataset will also be periodically scanned in order to perform time-based rollups 6

Problems of Existing Stack MySQL Not inherently distributed Difficult to scale Irregular data growth patterns Manually re-sharding data frequently Table size limitation Inflexible schema Hadoop No support for random writes Poor support for random reads 7

Hypertable Meets the Requirements Hypertable + Hadoop Elasticity High write throughput High Availability and Disaster Recovery Fault Isolation Range Scans 8

Typical Applications of Hypertable Common Library AsyncComm/Compression/HQL in Hypertable Database (Query Engine) Hypertable MapReduce (Batch Processing) MapReduce + Hypertable OLAP (Online Analytical Processing) MySQL + Hypertable 9

Agenda Motivation Related Work Model Design Evaluation Conclusion 10

Related Work Apache Hadoop Goes Realtime at Facebook Titan (Facebook Messages) Puma (Facebook Insights) ODS (Facebook Internal Metrics) 11

Agenda Motivation Related Work Model Design Evaluation Conclusion 12

Model Design Storage System Architecture Functional Model Processing Model Data Model 13

Storage System Architecture Java/C++ Client, Thrift Client, MR-like Client, SQL-like Client Scribe Flume Data Inputs Distributed Data Structures Data Warehouse Computing System Hybrid Storage System KV File Block Storage Memory / SSD / Disk Object Analytics System Data Outputs Middleware/DB/K-V Stores Web Fronted OLTP System 14

Functional Model Transfer Transfer Transfer Client Client Client Client Random(Read Scan %%%%%%%%%%%%Client%IO%Stub Real%&me(Flow History(Flow %%%Stream%Compu9ng%Engine %%%%%%%%%%%%%%%%%%%Query%Engine %%%%%%%%%%%%%%%%%%%%Hypertable HDFS Noah%Storage%Cluster 15

Processing Model Query Interface InfoQuery <table> <rowkey_range_start> <rowkey_range_end> <column_family> <to_file> <human machine> [max_lines] table: table name rowkey_range_start & end: row key string range column_family: monitor_data_status is value, other tables are value_pack to_file: file by result written into human machine: text or binary max_lines: default < 2w 16

Processing Model Query Mode Normal Query Hostname + MonitorItemIDs + Interval -> Results Graph Hostnames + MonitorItemIDs -> Sorted Latest Results Hostnames -> Status Results Special Query Many Hosts + MonitorItemIDs + Interval: Multiple queries Long Interval: default <= 2w Recods Low Latency: No guarantee Batch queries continuously: Influence to system 17

Data Model Tradeoff Query efficiency Record volume Tables Row Key Column Version TTL History Tables Hostname + Timestamp Latest Table Status Table Hostname + MonitorItemID MonitorItem +MonitorItem* MonitorItem +MonitorItem* 1 0.5 ~ 24 months Hostname MonitorItem 10 24 months 1 18

Agenda Motivation Related Work Model Design Evaluation Conclusion 19

Evaluation High Availability Memory Usage Read/Write Performance 20

High Availability Challenge Recovery No support sync for append operation in Hadoopv2 (based on 0.18.2) Load Balance No implementation in Hypertable-B2 (based on 0.9.2.0) 21

High Availability Improvements Recovery Data in HDFS, Log in LocalFS Local-Recovery Centralized Meta Ranges 2048PB User-Data index by 16TB Meta-Data Master Standby Application Remove-duplicates Load Balance Split-driven Manual online and offline operations 22

High Availability Deployment Model (before improvement) Hyperspace Master Master( (Standby) %%%%Range%Server %%%%%%Range%Server %%%%%Range%Server Meta%Table Distributed% File%System % %%%%%%%%%%%%%%%(e.g.% HDFS,%KFS% ) User%Table Commit%Log% 23

High Availability Deployment Model (after improvement) Hyperspace Master Master( (Standby) Meta%Range%Server User%Range%Server User%Range%Server Distributed% Meta%Table File%System % %%%%%%%%%%%%%%%(e.g.% HDFS,%KFS% ) User%Table Commit%Log% Local%File%System 24

Weaknesses Range data managed by a single range server Though no data loss, can cause periods of unavailability Can be mitigated by client-side cache or memcached Can minimize recovery time by active standby of cluster 25

Memory Usage Challenge Hypertable is memory intensive Function (during execution) Hypertable::CellCache::add gnu_cxx::new_allocator::allocate Hypertable::DynamicBuffer::grow Hypertable::IOHandlerData::handle_event Hypertable::BlockCompressionCodecLzo ::BlockCompressionCodecLzo Memory Usage 75.6% 18.8% 4.1% 1.0% 0.5% 26

Memory Usage Improvement TCMalloc SimplePollMalloc <Key, value> MemPool <Key, value> Map Cell Cache (Freeze) Cell Cache (new) CellMap Memory Distributed FileSystem Compaction Memory Allocation Cell Store Cell Store Cell Store Cell Store 27

Memory Usage Function (during execution) Mem Usage Mem Usage Function (during execution) CellCachePool::get_memory 94.3% 75.6% Hypertable ::CellCache::add 18.8% gnu_cxx ::new_allocator::allocate Hypertable:: DynamicBuffer::grow 3.8% 4.1% Hypertable ::DynamicBuffer::grow Hypertable ::BlockCompressionCodecLzo ::BlockCompressionCodecLzo 1.1% 1.0% Hypertable ::IOHandlerData ::handle_event Hypertable ::IOHandlerData ::handle_event 0.5% 0.5% Hypertable ::BlockCompressionCodecLzo ::BlockCompressionCodecLzo 28

Memory Usage New/Delete vs. TCMalloc vs. PollMalloc vs. PollMalloc (with map) RangeServer Memory Usage Original Malloc Test Memory Usage(KB) Pool Malloc Test(map) Pool Malloc Test(nonmap) TC Malloc Test Time(s) 29

Memory Usage Opportunity Compaction Efficiency Memory Usage(KB) RangeServer Memory-Usage RS-Mem-Usage PoolCache(s)-Mem-Usage Time(s) Maintanence Queue Elememt Number 12 10 Maintenance Task Thread(1) QE Number 8 6 4 2 0 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 120 127 134 141 148 155 162 169 Time [non-seq] 30

Memory Usage Improvement Compaction Optimization Factor Cell Cache Buffer Size Maintenance Thread Number RangeServer Memory-Usage Comparation(Map) Memory Usage(KB) Original Malloc Test Pool Malloc MTask1 Pool-4MB Pool Malloc MTask2 Pool Malloc MTask1 Pool-4KB Pool Malloc MTask1 Pool-1MB Pool Malloc MTask1 Pool-512KB Pool Malloc MTask1 Pool-128KB TC Malloc Test Time(s) 31

Read Performance Challenge Noah generates lots of random read ops Machine Sequential Write Random Read Random Write (KB/s) (KB/s) (KB/s) X1 284191 1932 79834 X2 165264 973 1627 X3 93919 863 7256 X4 63565 471 23149 X5 55598 424 18768 X6 60858 336 23188 <./iozone -f t3 -s 10g -c -e -w -+n -i 0./iozone -f t3 -s 10g -c -e -w -+n -i 2 > 32

Read Performance Memory Read Latency ms level Memory usage Sparse Table: value + 40Bytes(keys + map) MVCC: Old record is deleted when cell cache compaction 33

Read Performance Cache Read Block Cache Cellstore blocks Uncompressed Query Cache Query results 34

Read Performance Disk Read CellStore 5 CSs/Range, 5*216=950 RR, 950*2=1900 RR; 500 iops, ioutil 100% Bloomfilter CS 10 CS 5 CS 1 Threads 1 16 32 1 16 32 1 16 32 Queries/s Aggregate Queries/s Average Latency(ms) 3.8 3.8 2.4 9.8 13.5 5 23.1 52.6 25.6 3.8 61 77 9.8 216 160 23.1 1684 1641 260 260 420 102 74 200 43 19 39 35

Read Performance Disk Read Compression Lzo/Quicklz/Gzip/Snappy SSD/SATA/SAS Read Access Time/Capacity/Price per GB/Idle or Full power/mtbf Concurrent W/R Resource isolation Manual control 36

Read Performance Hypertable vs. Hbase 37

Write Performance Challenge HDFS operation pressure RangeServer writing pressure Improvement Compaction frequency adjusting Row key reversal 38

Agenda Motivation Related Work Model Design Evaluation Conclusion 39

Conclusion Application level Table Design Load-in Strategy Duplicate Removing High Availability Metadata Centralization Master Standby Log/Data Separation Load Balance System Active Standby Memory Usage Memory Pool Compaction Strategy Read/Write Performance Mem/SSD/SAS/SATA Block/Query Caching Compression Strategy Resource Isolation Compaction Strategy 40

Hypertable versus HBase versus Hypertable HBase Community Hypertable Apache Company Hypertable Inc Cloudera, Implementation Language Boost C++ Memory Management Explicit Memory Management HortonWorks Java Cache Managemnt Dynamic cache management Performance High Normal Compiler Optimization Easy Hard Garbage Collection Java heap for caching Compression Direct native compression JNI-based compression 41

Thanks for your Attention! 42

Questions? 43

关 注 我 们 :t.baidu-tech.com 资 料 下 载 和 详 细 介 绍 :infoq.com/cn/zones/baidu-salon 畅 想 交 流 争 鸣 聚 会 是 百 度 技 术 沙 龙 的 宗 旨 百 度 技 术 沙 龙 是 由 百 度 与 InfoQ 中 文 站 定 期 组 织 的 线 下 技 术 交 流 活 动 目 的 是 让 中 高 端 技 术 人 员 有 一 个 相 对 自 由 的 思 想 交 流 和 交 友 沟 通 的 的 平 台 主 要 分 讲 师 分 享 和 OpenSpace 两 个 关 键 环 节, 每 期 只 关 注 一 个 焦 点 话 题 讲 师 分 享 和 现 场 Q&A 让 大 家 了 解 百 度 和 其 他 知 名 网 站 技 术 支 持 的 先 进 实 践 经 验,OpenSpace 环 节 是 百 度 技 术 沙 龙 主 题 的 升 华 和 展 开, 提 供 一 个 自 由 交 流 的 平 台 针 对 当 期 主 题, 参 与 者 人 人 都 可 以 发 起 话 题, 展 开 讨 论 InfoQ 策 划 组 织 实 施 关 注 我 们 :weibo.com/infoqchina