Neptune Distributed Data Storage. H.J. Kim 2009.07 http://dev.naver.com/projects/neptune http://www.openneptune.com




Data Tsunami
- 1 billion books
- 40 billion Web pages
- 55 trillion Web links
- 281 exabytes (45 GB per person)
- 2x growth in 4 years

Think Various Storage!
(Figure: storage options such as NAS, Dynamo, and Cassandra positioned by data volume and data complexity.)

Neptune Distributed Data Storage
- Semi-structured data store (not a file system)
- Uses a distributed file system for data files
- Supports real-time and batch processing
- Google Bigtable clone: data model, architecture, features
- Goal: 1,000 nodes, 100 to 200 GB per node, petabytes in total

Features
- Schema management: create, drop, and modify table schemas
- Real-time transactions
  - Single-row operations (no join, group by, or order by)
  - Multi-row operations: like, between
- Batch transactions: scanner, uploader, MapReduce adapter
- Scalability: automatic table split and re-assignment
- Reliability
  - Data files stored in a distributed file system (HDFS or others)
  - Commit log stored in a ChangeLog cluster
- Failover: tablet takeover time of at most 2 minutes
- Utilities: Web Console, Shell (simple queries), Data Verifier

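Architecture
- User application
- Distributed/parallel computing platform (MapReduce)
- Neptune (large-scale distributed data store): a Neptune Master and TabletServer #1, #2, ..., #n, which serve the logical tables
- Physical storage: distributed file system (Hadoop or other)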

System Components
- Lock server: ZooKeeper (NChubby and Pleidas are also shown); sends failover events
- Neptune Master: an active and a standby master, coordinated through the lock server
- Neptune Client: NTable, Scanner, Shell; exchanges control messages with the master and data/control messages with the TabletServers
- Each node #1 to #n runs a TabletServer (Neptune), a LogServer, a DFS DataNode, and a Map&Reduce computing process on its local disk

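Data Model
- A table is split by rowkey range into tablets: TabletA-1 holds rows #1 to #k, TabletA-2 starts at row #k+1, and the last tablet, TabletA-n, holds rows #m+1 to #n.
- Rows are sorted by rowkey; the cells within a column are sorted by column key (cell key).
- A row, identified by its Row.Key, has columns Column1 to Column-n; each column holds cells (Cell1, Cell2, ...).
- A cell has a Cell.Key and several timestamped values: Cell.Value(t1), Cell.Value(t2), ..., Cell.Value(tn).
- Example from the slide: row rk-1 has cell ck-1 with values (v1, t1) and (v2, t2), cell ck-2 with (v3, t2), (v4, t3), and (v5, t4), and cell ck-n with (vn, tn).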
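The nesting described above (rowkey, then column, then cell key, then timestamped versions, each level sorted) can be sketched with plain sorted maps. This is a minimal in-memory illustration under that assumption only; `SortedTableModel` and its methods are hypothetical names, not the Neptune client API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the data model:
// rowkey -> column -> cell key -> (timestamp -> value).
// Names are illustrative only; this is not the Neptune client API.
final class SortedTableModel {
    // Outer map sorted by rowkey; columns and cells are sorted maps as well.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    // Versions of one cell are kept newest timestamp first.
    private static NavigableMap<Long, String> newVersions() {
        return new TreeMap<Long, String>(Comparator.reverseOrder());
    }

    void put(String rowKey, String column, String cellKey, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<>())
            .computeIfAbsent(cellKey, k -> newVersions())
            .put(timestamp, value);
    }

    // Newest value of one cell, or null if the row, column, or cell does not exist.
    String getLatest(String rowKey, String column, String cellKey) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> columns = rows.get(rowKey);
        if (columns == null) return null;
        NavigableMap<String, NavigableMap<Long, String>> cells = columns.get(column);
        if (cells == null) return null;
        NavigableMap<Long, String> versions = cells.get(cellKey);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // Row keys in [startRow, endRow): the kind of contiguous range one tablet covers.
    List<String> rowKeysInRange(String startRow, String endRow) {
        return new ArrayList<>(rows.subMap(startRow, true, endRow, false).keySet());
    }

    public static void main(String[] args) {
        SortedTableModel table = new SortedTableModel();
        table.put("rk-1", "Column#1", "ck-1", 1L, "v1");
        table.put("rk-1", "Column#1", "ck-1", 2L, "v2");
        table.put("rk-1", "Column#1", "ck-2", 2L, "v3");
        System.out.println(table.getLatest("rk-1", "Column#1", "ck-1")); // prints v2
    }
}
```

Because the outer map is sorted by rowkey, splitting a table into tablets amounts to cutting this map into contiguous key ranges, which is what `rowKeysInRange` shows.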

Data Model Examples: 1:N relation
- 1 user has 1 or more friends
- We will look up all friends of a user
RDBMS:
- T_USER: id (pk), name, sex, age
- T_FRIEND: user_id, friend_id, type
- select * from T_USER, T_FRIEND where T_USER.id = ? and T_USER.id = T_FRIEND.user_id
Neptune:
- T_USER_FRIEND: rowkey <user_id>; column info (name, sex, age); column friend (cells <user_id>=type)
Source: HBase Schema Design Case Studies (http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies)
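The T_USER_FRIEND design replaces the join with a single-row read: every friend is a cell in the user's friend column. The sketch below assumes that layout; the class and method names are hypothetical, and plain sorted maps stand in for the table.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of T_USER_FRIEND: one row per <user_id>, an "info" column
// (name, sex, age) and a "friend" column whose cell keys are friend user_ids and
// whose cell values are the relationship type. Not the Neptune API.
final class UserFriendTable {
    // rowkey (user_id) -> column name -> cell key -> value
    private final SortedMap<String, Map<String, SortedMap<String, String>>> rows = new TreeMap<>();

    private SortedMap<String, String> column(String userId, String columnName) {
        return rows.computeIfAbsent(userId, k -> new TreeMap<>())
                   .computeIfAbsent(columnName, k -> new TreeMap<>());
    }

    void putProfile(String userId, String name, String sex, int age) {
        SortedMap<String, String> info = column(userId, "info");
        info.put("name", name);
        info.put("sex", sex);
        info.put("age", Integer.toString(age));
    }

    // Stores one friend as the cell <friend_user_id>=type.
    void addFriend(String userId, String friendId, String type) {
        column(userId, "friend").put(friendId, type);
    }

    // All friends come from one row read; no join is needed.
    SortedMap<String, String> friendsOf(String userId) {
        Map<String, SortedMap<String, String>> row = rows.get(userId);
        if (row == null || row.get("friend") == null) return new TreeMap<>();
        return new TreeMap<>(row.get("friend"));
    }
}
```

The trade-off is the usual one for denormalized rows: lookups by user are cheap, while a query such as "who lists user X as a friend" would need a second table keyed the other way.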

Data Model Examples: access log
- Each log line contains time, IP, domain, and URL
- Logs will be analyzed every 5 minutes, every hour, daily, and weekly
RDBMS:
- T_ACCESS_LOG: time, ip, domain, url, referer, login_id
Neptune:
- T_ACCESS_LOG: rowkey <time><inc_counter>; columns http and user, holding ip, domain, url, referer, and login_id
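With a `<time><inc_counter>` rowkey, each periodic report becomes one range scan over sorted rowkeys. The sketch below assumes a concrete encoding that the slide does not specify: a `yyyyMMddHHmmss` timestamp followed by a 6-digit zero-padded counter. `AccessLogKeys` is a hypothetical name, and a `NavigableMap` stands in for the table.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.NavigableMap;

// Hypothetical encoding of the <time><inc_counter> rowkey: a fixed-width timestamp
// (yyyyMMddHHmmss) followed by a 6-digit zero-padded counter. Fixed widths make
// lexicographic rowkey order equal chronological order, and the counter keeps
// log lines from the same second distinct (assumes < 1,000,000 lines per second).
final class AccessLogKeys {
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    static String rowKey(LocalDateTime time, long counter) {
        return TIME.format(time) + String.format("%06d", counter);
    }

    // Number of log rows with time in [from, to): one sorted range scan per report window.
    static int countInWindow(NavigableMap<String, String> table, LocalDateTime from, LocalDateTime to) {
        return table.subMap(rowKey(from, 0), true, rowKey(to, 0), false).size();
    }
}
```

A 5-minute report is then a scan from `rowKey(t, 0)` up to `rowKey(t.plusMinutes(5), 0)`. Note that a purely time-ordered key sends all new writes to the last tablet of the table.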

Data Model Examples: N:M relation
- 1 student takes many courses
- 1 course has many students
RDBMS:
- T_Student: id (pk), name, sex, age
- T_S_C: s_id, c_id, type
- T_Course: id (pk), title, teacher_id
Neptune:
- T_Student: rowkey s_id; column info (name, sex, age); column course (cells c_id:<type>)
- T_Course: rowkey c_id; column info (title, teacher_id); a second column holding s_id:<type> cells
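In the Neptune layout, each enrollment appears twice: once in the student's row and once in the course's row. As a result, both directions are single-row reads. This sketch assumes the application performs both writes itself. The deck lists only single-row operations, so nothing makes the pair atomic. `Enrollments` is a hypothetical name, and maps stand in for the two tables.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of the N:M design: every enrollment is written to two rows,
// a c_id:<type> cell in the student's row and an s_id:<type> cell in the course's row.
// Not the Neptune API; the maps stand in for the two tables' relation columns.
final class Enrollments {
    private final Map<String, SortedMap<String, String>> coursesByStudent = new HashMap<>();
    private final Map<String, SortedMap<String, String>> studentsByCourse = new HashMap<>();

    void enroll(String studentId, String courseId, String type) {
        // Two separate single-row writes; nothing here makes them atomic together.
        coursesByStudent.computeIfAbsent(studentId, k -> new TreeMap<>()).put(courseId, type);
        studentsByCourse.computeIfAbsent(courseId, k -> new TreeMap<>()).put(studentId, type);
    }

    SortedMap<String, String> coursesOf(String studentId) {
        return Collections.unmodifiableSortedMap(
                coursesByStudent.getOrDefault(studentId, new TreeMap<>()));
    }

    SortedMap<String, String> studentsOf(String courseId) {
        return Collections.unmodifiableSortedMap(
                studentsByCourse.getOrDefault(courseId, new TreeMap<>()));
    }
}
```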

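Data Operation
- put(key, value): the client's write is recorded in the ChangeLog on the ChangeLogServer and applied to the TabletServer's MemoryTable.
- Minor compaction: the MemoryTable is written out as a new MapFile on HDFS (MapFile#1, MapFile#2, ..., MapFile#n).
- get(key): a Searcher merges the MemoryTable with the MapFiles.
- Major compaction: the MapFiles are merged into a single merged MapFile (HDFS).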
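The write and read path above follows the Bigtable pattern. Writes are logged and then buffered in memory. Flushes produce immutable sorted files, reads merge newest to oldest, and compaction merges the files. The single-process sketch below assumes simple overwrite semantics, with no deletes or timestamps. In-memory `TreeMap`s stand in for the HDFS MapFiles, and a `List` stands in for the ChangeLogServer. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Single-process sketch of the write/read path, assuming simple overwrite
// semantics (no deletes, no timestamps). TreeMaps stand in for MapFiles on HDFS
// and a List stands in for the ChangeLogServer. Names are hypothetical.
final class LsmTablet {
    private final List<String> changeLog = new ArrayList<>();                 // simulated commit log
    private TreeMap<String, String> memoryTable = new TreeMap<>();             // recent writes, sorted
    private final List<TreeMap<String, String>> mapFiles = new ArrayList<>();  // oldest first
    private final int flushThreshold;

    LsmTablet(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String key, String value) {
        changeLog.add(key + "=" + value);  // 1. log first, so the write can be replayed after a crash
        memoryTable.put(key, value);       // 2. then apply it to the MemoryTable
        if (memoryTable.size() >= flushThreshold) minorCompaction();
    }

    // Minor compaction: freeze the MemoryTable as a new immutable sorted file.
    void minorCompaction() {
        if (memoryTable.isEmpty()) return;
        mapFiles.add(memoryTable);
        memoryTable = new TreeMap<>();
        changeLog.clear();                 // flushed entries no longer need replaying
    }

    // get(): the newest copy wins, so check the MemoryTable, then files newest to oldest.
    String get(String key) {
        String value = memoryTable.get(key);
        for (int i = mapFiles.size() - 1; value == null && i >= 0; i--) {
            value = mapFiles.get(i).get(key);
        }
        return value;
    }

    // Major compaction: merge all files into one; later (newer) files overwrite older values.
    void majorCompaction() {
        if (mapFiles.size() <= 1) return;
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : mapFiles) merged.putAll(file);
        mapFiles.clear();
        mapFiles.add(merged);
    }

    int mapFileCount() { return mapFiles.size(); }

    int changeLogSize() { return changeLog.size(); }
}
```

Clearing the whole log on flush is a simplification. A real log would be truncated only up to the flushed position.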

Failover
- NeptuneMaster #1, #2, and #3 each try to take the /neptune_master lock on a 5-node ZooKeeper cluster (1).
- One master gets the lock (2) and is elected active (3); ZooKeeper sends an event.
- When the active master fails (4), a standby master gets the lock (5) and is elected active (6).
- NeptuneClient asks ZooKeeper where the master is.
- The master holds no shared data; it manages tablet assignment.
- Each TabletServer holds its own lock (for example /tserver_host01). If a network failure means a TabletServer cannot keep its lock, it kills itself.
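The election above relies on a lock that only one master can hold and that disappears when its holder dies. This single-JVM sketch simulates that behavior with a `ConcurrentMap` in place of ZooKeeper. `putIfAbsent` plays the role of creating an ephemeral lock node, and `release` plays the role of that node vanishing when the owner's session ends. It is not the ZooKeeper client API, and all names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simulation of lock-based master election; a ConcurrentMap stands in for ZooKeeper.
// Not the ZooKeeper client API; class and method names are hypothetical.
final class MasterElection {
    static final String MASTER_LOCK = "/neptune_master";
    private final ConcurrentMap<String, String> locks = new ConcurrentHashMap<>();

    // Every candidate calls this; only the first caller gets the lock.
    boolean tryLock(String path, String owner) {
        return locks.putIfAbsent(path, owner) == null;
    }

    // Simulates the ephemeral lock node disappearing when its owner fails.
    void release(String path, String owner) {
        locks.remove(path, owner);
    }

    // "Where is the master?": clients read the current lock owner.
    String currentMaster() {
        return locks.get(MASTER_LOCK);
    }

    // A TabletServer may serve only while it holds its own lock; otherwise it must stop.
    boolean tabletServerMayServe(String serverLockPath, String server) {
        return server.equals(locks.get(serverLockPath)) || tryLock(serverLockPath, server);
    }
}
```

In real ZooKeeper, standby masters would typically set a watch on the lock node and retry when notified of its deletion, rather than polling.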

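Failover
- Master failure: only table schema management and tablet splitting are affected; handled with active-standby masters.
- TabletServer failure: the master re-assigns its tablets; recovery within a few seconds to a few tens of seconds.
- ZooKeeper failure: ZooKeeper runs as a 5-node cluster and is treated as never failing (a 5-node ensemble keeps a majority with up to 2 nodes down).
- Hadoop NameNode failure: requires a separate redundancy scheme.
- Total Hadoop failure: the Neptune cluster fails as well.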

TabletInputFormat
- A MapReduce job reads TableA (TabletA-1, TabletA-2, TabletA-3, ..., TabletA-N), using the META table to locate the tablets; map tasks on the TaskTrackers read the tablets.
- Map output is partitioned by key and sent to reduce tasks on the TaskTrackers.
- Reduce output goes to TableB (Tablet B-1, Tablet B-2, ...), a DBMS, or HDFS.
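One way to plan such a job is to create one input split per tablet and route each map output key to a reducer by hashing. The sketch below assumes that each split records the host of the TabletServer serving the tablet, as a locality hint. It mirrors the formula of Hadoop's default `HashPartitioner`. `TabletJobPlan`, `Tablet`, and `Split` are hypothetical names, not Neptune's actual `TabletInputFormat` classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planning sketch for a TabletInputFormat-style job.
final class TabletJobPlan {
    // Tablet descriptor as it might be read from the META table (illustrative fields).
    static final class Tablet {
        final String name;
        final String host;   // TabletServer currently serving the tablet
        Tablet(String name, String host) { this.name = name; this.host = host; }
    }

    // One input split per tablet, with a preferred host for data locality.
    static final class Split {
        final String tablet;
        final String preferredHost;
        Split(String tablet, String preferredHost) { this.tablet = tablet; this.preferredHost = preferredHost; }
    }

    static List<Split> planSplits(List<Tablet> tablets) {
        List<Split> splits = new ArrayList<>();
        for (Tablet t : tablets) splits.add(new Split(t.name, t.host));
        return splits;
    }

    // Same formula as Hadoop's default HashPartitioner: non-negative hash modulo reducer count.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

With this plan the number of map tasks equals the number of tablets, so a table's automatic splitting also sets the job's map parallelism.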

Client
- Client API
  - Single-row operations: put/get
  - Multi-row operations: like, between
  - Batch operations: scanner/uploader
  - MapReduce: TabletInputFormat
- Command-line Shell: NQL (Neptune Query Language)
- JDBC support
- Web Console

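Client API Example

// Neptune client classes (TableSchema, NTable, Row, Cell, TableScanner, ScannerFactory);
// the slide does not give their package names, so imports are omitted.

// Create table T_TEST with columns col1 and col2, then open it.
TableSchema tableSchema = new TableSchema("T_TEST", new String[]{"col1", "col2"});
NTable.createTable(tableSchema);
NTable ntable = NTable.openTable("T_TEST");

// Put one row: rowkey RK1, column col1, cell key CK1.
Row row = new Row(new Row.Key("RK1"));
row.addCell("col1", new Cell(new Cell.Key("CK1"), "test_value".getBytes()));
ntable.put(row);

// Read the row back and print the first cell of col1.
Row selectedRow = ntable.get(new Row.Key("RK1"));
System.out.println(selectedRow.getCellList("col1").get(0));

// Scan col1 over the table; next() returns null when no rows remain.
TableScanner scanner = ScannerFactory.openScanner(ntable, new String[]{"col1"});
Row scanRow = null;
while ((scanRow = scanner.next()) != null) {
    System.out.println(scanRow.getCellList("col1").get(0));
}
scanner.close();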

Neptune Shell
- Data definition: CREATE TABLE, DROP TABLE, SHOW TABLES, DESC
- Data manipulation: SELECT, DELETE, INSERT, TRUNCATE COLUMN, TRUNCATE TABLE, SET CHARSET
- Cluster monitoring: PING TABLETSERVER, REPORT TABLE, SHOW USERS, STOP ACTION

Web Console

Performance Experiment
Number of 1,000-byte values read/written per second:

Operation        | Neptune |  HBase | HBase (Cache)
Random read      |     495 |    578 |         1,623
Random write     |   1,223 |  2,864 |         8,300
Sequential read  |     498 |    600 |         2,109
Sequential write |   1,327 |  2,635 |         6,553
Scan             |  40,329 | 22,795 |        30,840
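The table reports values per second. Assuming every value is exactly 1,000 bytes, as stated, and ignoring key and protocol overhead, each figure converts directly to bandwidth. The arithmetic below also computes the HBase-to-Neptune ratio for each operation. `ThroughputTable` is a hypothetical helper.

```java
// Converts the slide's "values per second" into approximate MB/s and compares systems.
// Assumes 1,000-byte values and ignores key and protocol overhead.
final class ThroughputTable {
    static final int VALUE_BYTES = 1_000;

    static double megabytesPerSecond(int valuesPerSecond) {
        return valuesPerSecond * (double) VALUE_BYTES / 1_000_000.0;
    }

    static double ratio(int a, int b) {
        return (double) a / b;
    }

    public static void main(String[] args) {
        String[] ops = {"Random read", "Random write", "Sequential read", "Sequential write", "Scan"};
        int[] neptune = {495, 1_223, 498, 1_327, 40_329};
        int[] hbase = {578, 2_864, 600, 2_635, 22_795};
        for (int i = 0; i < ops.length; i++) {
            System.out.printf("%-16s Neptune %6.2f MB/s, HBase %6.2f MB/s, HBase/Neptune %.2f%n",
                    ops[i], megabytesPerSecond(neptune[i]), megabytesPerSecond(hbase[i]),
                    ratio(hbase[i], neptune[i]));
        }
    }
}
```

For example, Neptune's scan rate of 40,329 values per second is about 40 MB/s, while its random write rate is a little under half of HBase's.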

Comparison: Neptune, Bigtable, HBase

Feature         | Neptune                                   | Bigtable     | HBase
File system     | Hadoop DFS or other DFS                   | GFS          | Hadoop DFS
Computing       | Hadoop or others                          | MapReduce    | Hadoop
Master failover | Yes (ZooKeeper)                           | Yes (Chubby) | From 0.20 (ZooKeeper)
Script language | No (NQL)                                  | Sawzall      | No
Change log      | Separately configured (ChangeLog cluster) | GFS          | HDFS + Memory
API             | Java, Thrift, REST                        | C++          | Java, Thrift, REST
ACL             | Yes                                       | Yes          | No
Memory table    | No                                        | Yes          | No
Scanner         | Yes                                       | Yes          | Yes
Uploader        | Yes                                       | Unknown      | No

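Storage Comparison
Criteria, in order: data volume (scalability), real-time data processing, data complexity, stability, integration with analysis jobs, cost (O = yes, X = no)
- Local Disk: X X X X X, cost Low
- NAS: O X X O X, cost Middle
- RDBMS / Distributed RDBMS: X O O / O O O, with further entries (Difficult), (Difficult), X; cost High / Very High
- Hadoop: O X X O O, cost Low
- Bigtable family (Neptune, etc.): O O O O, cost Low
- Dynamo family (Dynomite, etc.): O O X O, cost Low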

Google Infra Usage
Application types:
- Real-time lookup of large-scale data
- Analysis of large-scale data plus real-time lookup
- Storage and analysis of large-scale data
- Storage of large-scale data

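Neptune Usage
- Components: WebServer, RDBMS (Master), RDBMS (Slave), several Neptune clusters, Hadoop, batch processing
- Roles shown: querying analysis results; real-time processing (metadata); real-time processing (bulk data); real-time lookup (input data / analysis data); output for analysis; input/output for analysis; attached files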

Stress Test
- Cluster: 43 nodes
  - 1 Hadoop NameNode, 42 DataNodes
  - 1 JobTracker, 20 TaskTrackers
  - 7 TabletServer nodes: 2 GB heap
  - 15 ChangeLogServer nodes
  - Disk: Hadoop and ChangeLogServer use different disks
- Hadoop: 0.19.0
- Map tasks: 1,024 map tasks, 1 GB per map; 2 maps per TaskTracker, 40 map tasks concurrently; map-only job
- Data: 1 row = 10,000 bytes; 1 TB in total, 110 million rows

(Figure: TPS per TabletServer and data (GB) per TabletServer over time (min).)

(Figure: elapsed time (sec) of each map task, plotted by map ID.)

Test Result
- Elapsed time: 11 hours 40 minutes
- Average TPS per TabletServer: 394
- Average TPS per cluster: 2,758
- Average put latency: 9 ms
- Total number of tablets: 8,133
- Average tablet size: 130 MB
- Tablets per TabletServer: 1,162
- Service data: 143 GB
- Heap usage: 935,746 KB free of 2,080,128 KB total
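The stress-test figures can be checked against each other with simple arithmetic. This sketch assumes that "1 TB" and "1 GB" are binary units (2^40 and 2^30 bytes). Under that reading, 1,024 maps of 1 GB each make up 1 TB, and 1 TB of 10,000-byte rows comes to about 110 million rows. `StressTestCheck` is a hypothetical helper.

```java
// Cross-checks the stress-test figures; binary units for TB/GB are an assumption.
final class StressTestCheck {
    static final long GIB = 1L << 30;
    static final long TIB = 1L << 40;
    static final long ROW_BYTES = 10_000L;

    static long rowCount(long totalBytes) { return totalBytes / ROW_BYTES; }

    static long clusterTps(long perServerTps, int servers) { return perServerTps * servers; }

    static double rowsPerSecond(long rows, long seconds) { return (double) rows / seconds; }

    public static void main(String[] args) {
        long rows = rowCount(TIB);                        // 109,951,162 rows
        long elapsedSec = (11L * 60 + 40) * 60;           // 11 h 40 min = 42,000 s
        System.out.println("1,024 maps x 1 GB = 1 TB: " + (1024 * GIB == TIB));
        System.out.println("rows: " + rows);
        System.out.println("cluster TPS (394 x 7): " + clusterTps(394, 7));
        System.out.println("rows per second over the run: " + rowsPerSecond(rows, elapsedSec));
        System.out.println("tablets per TabletServer (8,133 / 7): " + Math.round(8_133 / 7.0));
        System.out.println("concurrent maps (20 x 2): " + 20 * 2);
    }
}
```

The per-server and per-cluster TPS agree (394 x 7 = 2,758), as do the tablet counts (8,133 / 7 is about 1,162). 8,133 tablets of 130 MB total roughly 1 TB. Averaged over the whole 42,000-second run, the row rate is about 2,618 per second, slightly below the reported 2,758. The slides do not say how that average was taken.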

Powered by Neptune
- GAIA (http://www.gaiaville.com): Cloud Searchable Storage Service

Milestone
- Neptune-1.4 release (2009.07)
  - Minimized lock time during tablet splits
  - Supports Ganglia metrics
  - Tablet balancer
  - Start key added to META records
- Neptune-1.5 (2009.10)
  - Faster get: DFS block cache, Bloom filter
  - Hive query integration
  - Tablet assignment policy

Join the Neptune project: http://www.openneptune.com

Questions
http://dev.naver.com/projects/neptune
http://www.openneptune.com
babokim@gmail.com