Hadoop Open Platform-as-a-Service (Hops)

Jim Dowling KTH Royal Institute of Technology, Stockholm SICS Swedish ICT CSHL Meeting on Biological Data Science, 2014

Hadoop Open Platform-as-a-Service (Hops)
Academics: Jim Dowling, Seif Haridi
PostDocs: Gautier Berthou (SICS)
PhDs: Salman Niazi, Mahmoud Ismail, Kamal Hakimzadeh, Ali Gholami
R/Engineers: Stig Viaene (SICS), Steffen Grohschmeidt
MSc Students: Theofilos Kakantousis, Nikolaos Stangios, Sri Srijeyanthan, Vangelos Savvidis, Seçkin Savaşçı.

What is Systems Research?*
Systems research is the scientific study, analysis, modeling and engineering of effective software platforms. Its challenge is to provide dependable, powerful, performant, secure and scalable solutions within an increasingly complex IT environment.
*Druschel et al., Fostering Systems Research in Europe, a white paper by EuroSys, 2006

Why is Big Data Important?
In a wide array of academic fields, the ability to effectively process data is superseding other, more classical modes of research.
"More data trumps better algorithms"*
*The Unreasonable Effectiveness of Data [Halevy, Norvig et al. 09]

Bill Gates' biggest product regret*
*http://www.zdnet.com/article/bill-gates-biggest-microsoft-product-regret-winfs/

Windows Future Storage (WinFS)
"WinFS was an attempt to bring the benefits of schema and relational databases to the Windows file system. The WinFS effort was started around 1999 as the successor to the planned storage layer of Cairo and died in 2006 after consuming many thousands of hours of efforts from really smart engineers." - Brian Welcker*
*http://blogs.msdn.com/b/bwelcker/archive/2013/02/11/the-vision-thing.aspx

Background: Hadoop Filesystem and MapReduce

HDFS: Hadoop Filesystem
[Figure: a client writes /crawler/bot/jd.io/1; the NameNode receives heartbeats from the DataNodes, detects under-replicated blocks, re-replicates them, and rebalances blocks across DataNodes.]
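The NameNode's re-replication duty can be sketched in a few lines. This is a minimal illustration, not HDFS's actual placement policy (which is also rack-aware): it assumes a replication factor of 3, treats a DataNode as dead after roughly 10.5 minutes (630 s) without a heartbeat, and picks the least-loaded live DataNodes as new replica targets. The function names and the heartbeat data are hypothetical.

```python
from collections import Counter

# Assumptions: replication factor 3, and a DataNode is declared dead after
# roughly 10.5 minutes (630 s) without a heartbeat.
REPLICATION = 3
DEAD_NODE_TIMEOUT_S = 630.0


def live_datanodes(last_heartbeat, now, timeout_s=DEAD_NODE_TIMEOUT_S):
    """DataNodes whose most recent heartbeat is within the timeout."""
    return {dn for dn, t in last_heartbeat.items() if now - t <= timeout_s}


def plan_re_replication(block_map, live, target=REPLICATION):
    """For each under-replicated block, pick new DataNodes, least-loaded first.

    block_map maps block id -> list of DataNodes that stored a replica.
    Returns block id -> list of DataNodes that should receive a new copy.
    """
    # Current load = number of live replicas each DataNode already holds.
    load = Counter(dn for nodes in block_map.values() for dn in nodes if dn in live)
    plan = {}
    for block in sorted(block_map):
        holders = set(block_map[block])
        missing = target - len(holders & live)
        if missing <= 0:
            continue
        candidates = sorted(live - holders, key=lambda dn: (load[dn], dn))
        chosen = candidates[:missing]
        for dn in chosen:
            load[dn] += 1
        plan[block] = chosen
    return plan


heartbeats = {"dn1": 1000.0, "dn2": 100.0, "dn3": 990.0,
              "dn4": 995.0, "dn5": 998.0, "dn6": 999.0}
live = live_datanodes(heartbeats, now=1000.0)  # dn2 has been silent for 900 s
blocks = {1: ["dn1", "dn2", "dn3"], 2: ["dn2", "dn4", "dn5"], 3: ["dn3", "dn5", "dn6"]}
print(plan_re_replication(blocks, live))  # {1: ['dn4'], 2: ['dn1']}
```

Block 3 never lost a replica, so it is left alone; blocks 1 and 2 each get one new copy on a DataNode that does not already hold them.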

Big Data Processing with No Data Locality
[Figure: Job(/genomes/jim.bam) is submitted via a workflow manager to a node in a compute grid that is separate from the storage nodes holding the file's blocks.]
This doesn't scale: bandwidth is the bottleneck.

MapReduce Data Locality
[Figure: Job(/genomes/jim.bam) is submitted to the JobTracker, which runs tasks on TaskTrackers co-located with the DataNodes (DN) that hold the file's blocks; the tasks write result file(s) R.]
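The locality idea can be shown with a greedy placement sketch: for each block, run the task on a TaskTracker whose DataNode holds a replica, and fall back to a remote node only when no local slot is free. The replica layout is read off the slide's diagram (three replicas per block on six DataNodes); `assign_tasks` and `slots_per_tracker` are hypothetical, and Hadoop's real JobTracker scheduler is more involved.

```python
def assign_tasks(block_locations, trackers, slots_per_tracker=1):
    """Greedily place one map task per block, preferring a node that holds a replica.

    Returns (assignment, remote): assignment maps block -> tracker, and remote is
    the set of blocks whose task had to read its input over the network.
    """
    free = {t: slots_per_tracker for t in trackers}
    assignment, remote = {}, set()
    for block in sorted(block_locations):
        local = [h for h in block_locations[block] if free.get(h, 0) > 0]
        pool = local or [t for t in trackers if free[t] > 0]
        if not pool:
            break  # every slot is busy; remaining tasks wait for a later heartbeat
        host = min(pool, key=lambda h: (-free[h], h))  # most free slots, then by name
        free[host] -= 1
        assignment[block] = host
        if not local:
            remote.add(block)
    return assignment, remote


# Replica layout read off the slide's diagram (three replicas per block).
layout = {
    1: ["dn1", "dn5", "dn6"], 2: ["dn1", "dn2", "dn5"], 3: ["dn1", "dn3", "dn4"],
    4: ["dn3", "dn5", "dn6"], 5: ["dn2", "dn4", "dn6"], 6: ["dn2", "dn3", "dn4"],
}
trackers = [f"dn{i}" for i in range(1, 7)]
print(assign_tasks(layout, trackers, slots_per_tracker=1))
print(assign_tasks(layout, trackers, slots_per_tracker=2))
```

With one slot per node, this greedy order leaves block 6 without a free local slot, so it runs remotely on dn6; with two slots every task is data-local. A production scheduler might instead wait briefly for a local slot to free up.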

MapReduce Programming Model
Scan, filter, sort, join: batch sequential processing with fault tolerance.
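To make the programming model concrete, here is a single-process sketch of the map, shuffle/sort and reduce phases. It has no distribution and no task retries, so it shows only the dataflow, not the fault tolerance. The record format ("chromosome position mapping-quality") and the quality threshold of 30 are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, mapper, reducer):
    """Single-process sketch of the MapReduce dataflow (no distribution, no retries)."""
    # Map: scan every input record; the mapper filters and emits (key, value) pairs.
    pairs = chain.from_iterable(mapper(r) for r in records)
    # Shuffle: group values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Sort + reduce: one reducer call per key, with keys in sorted order.
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


def mapper(line):
    chrom, _pos, mapq = line.split()
    if int(mapq) >= 30:  # filter out low-quality alignments
        yield chrom, 1


def reducer(chrom, ones):
    yield chrom, sum(ones)


records = ["chr1 100 60", "chr2 200 10", "chr1 300 45", "chr2 400 50"]
print(map_reduce(records, mapper, reducer))  # [('chr1', 2), ('chr2', 1)]
```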

The NameNode

HDFS NameNode
- Stores mappings: path_component -> inode, inode -> {block}, block -> {replica1, replica2, replica3}
- External API to HDFS clients; internal API to DataNodes
- Monitors DataNodes for failures and corrupted data
- Manages leases, quotas, and (re-)replication
- Must do all this in a single JVM: Spotify has a 90 GB heap storing references to 300M files
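A toy in-memory model of the three mappings helps show why everything ends up on one heap: resolving a path walks one component at a time, and each step is a lookup in an in-memory table. This sketch keys directory entries by (parent inode id, name), which is an assumption chosen for simplicity rather than HDFS's actual `INodeDirectory` structure; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class INode:
    id: int
    name: str
    parent_id: Optional[int]
    is_dir: bool
    blocks: List[int] = field(default_factory=list)  # inode -> {block}


class Namespace:
    """In-memory sketch of the three NameNode mappings."""

    def __init__(self):
        self.inodes: Dict[int, INode] = {0: INode(0, "", None, True)}  # inode 0 is "/"
        self.children: Dict[Tuple[int, str], int] = {}  # (parent id, path_component) -> inode id
        self.replicas: Dict[int, List[str]] = {}        # block -> [DataNodes]
        self._next_id = 1

    def add(self, parent_id, name, is_dir=False):
        inode = INode(self._next_id, name, parent_id, is_dir)
        self._next_id += 1
        self.inodes[inode.id] = inode
        self.children[(parent_id, name)] = inode.id
        return inode

    def resolve(self, path):
        """Walk the path one component at a time from the root."""
        inode_id = 0
        for component in filter(None, path.split("/")):
            inode_id = self.children[(inode_id, component)]  # KeyError: no such file
        return self.inodes[inode_id]

    def block_locations(self, path):
        return {b: self.replicas.get(b, []) for b in self.resolve(path).blocks}


ns = Namespace()
user = ns.add(0, "user", is_dir=True)
jim = ns.add(user.id, "jim", is_dir=True)
bam = ns.add(jim.id, "dna.bam")
bam.blocks = [101, 102]
ns.replicas = {101: ["dn1", "dn2", "dn3"], 102: ["dn2", "dn4", "dn5"]}
print(ns.block_locations("/user/jim/dna.bam"))
```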

High Availability for the NameNode (HDFS 2.x)
- Agreement on the active master via a ZooKeeper (ZK) ensemble
- Master-slave replication of NN state: the active NN stores a shared log in a quorum of journal nodes (JN), which the standby NN reads
- Checkpoint NN: faster recovery by cutting the journal log
- A single active NN still serves all DataNodes (DN): it DOESN'T SCALE OUT!
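The quorum-of-journal-nodes idea reduces to two rules: an edit is durable once a majority of JournalNodes has appended it, and a NameNode that becomes active claims a higher epoch so that JournalNodes reject writes from the old active (fencing). The sketch below models only those two rules in memory. The class and function names are hypothetical, and recovery of in-progress log segments is omitted.

```python
class JournalNode:
    """Hypothetical in-memory JournalNode: an append-only edit log plus an epoch promise."""

    def __init__(self):
        self.up = True
        self.promised_epoch = 0
        self.log = []

    def new_epoch(self, epoch):
        # A NameNode becoming active claims a strictly higher epoch.
        if not self.up or epoch <= self.promised_epoch:
            return False
        self.promised_epoch = epoch
        return True

    def append(self, epoch, txid, edit):
        # Writes from a writer with an older epoch are rejected (fencing).
        if not self.up or epoch < self.promised_epoch:
            return False
        self.log.append((txid, edit))
        return True


def is_majority(acks, n):
    return acks > n // 2


def become_active(journals, epoch):
    return is_majority(sum(jn.new_epoch(epoch) for jn in journals), len(journals))


def log_edit(journals, epoch, txid, edit):
    """An edit is durable once a majority of JournalNodes has appended it."""
    return is_majority(sum(jn.append(epoch, txid, edit) for jn in journals), len(journals))


jns = [JournalNode() for _ in range(3)]
become_active(jns, epoch=1)
print(log_edit(jns, 1, 1, "mkdir /user/jim"))  # True: 3 of 3 acks
jns[2].up = False
print(log_edit(jns, 1, 2, "create dna.bam"))   # True: 2 of 3 acks
print(become_active(jns, epoch=2))             # True: the standby takes over
print(log_edit(jns, 1, 3, "delete dna.bam"))   # False: the old active is fenced
```

Note that this replicates the NameNode's log, not its serving capacity: there is still exactly one active NameNode at a time.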

The Evolution of the NameNode
- HDFS (2006): in-memory metadata
- HDFS 0.07 (2006): WAL (EditLog), FSImage
- HDFS 0.21 (2009): weakened global lock
- HDFS 2.0 (2011): eventually consistent replication (HA NameNode)
They reinvented the database for the NameNode!

Databases had these features long ago
- Oracle v6 (1988): redo and undo logs, rollback segments
- Oracle v7.1 (1994): symmetric replication
...and they have continued to evolve:
- Oracle 9i RAC (2001): shared-state replication

The End of the One-Size-Fits-All Database
- Columnar databases: Vertica, Hana
- NewSQL databases: MySQL Cluster, VoltDB, Memstore, AtlasDB, FoundationDB
- Graph databases: Neo4J
- RDBMSes: MySQL, Postgres, DB2, Oracle, SQLServer
- In-memory stores: Memcached, Redis
- Key-value stores: Dynamo, Cassandra, MongoDB, Riak
- Petabyte databases: BigQuery (Google), RedShift (Amazon), Impala (Cloudera)
Stonebraker et al., One Size Fits All: An Idea Whose Time Has Come and Gone, 2005

MySQL Cluster (NDB)
- Shared-nothing DB with an SQL API and the native NDB API (blocking and non-blocking)
- 30+ million update transactions/second on a 30-node cluster
- Distributed, in-memory, 2-phase commit: replicate the DB, not the log!
- Real-time: low TransactionInactive timeouts
- Commodity hardware
- Scales out: millions of transactions/sec, TB-sized datasets (48 nodes)
- Split-brain solved with the arbitrator pattern
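"Replicate the DB, not the log" means each replica applies the write as part of the same distributed transaction. A textbook two-phase commit captures the idea: every participant votes in a prepare phase, and the coordinator commits everywhere only if all votes are yes. This is a generic single-process sketch, not NDB's internal commit protocol; `Participant` and `two_phase_commit` are hypothetical, and coordinator failure is not handled.

```python
class Participant:
    """Hypothetical in-memory data node taking part in two-phase commit."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}
        self.staged = None

    def prepare(self, writes):
        # Phase 1 vote: stage the writes and vote yes, or vote no if unable.
        if not self.healthy:
            return False
        self.staged = dict(writes)
        return True

    def commit(self):
        self.data.update(self.staged or {})
        self.staged = None

    def abort(self):
        self.staged = None


def two_phase_commit(participants, writes):
    """writes maps participant name -> {key: value}; returns True if committed."""
    # Phase 1 (prepare): ask every participant for a vote.
    votes = [p.prepare(writes.get(p.name, {})) for p in participants]
    decision = all(votes)
    # Phase 2 (commit/abort): apply the same decision everywhere.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


# The same row is written to both replicas inside one transaction.
primary, backup = Participant("ndb1"), Participant("ndb2")
row = {"inode:7": "dna.bam"}
print(two_phase_commit([primary, backup], {"ndb1": row, "ndb2": row}))  # True
backup.healthy = False
print(two_phase_commit([primary, backup],
                       {"ndb1": {"inode:8": "x"}, "ndb2": {"inode:8": "x"}}))  # False
print("inode:8" in primary.data)  # False
```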

HopsFS

HopsFS
- Customizable and scalable metadata
- High throughput for read and write operations
- NameNode failover time of 5 seconds (vs ~1 minute for HDFS)

Request Handling (Apache HDFS vs HopsFS)
[Figure panels: Apache HDFS NameNode request handling; HopsFS NameNode request handling.]

Fine-Grained Locking, Transactional Updates
NDB gives us the READ_COMMITTED isolation level, which is not strong enough. We implemented serializability for FS operations using implicit locking in the DAG and row-level locking in NDB.
[Hakimzadeh, Peiro, Dowling, Scaling HDFS with a Strongly Consistent Relational Model for Metadata, DAIS 2014]
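A simplified picture of implicit (hierarchical) locking: an operation takes shared row locks on every ancestor inode, so no one can rename or delete them underneath it, and a shared or exclusive lock on the target inode, which implicitly covers that inode's whole subtree. The sketch below only computes such a lock plan. The operation names and the exact lock mode per operation are assumptions for illustration; the actual per-operation rules are in the DAIS 2014 paper cited above.

```python
from enum import Enum


class LockMode(Enum):
    SHARED = "shared"
    EXCLUSIVE = "exclusive"


# Hypothetical operation names that mutate the target inode.
MUTATIONS = {"delete", "append", "set_permission"}


def lock_plan(path, op):
    """Row locks to take for `op` on `path`, root first.

    Ancestors get shared locks so they cannot change underneath us; the target
    gets an exclusive lock for mutations and a shared lock for reads. The lock
    on an inode stands for its whole subtree, so descendants need no locks.
    """
    components = [c for c in path.split("/") if c]
    prefixes = ["/"] + ["/" + "/".join(components[:i]) for i in range(1, len(components) + 1)]
    target_mode = LockMode.EXCLUSIVE if op in MUTATIONS else LockMode.SHARED
    return [(p, LockMode.SHARED) for p in prefixes[:-1]] + [(prefixes[-1], target_mode)]


for path, mode in lock_plan("/user/jdowling/dna.bam", "delete"):
    print(path, mode.value)
```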

Preventing Deadlocks and Starvation
[Figure: concurrent read, mv and block_report operations contending for /user/jdowling/dna.bam.]
Solution: all request threads for inode operations traverse the FS hierarchy in the same order, acquiring locks in the same order. Block-level operations must follow the same order.
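The ordering rule is the classic deadlock-avoidance technique: if every thread acquires locks in one global order, no cycle of waiting threads can form. In this sketch a `threading.Lock` per path stands in for an NDB row lock, and the order is root-to-leaf (shallower paths first), then lexicographic. All names are hypothetical, only exclusive locks are modeled, and the sketch does not address starvation, since Python locks are not fair.

```python
import threading
from contextlib import contextmanager

_registry_guard = threading.Lock()
_inode_locks = {}  # path -> threading.Lock, created on first use


def _lock_for(path):
    with _registry_guard:
        return _inode_locks.setdefault(path, threading.Lock())


def lock_order(path):
    """Global order: shallower inodes first (root-to-leaf), then lexicographic."""
    components = [c for c in path.split("/") if c]
    return (len(components), components)


@contextmanager
def locked(*paths):
    """Acquire exclusive locks on all paths in the global order; release in reverse."""
    ordered = sorted(set(paths), key=lock_order)
    held = []
    try:
        for path in ordered:
            lock = _lock_for(path)
            lock.acquire()
            held.append(lock)
        yield ordered
    finally:
        for lock in reversed(held):
            lock.release()


# Two threads naming the same inodes in opposite orders still cannot deadlock,
# because both acquire the directory before the file.
def worker(first, second):
    for _ in range(1000):
        with locked(first, second):
            pass


d, f = "/user/jdowling", "/user/jdowling/dna.bam"
threads = [threading.Thread(target=worker, args=(d, f)),
           threading.Thread(target=worker, args=(f, d))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("no deadlock")
```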

Per-Transaction Cache
Experiments revealed many round trips to the database per transaction, so we cache intermediate transaction results at the NameNodes. We also use Memcached at each NameNode to cache path -> {inode/blocks/replicas} mappings.
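The saving from a per-transaction cache is easy to see by counting round trips. In this sketch, repeated reads of the same row within one transaction hit the cache, and dirty rows are flushed in a single batch at commit. Both classes are hypothetical stand-ins (not the HopsFS data-access layer), the batched flush is an assumption, and the Memcached path cache is not modeled.

```python
class CountingStore:
    """Hypothetical stand-in for NDB that counts network round trips."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.round_trips = 0

    def read(self, key):
        self.round_trips += 1
        return self.rows.get(key)

    def write_batch(self, batch):
        self.round_trips += 1  # one batched round trip for all dirty rows
        self.rows.update(batch)


class TransactionCache:
    """Caches every row a transaction touches; dirty rows are flushed once at commit."""

    def __init__(self, store):
        self.store = store
        self.rows = {}
        self.dirty = set()

    def get(self, key):
        if key not in self.rows:
            self.rows[key] = self.store.read(key)
        return self.rows[key]

    def put(self, key, value):
        self.rows[key] = value
        self.dirty.add(key)

    def commit(self):
        if self.dirty:
            self.store.write_batch({k: self.rows[k] for k in self.dirty})
        self.dirty.clear()


store = CountingStore({("inode", 7): {"name": "dna.bam", "size": 0}})
tx = TransactionCache(store)
for _ in range(3):  # e.g. permission, quota and lease checks all read inode 7
    tx.get(("inode", 7))
tx.put(("inode", 7), {"name": "dna.bam", "size": 128})
tx.commit()
print(store.round_trips)  # 2
```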

Sometimes, Transactions Just Ain't Enough
- Large subtree operations touching millions of inodes can't be executed in a single transaction, due to the low (real-time) transaction timeouts.
- Subtree operations use a 4-phase protocol that sacrifices atomicity but keeps isolation and consistency.
- Batch operations and multithreading for performance.
- Failed NameNodes are handled transparently.
- Leases are used to handle failed clients.

Leader Election using the Database (NDB)
We need a leader NameNode to coordinate replication and lease management. We use NDB as shared memory for leader election. No more ZooKeeper, yay!
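One way to use a database as shared memory for leader election is sketched below. Each NameNode periodically increments its own heartbeat counter in a shared table, treats peers whose counters advanced since its previous round as alive, and applies a deterministic rule: the smallest live id leads. The counter scheme and the smallest-id rule are assumptions for illustration. The table is modeled as a dict, whereas in HopsFS it would be NDB rows updated in transactions.

```python
class SharedTable:
    """Hypothetical stand-in for an NDB table: namenode id -> heartbeat counter."""

    def __init__(self):
        self.counters = {}


class NameNode:
    def __init__(self, nn_id, table):
        self.id = nn_id
        self.table = table
        self.last_seen = {}  # counters observed in this NameNode's previous round

    def election_round(self):
        """Heartbeat, detect live peers, and return the id of the current leader."""
        # 1. Heartbeat: bump our own counter (one database transaction in practice).
        self.table.counters[self.id] = self.table.counters.get(self.id, 0) + 1
        # 2. A peer counts as alive if its counter advanced since our last round.
        #    (A real system would tolerate a few missed rounds before declaring death.)
        snapshot = dict(self.table.counters)
        alive = {nn for nn, c in snapshot.items()
                 if nn == self.id or c > self.last_seen.get(nn, 0)}
        self.last_seen = snapshot
        # 3. Deterministic rule everyone agrees on: the smallest live id leads.
        return min(alive)


table = SharedTable()
nn1, nn2, nn3 = (NameNode(i, table) for i in (1, 2, 3))
for _ in range(2):
    print([nn.election_round() for nn in (nn1, nn2, nn3)])  # [1, 1, 1]
# nn1 crashes and stops heartbeating; the survivors elect nn2.
print([nn.election_round() for nn in (nn2, nn3)])  # [2, 2]
```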

HopsFS Internal Protocol Scalability
On 100PB+ clusters, internal protocols make up most of the network traffic for HDFS.
- Block reporting and exiting safe mode: batching and work stealing.

HopsFS Write Performance
[Figure: write performance results. Setup: 1 Gbit network; nodes with 12-core Xeon X560 @ 2.8 GHz; 2-node NDB cluster.]

HopsFS Read Performance
[Figure: read performance results. Setup: 1 Gbit network; nodes with 12-core Xeon X560 @ 2.8 GHz; 2-node NDB cluster.]

HopsFS Erasure Coding
Storage overhead relative to the logical data size:
- HDFS 2.x triple replication: 300%
- 2x replication + XOR: 220%
- Reed-Solomon: 140%
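The overhead percentages follow from simple arithmetic over a stripe of data blocks plus parity blocks. The sketch assumes a stripe of 10 data blocks. Under that assumption, Reed-Solomon with 4 parity blocks gives 140%, and 2x replication plus one XOR parity block, itself stored twice, gives 220%; these stripe and parity settings are assumptions that happen to reproduce the slide's figures. The second half shows why XOR parity helps: any single lost block in a stripe can be rebuilt from the survivors plus the parity. Reed-Solomon with 4 parity blocks tolerates up to 4 losses, but that is not shown here.

```python
from functools import reduce


def storage_overhead(data_blocks, data_repl, parity_blocks, parity_repl):
    """Raw bytes stored per logical byte, as a percentage."""
    raw = data_blocks * data_repl + parity_blocks * parity_repl
    return 100 * raw / data_blocks


STRIPE = 10  # assumed stripe length: 10 data blocks per parity group
print(storage_overhead(STRIPE, 3, 0, 0))  # 300.0 triple replication
print(storage_overhead(STRIPE, 2, 1, 2))  # 220.0 2x replication + 1 XOR parity block, stored twice
print(storage_overhead(STRIPE, 1, 4, 1))  # 140.0 Reed-Solomon with 4 parity blocks


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# XOR parity can rebuild any single lost block from the survivors plus parity.
stripe = [b"ACGT", b"TTAG", b"GGCA"]
parity = xor_blocks(stripe)
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
print(rebuilt == stripe[1])  # True
```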

HopsFS Erasure Coding
[Figure panels: data durability with triple replication; data durability with Reed-Solomon.]

Comparison with HDFS-RAID

HopsFS Snapshots
Read-only, root-level, single snapshot
- Supports rollback on unsuccessful software upgrades
- Prototype developed; integration is ongoing
- Snapshot rollback order-of-growth is O(N)

We did the same for YARN

Apache Hadoop YARN HA/Scale-out Limitations
[Figure: clients, ZooKeeper, a primary RM and a standby RM, and NodeManagers (NM).]
- The Resource Manager (RM) is a bottleneck.
- ZooKeeper throughput is not high enough to persist all RM state.
- The standby RM can only recover partial state.
- All running jobs must be restarted.
- RM state is not queryable.

Hops YARN
[Figure: a client, an NDB cluster, multiple RMs, and NodeManagers (NM).]
- The RM is a state machine with almost no session state to manage.
- Transparent failover is working.
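Treating the RM as a state machine whose state lives in the database is what makes full recovery possible. Each transition is persisted before it is applied in memory, so a standby that starts from the same tables sees every running application instead of restarting it. The lifecycle below is a simplified subset (YARN's real application state machine has more states), and `StateStore`/`ResourceManager` are hypothetical stand-ins for the NDB-backed implementation.

```python
class StateStore:
    """Hypothetical stand-in for NDB tables holding ResourceManager state."""

    def __init__(self):
        self.apps = {}


# Simplified application lifecycle; YARN's real state machine has more states.
TRANSITIONS = {
    ("NEW", "submit"): "SUBMITTED",
    ("SUBMITTED", "accept"): "ACCEPTED",
    ("ACCEPTED", "start"): "RUNNING",
    ("RUNNING", "finish"): "FINISHED",
}


class ResourceManager:
    def __init__(self, store):
        self.store = store
        # On startup (or failover) rebuild the full in-memory state from the database.
        self.apps = dict(store.apps)

    def handle(self, app_id, event):
        state = self.apps.get(app_id, "NEW")
        new_state = TRANSITIONS.get((state, event))
        if new_state is None:
            raise ValueError(f"event {event!r} not allowed in state {state}")
        self.store.apps[app_id] = new_state  # persist the transition first...
        self.apps[app_id] = new_state        # ...then apply it in memory
        return new_state


store = StateStore()
active = ResourceManager(store)
for event in ("submit", "accept", "start"):
    active.handle("app_1", event)
# The active RM fails; a standby recovers the running application instead of restarting it.
standby = ResourceManager(store)
print(standby.apps)                       # {'app_1': 'RUNNING'}
print(standby.handle("app_1", "finish"))  # FINISHED
```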

Hops YARN
- FIFO Scheduler, Capacity Scheduler, Fair Scheduler
- Distributed Resource Tracker Service (ongoing)
- Make YARN more interactive (ongoing): reduce the NodeManager heartbeat time

Hops-Hadoop: Exabyte-Scale Hadoop
[Figure: a shared NDB cluster backing multiple HDFS NameNodes (NN) and YARN ResourceManagers (RM), which serve many hosts that each run a DataNode (DN) and a NodeManager (NM).]

The Hops Stack Continued

Bringing Data People Together
- Data owners: metadata, ingestion; non-programmers
- Data scientists: data analysts, programmers
[Figure: the Hops stack: HopsHub, Spark, Flink, Adam and Cuneiform on Hops-YARN and Hops-HDFS, deployed with Karamel/PaaS.]

Perimeter Security and Multi-Tenancy
- HopsHub: project-level RBAC, Hadoop trusted proxy
- Analytics plugin framework: Adam, Cuneiform, Spark, Flink, MR
- REST APIs
- Network isolation
- Kerberos, LDAP, LIMS
Related Hadoop security projects: Knox, Sentry, Rhino

HopsHub Two-Factor Authentication

Projects for Multi-Tenancy; Activity Trails
[Figure panels: project activity trail; global activity trail.]

Project Membership

HDFS Files File Browser (Iceberg)

Upload Data
- Overcome the 3 GB browser upload limit
- Apache Flume: automated ingestion of data

Run Cuneiform Workflows on YARN
FastQ files (~250 GB) -> Align -> BAM file (~10 GB) -> Variant Calling -> VCF file (~5 MB) -> Annotate -> Results

Ongoing MSc Projects
- Realizing the metadata dream of WinFS (Vangelis)
- Optimizing YARN's Resource Tracker Service for interactive YARN (Sri)
- Interactive data analytics with Zeppelin-EE (Seckin)

PaaS Support with Chef/Karamel
Support for EC2, Vagrant, and bare metal.

Conclusions
- Hops will be the first European distribution of Hadoop when released.
- The first beta release is coming in Q1 2015.
- Lots of ideas for future work: tighter Spark and Flink integration, BiobankCloud support.
NGS Hadoop Workshop, Feb 19-20, Stockholm. Sign up at www.biobankcloud.com