Quantcast Petabyte Storage at Half Price with QFS!


Quantcast Petabyte Storage at Half Price with QFS
Presented by Silvius Rus, Director, Big Data Platforms
September 2013

Quantcast File System (QFS)
- A high-performance alternative to the Hadoop Distributed File System (HDFS).
- Manages multi-petabyte Hadoop workloads with significantly faster I/O than HDFS while using only half the disk space.
- Offers massive cost savings to large-scale Hadoop users (fewer disks = fewer machines).
- Production hardened at Quantcast under massive processing loads (multiple exabytes).
- Fully compatible with Apache Hadoop.
- 100% open source.

Quantcast Technology Innovation Timeline
[Timeline graphic, 2006-2013: Quantcast Measurement and Quantcast Advertising launched; data received grows from 1 TB/day to 40 TB/day; data processed grows from 1 PB/day to 20 PB/day; started using Hadoop, then used and sponsored KFS, launched QFS, and finally turned off HDFS.]

Architecture
Client
- Implements the high-level file interface (read/write/delete).
- On write, Reed-Solomon (RS) encodes chunks and distributes stripes to nine chunk servers.
- On read, collects RS stripes from six chunk servers and recomposes the chunk.
Metaserver
- Maps /file/paths to chunk ids.
- Manages chunk locations and directs clients to chunk servers.
- Issues chunk replication and rebalancing instructions.
Chunk server
- Handles I/O to locally stored 64 MB chunks.
- Monitors host file system health.
- Replicates and recovers chunks as the metaserver directs.
[Diagram: clients contact the metaserver to locate or allocate chunks, then read/write RS-encoded data directly from/to chunk servers spread across racks; the metaserver sends copy/recover instructions to the chunk servers.]
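A minimal way to picture this division of labor: the metaserver handles only metadata (path-to-chunk mapping and placement), while clients move data directly to and from chunk servers. The sketch below models that control/data split in Python; the class and method names are hypothetical, not the real QFS API (which is C++), and the Reed-Solomon coder is omitted, so stripes are stored exactly as given.

    # Minimal model of the metaserver / chunk-server / client split described
    # above. Hypothetical names; the real QFS client library Reed-Solomon
    # encodes each chunk into 6 data + 3 parity stripes before distributing.
    import itertools

    CHUNK_SIZE = 64 * 1024 * 1024          # QFS stores data in 64 MB chunks
    STRIPE_SERVERS = 9                     # 6 data + 3 parity stripe holders

    class ChunkServer:
        """Stores stripes for chunk ids on one host's local disks."""
        def __init__(self, name):
            self.name, self.stripes = name, {}
        def put(self, chunk_id, index, data):
            self.stripes[(chunk_id, index)] = data
        def get(self, chunk_id, index):
            return self.stripes[(chunk_id, index)]

    class Metaserver:
        """Metadata only: maps /file/paths to chunk ids and server placements."""
        def __init__(self, chunk_servers):
            self.chunk_servers = chunk_servers
            self.chunks = {}                     # (path, chunk_index) -> (id, servers)
            self.next_id = itertools.count()
        def allocate_chunk(self, path, chunk_index):
            servers = self.chunk_servers[:STRIPE_SERVERS]   # placement policy goes here
            entry = (next(self.next_id), servers)
            self.chunks[(path, chunk_index)] = entry
            return entry
        def locate_chunk(self, path, chunk_index):
            return self.chunks[(path, chunk_index)]

    class Client:
        """Talks to the metaserver for metadata, to chunk servers for data."""
        def __init__(self, metaserver):
            self.meta = metaserver
        def write_chunk(self, path, chunk_index, stripes):
            chunk_id, servers = self.meta.allocate_chunk(path, chunk_index)
            for i, (stripe, server) in enumerate(zip(stripes, servers)):
                server.put(chunk_id, i, stripe)   # data path: client -> chunk server
        def read_chunk(self, path, chunk_index):
            chunk_id, servers = self.meta.locate_chunk(path, chunk_index)
            # Only the 6 data stripes are needed on the happy path; parity
            # stripes are read only when a data stripe is missing or slow.
            return [servers[i].get(chunk_id, i) for i in range(6)]

In the real system the metaserver also tracks chunk server health and issues the replication, recovery, and rebalancing instructions shown in the diagram.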

QFS vs. HDFS
Broadly comparable feature set, with significant storage efficiency advantages.

  Feature                                                         QFS                     HDFS
  Scalable, distributed storage for efficient batch processing    yes                     yes
  Open source                                                     yes                     yes
  Hadoop compatible                                               yes                     yes
  Unix-style file permissions                                     yes                     yes
  Error recovery mechanism                                        Reed-Solomon encoding   Multiple data copies
  Disk space required (as a multiple of raw data)                 1.5x                    3x

Reed-Solomon Error Correction: Leveraging High-Speed Modern Networks
HDFS optimizes toward data locality for older networks. 10 Gbps networks are now common, making disk I/O the more critical bottleneck. QFS leverages faster networks to achieve better parallelism and encoding efficiency. Result: higher error tolerance and faster performance, with half the disk space.
1. Break the original data into 64 KB stripes.
2. Reed-Solomon generates three parity stripes for every six data stripes.
3. Write those nine stripes to nine different drives.
4. Up to three stripes can become unreadable...
5. ...yet the original data can still be recovered.
Every write is parallelized across 9 drives, every read across 6.
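The half-price arithmetic follows directly from this 6+3 layout: every six data stripes gain three parity stripes, so stored bytes are 9/6 = 1.5x the raw data, versus 3x for triple replication. A small Python sketch of that bookkeeping (it only counts stripes; the parity contents themselves would come from a real Reed-Solomon coder):

    # Storage overhead of RS 6+3 striping vs. HDFS 3x replication.
    STRIPE = 64 * 1024        # 64 KB stripes
    DATA_STRIPES = 6          # data stripes per RS group
    PARITY_STRIPES = 3        # parity stripes per RS group

    def qfs_layout(num_bytes):
        """Return (data_stripes, parity_stripes, bytes_stored) for RS 6+3."""
        data = -(-num_bytes // STRIPE)                   # ceiling division
        groups = -(-data // DATA_STRIPES)
        parity = groups * PARITY_STRIPES
        return data, parity, (data + parity) * STRIPE

    if __name__ == "__main__":
        raw = 64 * 1024 * 1024                           # one 64 MB chunk
        data, parity, stored = qfs_layout(raw)
        print(f"RS 6+3 : {data} data + {parity} parity stripes "
              f"-> {stored / raw:.1f}x raw size")        # ~1.5x
        print(f"HDFS 3x: {3 * raw / raw:.1f}x raw size") # 3.0x
        # Fault tolerance: any 3 of the 9 stripes in a group may be lost and
        # the 6 data stripes can still be reconstructed; 3x replication only
        # tolerates the loss of any 2 copies.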

MapReduce on 6+3 Erasure-Coded Files versus 3x Replicated Files
Positives
- Writing is half off, both in space and in time.
- Any 3 broken or slow devices are tolerated, vs. any 2 with 3-way replication.
- Re-executed stragglers run faster because they read from multiple devices (striping).
Negatives
- There is no locality; reading requires the network.
- On read failure, recovery is needed, but it is lightning fast on modern CPUs (2 GB/s per core).
- Writes don't achieve network line rate, since the original plus parity data is written by a single client.

Read/Write Benchmarks
End-to-end 20 TB write and read tests, run as Hadoop MapReduce jobs with 8,000 workers * 2.5 GB each.
[Bar chart: end-to-end time in minutes for the Write and Read tests, comparing HDFS with 64 MB blocks, HDFS with 2.5 GB blocks, and QFS with 64 MB chunks.]
Host network behavior during the tests:
- QFS write = 1/2 the disk I/O of an HDFS write
- QFS write → network/disk = 8/9
- HDFS write → network/disk = 6/9
- QFS read → network/disk = 1
- HDFS read → network/disk = very small
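The network/disk ratios above come from counting which replicas or stripes leave the host. A short sketch of that bookkeeping, under the assumption, implied by the slide's numbers rather than stated, that a write leaves one replica (HDFS) or one stripe (QFS) on the local host, and that HDFS reads are served from a local replica:

    # Derive the slide's network/disk ratios from replica / stripe counts.
    def ratio(network_units, disk_units):
        return f"{network_units}/{disk_units}"

    # Writes: disk I/O counts everything stored, network I/O counts what leaves the host.
    qfs_write  = ratio(9 - 1, 9)   # 9 stripes stored, 8 cross the network  -> 8/9
    hdfs_write = ratio(3 - 1, 3)   # 3 replicas stored, 2 cross the network -> 2/3 = 6/9

    # Reads: QFS has no locality; HDFS reads the local replica when it can.
    qfs_read   = ratio(6, 6)       # all 6 data stripes over the network    -> 6/6 = 1
    hdfs_read  = ratio(0, 1)       # mostly local                           -> ~0

    print(qfs_write, hdfs_write, qfs_read, hdfs_read)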

Metaserver Performance
Test setup: Intel E5-2670, 64 GB RAM, 70 million directories.
[Bar chart: operations per second (thousands) for stat, rmdir, mkdir, and ls, comparing QFS and HDFS.]

Production Hardening for Petascale
Continuous I/O balancing
- Full feedback loop: the metaserver knows the I/O queue size of every device.
- Activity is biased towards under-loaded chunk servers.
- Direct I/O = short loop.
Optimization
- Direct I/O and fixed buffer space = predictable RAM and storage device usage.
- C++, with its own memory allocation and layout.
- Vector instructions for Reed-Solomon coding.
Operations
- Hibernation.
- Evacuation through recovery.
- Continuous space/consistency rebalancing.
- Monitoring and alerts.
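One concrete piece of the I/O-balancing feedback loop is biasing new placements toward under-loaded chunk servers. The following is a minimal, hypothetical sketch of that idea, assuming each chunk server reports its current I/O queue depth in its heartbeat; the real QFS placement logic is more involved (racks, free space, consistency).

    # Load-biased chunk placement: pick the stripe holders with the shortest
    # I/O queues, as reported by each chunk server. Illustrative sketch of the
    # idea on this slide, not QFS code.
    import heapq

    def place_stripes(queue_depth_by_server, stripes_needed=9):
        """Return the servers with the smallest reported I/O queue depths."""
        return heapq.nsmallest(stripes_needed, queue_depth_by_server,
                               key=queue_depth_by_server.get)

    if __name__ == "__main__":
        heartbeat = {"cs-01": 3, "cs-02": 17, "cs-03": 0, "cs-04": 5,
                     "cs-05": 9, "cs-06": 1, "cs-07": 22, "cs-08": 2,
                     "cs-09": 4, "cs-10": 6, "cs-11": 8, "cs-12": 30}
        print(place_stripes(heartbeat))   # 9 least-loaded servers for an RS 6+3 write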

Use Case: Quantsort, All I/O over QFS (http://qc.st/qcquantsort)
- Concurrent append: 10,000 writers append to the same file at once.
- Largest sort = 1 PB.
- Daily volume = 1 to 2 PB, max = 3 PB.

Use Case: Fast Broadcast through Wide Striping
Broadcast time by configuration:
- HDFS Default: 94.5 s
- HDFS Small Blocks: 16.7 s
- QFS on Disk: 8.5 s
- QFS in RAM: 4.8 s

Refreshingly Fast Command Line Tool
hadoop fs -ls / versus qfs ls /
- HDFS time: 700 msec
- QFS time: 7 msec
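Much of this gap is client start-up cost: hadoop fs launches a JVM on every invocation, while the qfs tool is a native binary. A small way to reproduce this kind of measurement on a host where both tools are installed and pointed at live file systems; the commands are taken from the slide, and the exact qfs invocation may differ by version.

    # Time a shell command end to end, e.g. the two directory listings above.
    # Assumes both CLIs are on PATH and configured against running clusters.
    import subprocess, time

    def time_ms(cmd):
        start = time.monotonic()
        subprocess.run(cmd, check=True, capture_output=True)
        return (time.monotonic() - start) * 1000.0

    if __name__ == "__main__":
        for cmd in (["hadoop", "fs", "-ls", "/"], ["qfs", "ls", "/"]):
            print(" ".join(cmd), f"{time_ms(cmd):.0f} ms")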

How Well Does It Work
Reliable at Scale
- Hundreds of days of metaserver uptime is common.
- The Quantcast MapReduce sorter uses QFS as a distributed, virtualized store instead of local disk.
- 8 petabytes of compressed data, close to 1 billion chunks, 7,500 I/O devices.
Fast and Large
- Ran a petabyte sort last weekend.
- Direct I/O is not hurting fast scans; Sawzall query performance is similar to Presto:

                Presto/HDFS   Turbo/QFS
    Seconds     16            16
    Rows        920 M         970 M
    Bytes       31 G          294 G
    Rows/sec    57.5 M        60.6 M
    Bytes/sec   2.0 G         18.4 G

Easy to Use
- 1 ops engineer for QFS and MapReduce on a 1,000+ node cluster.
- Neustar set up a multi-petabyte instance without help from Quantcast.
- Migrate from HDFS using hadoop distcp.
- Hadoop MapReduce just works on QFS.

Metaserver Statistics in Production
QFS metaserver statistics over Quantcast production file systems in July 2013.
- High availability is nice to have, but not a must-have for MapReduce; there are certainly other use cases where high availability is a must.
- Federation may be needed to support file systems beyond 10 PB, depending on file size.

Who will find QFS valuable?
Likely to benefit from QFS
- Existing Hadoop users with large-scale data clusters.
- Data-heavy, tech-savvy organizations for whom performance and efficient use of hardware are high priorities.
May find HDFS a better fit
- Small or new Hadoop deployments, as HDFS has been deployed in a broader variety of production environments.
- Clusters with slow or unpredictable network connectivity.
- Environments needing specific HDFS features such as head-node federation or hot standby.

Summary: Key Benefits of QFS
- Delivers a stable, high-performance alternative to HDFS in a production-hardened 1.0 release.
- Offers high-performance management of multi-petabyte workloads.
- Faster I/O than HDFS with half the disk space.
- Fully compatible with Apache Hadoop.
- 100% open source.

Future Work: What QFS Doesn't Have Just Yet
- Kerberos security: under development.
- High availability (HA): no strong case at Quantcast, but nice to have.
- Federation: not a strong case at Quantcast either.
Contributions welcome.

Thank You. Questions?
Download QFS for free at: github.com/quantcast/qfs
San Francisco: 201 Third Street, San Francisco, CA 94103
New York: 432 Park Avenue South, New York, NY 10016
London: 48 Charlotte Street, London, W1T 2NS