Yahoo! Cloud Serving Benchmark
Overview and results
March 31, 2010
Brian F. Cooper (cooperb@yahoo-inc.com)
Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears
System setup and tuning assistance from the Cassandra and HBase committers and the Sherpa engineering team
Versions of this deck
- V4.1: Original set of results from the benchmark
- V4.2: Added Cassandra 0.5 vs. 0.4.2 comparison, Cassandra range query results, and varying-scan-size results
- V4.3: Added more scalability data points and Sherpa elasticity data
- V4.4: Complete results from the final YCSB paper
Motivation
There are many cloud DB and NoSQL systems out there:
- Sherpa/PNUTS
- BigTable and its clones: HBase, Hypertable, HTable
- Megastore
- Azure
- Cassandra
- Amazon Web Services: S3, SimpleDB, EBS
- CouchDB
- Voldemort
- Dynomite
- Etc.: Tokyo, Redis, MongoDB
How do they compare?
- Feature tradeoffs
- Performance tradeoffs
- Not clear!
Goal
Implement a standard benchmark:
- Evaluate different systems on common workloads
- Focus on performance and scale-out
- Future additions: availability, replication
Artifacts:
- Open-source workload generator
- Experimental study comparing several systems
Benchmark tool
Java application:
- Many systems have Java APIs
- Other systems can be driven via HTTP/REST, JNI, or some other bridge
Command-line parameters:
- DB to use
- Target throughput
- Number of threads
Workload parameter file:
- R/W mix
- Record size
- Data set
[Architecture: the YCSB client's workload executor drives the client threads, which call a per-system DB client against the cloud DB and report to a stats module]
Extensible: define new workloads
Extensible: plug in new clients (a sketch follows below)
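To make "plug in new clients" concrete, here is a minimal sketch of what a client plug-in could look like. The abstract DB layer, the method names, and the status-code convention are assumptions for illustration, not the actual YCSB interface; a real plug-in would wrap the target store's native API rather than an in-process map.

    // Minimal sketch of the plug-in client idea, assuming an abstract DB layer
    // with read/update hooks; class and method names here are illustrative,
    // not the actual YCSB API.
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    abstract class DB {
        public void init() {}  // called once per client thread before the workload runs
        public abstract int read(String table, String key,
                                 Set<String> fields, Map<String, String> result);
        public abstract int update(String table, String key, Map<String, String> values);
    }

    // A new client only has to map the generic operations onto one store's
    // native API. This toy version uses an in-process map so it runs as-is.
    class InMemoryDB extends DB {
        private final Map<String, Map<String, String>> store = new ConcurrentHashMap<>();

        @Override
        public int read(String table, String key, Set<String> fields,
                        Map<String, String> result) {
            Map<String, String> record = store.get(table + "/" + key);
            if (record == null) return 1;   // nonzero status = error
            result.putAll(record);          // reads retrieve the whole record
            return 0;
        }

        @Override
        public int update(String table, String key, Map<String, String> values) {
            store.computeIfAbsent(table + "/" + key, k -> new ConcurrentHashMap<>())
                 .putAll(values);           // updates write just the supplied fields
            return 0;
        }
    }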
Workloads
Workload: a particular combination of workload parameters
- Defines read/write mix, request distribution, record size, etc.
- Two ways to define workloads:
  - Adjust parameters of an existing workload (via a properties file)
  - Define a new kind of workload (by writing Java code; see the sketch below)
Experiment: running a particular workload on a particular hardware setup to produce a single graph for 1 or N systems
- Example: vary throughput and measure latency while running a workload against Cassandra and HBase
Workload package: a collection of related workloads
- Example: CoreWorkload, a set of basic read/write workloads
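As an illustration of the parameter-driven approach, the sketch below chooses read vs. update per operation according to a configured mix. The property names (readproportion, recordcount) mimic the style of a workload parameter file but are not necessarily the exact YCSB keys.

    import java.util.Properties;
    import java.util.Random;

    // Sketch of a parameter-driven workload; property names are illustrative.
    class CoreWorkloadSketch {
        private final double readProportion;
        private final int recordCount;
        private final Random rng = new Random();

        CoreWorkloadSketch(Properties p) {
            readProportion = Double.parseDouble(p.getProperty("readproportion", "0.5"));
            recordCount = Integer.parseInt(p.getProperty("recordcount", "1000"));
        }

        // One operation: pick read vs. update per the configured mix, and draw
        // a key from the request distribution (uniform here, for simplicity).
        String nextOperation() {
            String key = "user" + rng.nextInt(recordCount);
            return (rng.nextDouble() < readProportion ? "READ " : "UPDATE ") + key;
        }

        public static void main(String[] args) {
            Properties p = new Properties();
            p.setProperty("readproportion", "0.95");  // Workload B's 95/5 read/update mix
            CoreWorkloadSketch w = new CoreWorkloadSketch(p);
            for (int i = 0; i < 5; i++) System.out.println(w.nextOperation());
        }
    }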
Benchmark tiers
Tier 1 - Performance
- For constant hardware, increase offered throughput until saturation
- Measure the resulting latency/throughput curve
- "Sizeup" in Wisconsin benchmark terminology
- (A sketch of the Tier 1 pacing loop follows below)
Tier 2 - Scalability
- Scaleup: increase hardware, data size, and workload proportionally; measure latency, which should stay constant
- Elastic speedup: run the workload against N servers; while the workload is running, add an (N+1)th server; measure the timeseries of latencies (which should drop after the server is added)
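The sketch below shows the Tier 1 procedure for one client thread: pace operations to a fixed offered throughput and record per-operation latency; rerunning at increasing targets traces out the latency/throughput curve. The pacing scheme is an assumption, not necessarily how the YCSB client throttles.

    // One point on the latency/throughput curve, at a fixed offered load.
    class LatencyCurvePoint {
        public static void main(String[] args) throws InterruptedException {
            int targetOpsPerSec = 1000;                 // offered load for this run
            long intervalNanos = 1_000_000_000L / targetOpsPerSec;
            long totalLatencyNanos = 0;
            int ops = 10_000;
            long next = System.nanoTime();
            for (int i = 0; i < ops; i++) {
                long now = System.nanoTime();
                if (next > now) {                       // wait for this op's slot so the
                    long wait = next - now;             // offered rate stays at the target
                    Thread.sleep(wait / 1_000_000, (int) (wait % 1_000_000));
                }
                long start = System.nanoTime();
                doOperation();                          // stand-in for one DB read/update
                totalLatencyNanos += System.nanoTime() - start;
                next += intervalNanos;
            }
            System.out.printf("target %d ops/sec, avg latency %.3f ms%n",
                              targetOpsPerSec, totalLatencyNanos / (double) ops / 1e6);
        }

        static void doOperation() { /* issue one request against the store under test */ }
    }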
Test setup
Setup:
- Six server-class machines: 8 cores (2 x quad-core) 2.5 GHz CPUs, 8 GB RAM, 6 x 146 GB 15K RPM SAS drives in RAID 1+0, gigabit Ethernet, RHEL 4
- Plus extra machines for clients, routers, controllers, etc.
- Cassandra 0.5.0 (0.6.0-beta2 for range queries)
- HBase 0.20.3
- MySQL 5.1.32, organized into a sharded configuration
- Sherpa 1.8, with MySQL 5.1.24
- No replication; updates forced to disk (except HBase, which primarily commits to memory)
Workloads:
- 120 million 1 KB records = 20 GB per server
- Reads retrieve the whole record; updates write a single field
- 100 or more client threads
Caveats:
- Write performance would improve for Sherpa, sharded MySQL, and Cassandra with a dedicated log disk
- We tuned each system as well as we knew how, with assistance from the teams of developers
Workload A - Update heavy
50/50 read/update mix
[Figures: Workload A read latency and update latency (ms) vs. throughput (ops/sec) for Cassandra, HBase, Sherpa, and MySQL]
Comment: Cassandra is optimized for writes, and achieves higher throughput and lower latency. Sherpa and MySQL achieve roughly comparable performance, as both are limited by MySQL's capabilities. HBase has good write latency, because it commits to memory, and somewhat higher read latency, because of the need to reconstruct records.
Workload B - Read heavy
95/5 read/update mix
[Figures: Workload B read latency and update latency (ms) vs. throughput (operations/sec) for Cassandra, HBase, Sherpa, and MySQL]
Comment: Sherpa does very well here, with better read latency: only one lookup into a B-tree is needed for reads, unlike log-structured systems, where records must be reconstructed. Cassandra also performs well, matching Sherpa until high throughputs. HBase does well too, although its read time is higher.
Workload E - Short scans
Scans of 1-100 records of size 1 KB
[Figure: Workload E scan latency (ms) vs. throughput (operations/sec) for HBase, Sherpa, and Cassandra]
Comment: HBase and Sherpa are roughly equivalent in latency and peak throughput, even though HBase is meant for scans. Cassandra's performance is poor, but the development team notes that many optimizations remain to be done.
Workload E - Range size
Vary the size of the range scans
Comment: For small ranges, queries are similar to random lookups; Sherpa is efficient at random lookups and does well. As the range grows, HBase begins to perform better, since it is optimized for large scans.
Scale-up
Read-heavy workload with varying amounts of hardware
Comment: Sherpa and Cassandra scale well, with flat latency as system size increases. HBase is very unstable, and performs very poorly on 3 servers or fewer.
Elasticity
Run a read-heavy workload on 2 servers; add a 3rd, then 4th, then 5th, then 6th server.
[Figure: Sherpa elasticity, 5th to 6th server - read latency (ms) vs. duration of test (min)]
Comment: Sherpa shows variance in response time while tablets are moving, but after the data moves, it settles into an average that is faster than before the sixth server was added.
Elasticity
Run a read-heavy workload on 2 servers; add a 3rd, then 4th, then 5th, then 6th server.
[Figure: Cassandra elasticity, 5th to 6th server - read latency (ms) vs. duration of test (min)]
Comment: Cassandra shows a great deal of latency variance as it moves data to the new server, and takes multiple hours to stabilize.
Elasticity
Run a read-heavy workload on 2 servers; add a 3rd, then 4th, then 5th, then 6th server.
[Figure: HBase elasticity, 5th to 6th server - read latency (ms) vs. test duration (min)]
Comment: HBase shows a small latency bump as the cluster reconfigures, but data is not moved to the new server until a compaction is performed (not shown in the graph).
Cassandra 0.4.2 vs 0.5
[Figures only: performance comparison charts between Cassandra 0.4.2 and 0.5]
For more information
- Contact: Brian Cooper (cooperb@yahoo-inc.com)
- Detailed writeup of the benchmark: http://www.brianfrankcooper.net/pubs/ycsb.pdf
- Open-source YCSB tool coming soon (watch http://research.yahoo.com/Web_Information_Management/YCSB)