Exploring Amazon EC2 for Scale-out Applications

Similar documents
MySQL and Virtualization Guide

MySQL performance in a cloud. Mark Callaghan

Variations in Performance and Scalability when Migrating n-tier Applications to Different Clouds

Xen Virtualization: Xen (source) and XenServer


Building a Private Cloud with Eucalyptus

Deep Dive: Maximizing EC2 & EBS Performance

How To Test For Speed On Postgres (Postgres) On A Microsoft Powerbook On A 2.2 Computer (For Microsoft) On An 8Gb Hard Drive (For

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Tushar Joshi Turtle Networks Ltd

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

Virtuoso and Database Scalability

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Flash for Databases. September 22, 2015 Peter Zaitsev Percona

MySQL: Cloud vs Bare Metal, Performance and Reliability

Parallel Replication for MySQL in 5 Minutes or Less

Best Practices for Using MySQL in the Cloud

Scaling Graphite Installations

Scaling Database Performance in Azure

Investigation of storage options for scientific computing on Grid and Cloud facilities

An Overview of Flash Storage for Databases

Price/performance Modern Memory Hierarchy

Ground up Introduction to In-Memory Data (Grids)

Performance in a Gluster System. Versions 3.1.x

OTM in the Cloud. Ryan Haney

Introduction 1 Performance on Hosted Server 1. Benchmarks 2. System Requirements 7 Load Balancing 7

POSIX and Object Distributed Storage Systems

Amazon EC2 XenApp Scalability Analysis

DELL TM PowerEdge TM T Mailbox Resiliency Exchange 2010 Storage Solution

BookKeeper. Flavio Junqueira Yahoo! Research, Barcelona. Hadoop in China 2011

A Holistic Model of the Energy-Efficiency of Hypervisors

Input / Ouput devices. I/O Chapter 8. Goals & Constraints. Measures of Performance. Anatomy of a Disk Drive. Introduction - 8.1

Best Practices for Optimizing Your Linux VPS and Cloud Server Infrastructure

Dynamic Resource allocation in Cloud

Benchmarking Hadoop & HBase on Violin

AWS Performance Tuning

Distribution One Server Requirements

Data Centers and Cloud Computing

Windows Server Performance Monitoring

Cisco Prime Home 5.0 Minimum System Requirements (Standalone and High Availability)

Cloud Computing and Amazon Web Services

Deploying and Optimizing SQL Server for Virtual Machines

DIABLO TECHNOLOGIES MEMORY CHANNEL STORAGE AND VMWARE VIRTUAL SAN : VDI ACCELERATION

PARALLELS CLOUD STORAGE

Google File System. Web and scalability

EMC Business Continuity for Microsoft SQL Server Enabled by SQL DB Mirroring Celerra Unified Storage Platforms Using iscsi

High Availability Solutions for the MariaDB and MySQL Database

Performance characterization report for Microsoft Hyper-V R2 on HP StorageWorks P4500 SAN storage

Enabling Technologies for Distributed and Cloud Computing

Chapter Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig I/O devices can be characterized by. I/O bus connections

Quantifying Hardware Selection in an EnCase v7 Environment

Use of Hadoop File System for Nuclear Physics Analyses in STAR

Performance Characteristics of VMFS and RDM VMware ESX Server 3.0.1

Choosing Storage Systems

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

PARALLELS CLOUD SERVER

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

Part V Applications. What is cloud computing? SaaS has been around for awhile. Cloud Computing: General concepts

Sawmill Log Analyzer Best Practices!! Page 1 of 6. Sawmill Log Analyzer Best Practices

Mark Bennett. Search and the Virtual Machine

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

Configuring Apache Derby for Performance and Durability Olav Sandstå

White paper. QNAP Turbo NAS with SSD Cache

Cloud Based Application Architectures using Smart Computing

MAGENTO HOSTING Progressive Server Performance Improvements

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database

Very Large Enterprise Network, Deployment, Users

PES. Batch virtualization and Cloud computing. Part 1: Batch virtualization. Batch virtualization and Cloud computing

Cloud Computing For Bioinformatics

Amazon Cloud Storage Options

Specification and Implementation of Dynamic Web Site Benchmarks. Sameh Elnikety Department of Computer Science Rice University

Amazon Elastic Beanstalk

StorPool Distributed Storage Software Technical Overview

StACC: St Andrews Cloud Computing Co laboratory. A Performance Comparison of Clouds. Amazon EC2 and Ubuntu Enterprise Cloud

SCALABILITY AND AVAILABILITY

<Insert Picture Here> Introducing Oracle VM: Oracle s Virtualization Product Strategy

WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE

ioscale: The Holy Grail for Hyperscale

Storage benchmarking cookbook

Scaling Analysis Services in the Cloud

NoSQL Failover Characteristics: Aerospike, Cassandra, Couchbase, MongoDB

Synchronous multi-master clusters with MySQL: an introduction to Galera

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

SAN TECHNICAL - DETAILS/ SPECIFICATIONS

Fusionstor NAS Enterprise Server and Microsoft Windows Storage Server 2003 competitive performance comparison

PADS GPFS Filesystem: Crash Root Cause Analysis. Computation Institute

Enabling Technologies for Distributed Computing

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

DESKTOP VIRTUALIZATION SIZING AND COST MODELING PROCESSOR MEMORY STORAGE NETWORK

HP Proliant BL460c G7

MySQL High Availability Solutions. Lenz Grimmer OpenSQL Camp St. Augustin Germany

5 Signs You Might Be Outgrowing Your MySQL Data Warehouse*

Investigation of storage options for scientific computing on Grid and Cloud facilities

A Filesystem Layer Data Replication Method for Cloud Computing

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

Transcription:

Exploring Amazon EC2 for Scale-out Applications Presented by, MySQL & O Reilly Media, Inc. Morgan Tocker, MySQL Canada Carl Mercier, Defensio

Introduction! Defensio is a spam filtering web service for blogs and other social web applications.! Powered exclusively by Amazon EC2.! Ruby, Rails, C, MySQL 5.0 (and a few more things).

EC2: Elastic Compute Cloud! Virtual machines running on Xen. These VMs are called instances.! Pay only for what you use.! On demand scaling - controlled with an API.! Instances are disposable.

Instance Types RAM CPU STORAGE IO $ SMALL (1) 1.7 GB 1 virtual core 1 CU (32 bit) 160 GB moderate $0.10 / hour (~ $72 / mo) LARGE (4) 7.5 GB 2 virtual cores 850 GB high $0.40 / hour 64 bit 2 CU each (64 bit) (2 x 420 GB) (~ $288 / mo) XLARGE (8) 15 GB 4 virtual cores 1.7 TB high $0.80 / hour 64 bit 2 CU each (64 bit) (4 x 420 GB) (~ $576 / mo) * One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. This is also the equivalent to an early-2006 1.7 GHz Xeon.

Topologies

In the beginning... 1*2&#*+3*+&4& 1*2&#*+3*+&!"!"#$%&!'()*+&,&!*-.'./*0&!"#$%&#5'3*& 62'.789(&:&+*9;+<=>?& B=)*+=*)& @1#& #A& CD(E)*&!"#$%&#5'3*& 6;D(E)*&2'.789?&

Adding some room to grow... 1*2&#*+3*+&4& 1*2&#*+3*+&!"!"#$%&#5'3*&!#$" 6+*'0&9*+F;+-'=.*?&!"#$%&!'()*+&,&!*-.'./*0&!"#$%&#5'3*&!" 6+*'0&9*+F;+-'=.*?&!"#$%&#5'3*& 62'.789(&:&+*9;+<=>?& B=)*+=*)& @1#& #A& CD(E)*&!"#$%&#5'3*& 6;D(E)*&2'.789?&

Sharding! The Defensio application shards quite well.! Expecting to be able to split up customers across many servers.! Customers are not created equally. Some servers may only have a few with heavy load, others will have thousands.! Will need to write scripts to rebalance the customers - EC2 makes this easy.

Stepping it up a notch w/ Sharding!"#$%"&'"&$($!"#$%"&'"&$)$!"#$%"&'"&$*$ +*,"-$./%01$%23&,$($./%01$%23&,$)$./%01$%23&,$4$

Stepping it up a notch w/ Sharding!"#$%"&'"&$($!"#$%"&'"&$)$!"#$%"&'"&$*$ +*,"-$./%01$%23&,$($./%01$%23&,$)$./%01$%23&,$4$

Stepping it up a notch w/ Sharding!"#$%"&'"&$($!"#$%"&'"&$)$!"#$%"&'"&$*$ +*,"-$ 567$$%23&,$4$ #8,,/7$./%01$%23&,$($./%01$%23&,$)$./%01$%23&,$4$

Stepping it up a notch w/ Sharding!"#$%"&'"&$($!"#$%"&'"&$)$!"#$%"&'"&$*$ +*,"-$./%01$%23&,$($./%01$%23&,$)$./%01$%23&,$4$

Performance

EC2 Performance! Databases are bound by disk performance.! Initial suspicions of EC2 under performing were confirmed.! It wasn t that straight forward though.

!"#$% &'(% )*+%#*,-*,%

!"#$% &'(% )*+%#*,-*,%

The most basic test $ cd /mnt $ dd if=/dev/zero of=my50gfile bs=1024m count=50! A single consumer drive should offer at least 50M/s.! We re just writing 50G of nothing.! It tests sequential I/O, with a small filesystem overhead.

The most basic test (cont.) $ time dd if=/dev/zero of=my50gfile bs=1024m count=50 50+0 records in 50+0 records out 53687091200 bytes (54 GB) copied, 2309.87 seconds, 23.2 MB/s real 38m30.377s user 0m0.000s sys 0m57.560s

Damn, that sucks.

...what s one of the first rules of benchmarking?

$ time dd if=/dev/zero of=my50gfile bs=1024m count=50 50+0 records in 50+0 records out 53687091200 bytes (54 GB) copied, 504.717 seconds, 106 MB/s real user sys 8m24.982s 0m0.000s 1m24.790s

Explanation! There s a first write penalty for EC2.! It is a limitation in EC2s architecture - all subsequent writes are much faster.! There s no documentation on this anywhere.! Deletes are also slow.

Workaround $ dd if=/dev/zero of=diskfiller.tmpfile bs=1000m count=99999999! This takes just over 5 hours for a 400G stripe on 2 drives.

Now we re past that false start, let s start again!

Raw Disk Speed MB/second 400 300 200 100 0 Small Large X Large! The average speed in writing an 11GB file to /mnt. Large and XLarge instances used software RAID (striping). Full disclosure TBA on http://tocker.id.au/ in April 08.

Bonnie++ Tests 300 225 150 75 0 Small Large X Large Reference #1 Reference #2 Sequential writes (MB/s) Sequential rewrites (MB/s) Sequential reads (MB/s)! All tests performed on /mnt, with software RAID. Reference #1 system was an Athlon XP 2500+ with a single 10,000 RPM SATA disk. Reference #2 was the same system with a single 7200RPM disk. Full disclosure will be on http://tocker.id.au/ in April 08.

Benchmarks! It s easy to lie.! Next step after confirming raw performance is consistent, is to benchmark inside MySQL.! This can be done with Sysbench http://sysbench.sourceforge.net/

Transactions/Second 4 Threads 8 Threads 16 Threads 32 Threads 230.0 172.5 115.0 57.5 0 Small Large X Large Test was OLTP complex and InnoDB tables.

Benchmark Conclusions! Comparable performance to non-virtualmachines.! RAID0 or RAID0+1 with software RAID. Increased risk of failure with more spindles.! Other sources have benchmarked CPU and network performance.

Limitations Yesterday

Limitations! I had wanted to use DRBD / Heartbeat to reduce impact of failure. Can t do it because of possibility of split brains.! Can t swap in another disk subsystem. No Hardware RAID or BBU.! Not so much documentation on hardware.

Limitations (cont.)! Wanted to know if writes persist on disk - not every possible to tell.! For Defensio losing a few rows is annoying, but you would be insane if this were a financial application.

Limitations (cont.)! Amazon seems reasonably friendly about giving impending failure notice.! If a disk in your software raid dies or the network card dies, they re going to make you move off to fix it.! This is going to annoy you.

Limitations (cont.)! S3 has too high latency to mount in FUSE and try and use for persistent storage (not designed for this either).! Planning to use S3 on slave/reports server and push snapshots to it.! Can t increase the size of an instance.

Yeah, Yesterday

Amazon s Announcements! Persistent Storage for Amazon EC2 http://www.allthingsdistributed.com/2008/04/ persistent_storage_for_amazon.html! On the Road to Highly Available EC2 Applications http://www.allthingsdistributed.com/2008/03/ on_the_road_to_highly_availabl.html

Limitations (cont.)! No higher availability instances.! No way to a la carte add storage.! No way to quickly migrate and recover from instance failure.! Not easy to guarantee that an instance was on a different physical node than another instance.

Limitations (cont.)! No higher availability instances.! No way to a la carte add storage.! No way to quickly migrate and recover from instance failure.! Not easy to guarantee that an instance was on a different physical node than another instance.

War Stories

War Stories! Possible bug in Replication (BUG #26489).! Possible memory leak in MySQL (was reasonably elegant just to upgrade and shutdown - can t do with other hosting).

War Stories (cont.)! Degraded Nodes x2.! The usual Replication is asynchronous (plan accordingly) dilemmas.

Conclusions

Conclusions! Good value in Small and Large instances.! For Defensio s architecture might buy two Large rather than one XL machine.! $600/month for XL is quite expensive.

Conclusions (cont.)! Failure rate has been unusually high - probably bad luck.! Might not suit people who fill up the disks on X.Large completely due to time to restore.

Conclusions (cont.)! Does not offer durability (non-acid compliant).! Wish there were more instance types or a way to order features a la carte.! Wish migrating data off a failed node was easier.

The End. Presented by, MySQL & O Reilly Media, Inc. Questions? morgan@mysql.com carl@defensio.com