Designing Apps for Amazon Web Services. Mathias Meyer, GOTO Aarhus 2011.
Me: infrastructure, code, databases. @roidrage, www.paperplanes.de
The Cloud: unlimited resources, whenever you need them.
10,000 feet
Amazon Web Services
EC2
On-demand computing. Utility computing.
API. It's not a cloud if it doesn't have an API.
Pay as you go
Multiple regions. Geographically distributed, for all web services: US East, US West, EU, Singapore, Japan.
Multiple data centers. Called availability zones: at least two data centers in each region, in physically separate locations. The API endpoint for a region does not distinguish between availability zones.
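As a rough sketch of how the regional API looks in practice (assuming the boto library, the common Python bindings for the AWS APIs at the time; the region name is just an example), connecting to one region and listing its availability zones might look like this:

    # Sketch: one connection per regional endpoint; the endpoint itself is zone-agnostic.
    # Assumes boto and configured AWS credentials; the region name is an example.
    import boto.ec2

    conn = boto.ec2.connect_to_region("eu-west-1")

    # Availability zones only become relevant when you place resources.
    for zone in conn.get_all_zones():
        print(zone.name, zone.state)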
Different instance types
High CPU vs. high memory. One CPU core is about the power of a 2007 Xeon processor.
1.7 GB to 68 GB of memory.
Elastic Block Store. Storage on instances is ephemeral: it goes away when the instance goes away. EBS allows persisting data beyond an instance's lifetime. Volumes are bound to a data center.
Mount to any instance. A large number of block volumes can be attached to a single instance.
Snapshots. A point-in-time, atomic snapshot of an EBS volume. Not a reliable backup on its own, but one building block for backups.
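A minimal sketch of working with snapshots through boto (again an assumption on my part; the volume ID, instance ID, zone, and device are placeholders):

    # Sketch: snapshot an EBS volume, then restore it into a fresh volume.
    # Assumes boto; vol-12345678, i-87654321 and the zone are made-up placeholders.
    import boto.ec2

    conn = boto.ec2.connect_to_region("us-east-1")

    # Point-in-time, atomic snapshot of an existing volume.
    snapshot = conn.create_snapshot("vol-12345678", description="nightly backup")

    # Later: materialize the snapshot as a new volume in a specific availability zone...
    restored = conn.create_volume(size=100, zone="us-east-1b", snapshot=snapshot.id)

    # ...and attach it to an instance running in that same zone.
    conn.attach_volume(restored.id, "i-87654321", "/dev/sdf")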
More AWS products: S3, CloudFront, CloudFormation, CloudWatch, RDS, Auto Scaling, SimpleDB, Route 53, Load Balancing, Queue Service, Notification Service, Elastic MapReduce.
What's Scalarium?
Automates: setup, configuration, one-click deploy.
...for EC2, ...on EC2. Scalarium uses a couple of Amazon's web services; EC2 is the most interesting one. We run our own monitoring, because customers like saving money and CloudWatch costs money.
Automation. Scalarium automates so that customers don't have to. This is the most important part of deploying in the cloud: every manual change is lost when an instance goes down.
The Dream: configure a cluster, push a button, boom!
Two ways...
Create an image, boot it.
Build once, use forever. A point-in-time snapshot of a fully configured system: app server, database, web server, etc.
How do you handle updates? Build a new image, install updates, discard the old image. Lather, rinse, repeat, with every update.
Configure from scratch
Start with a blank slate: a clean operating system installation.
Use a configuration management tool. It abstracts the installation of packages, the writing of configuration files, handling file systems, etc.
Configuration + Cookbooks/Manifests + Chef/Puppet/etc. = Configured cluster. The configuration describes the final result of what the system should look like.
Configuration: Chef Server, RightScale, JSON, Scalarium. There's an endless number of infrastructure service providers, and just as many ways to store cluster configuration.
A web UI to automate and simplify the Amazon APIs and the setup of servers/clusters.
In the beginning...
Scalarium lasted for a few months on just one instance. That instance ran RabbitMQ, CouchDB, Redis, background workers, and the web and application servers. Bootstrapped startup = start small, iterate quickly.
Eventually it became overloaded.
Automate, automate, automate! Chef for everything. Yes, our first instance was not fully automated; hypocritical, I know. Vagrant is an excellent tool for testing locally, and Scalarium customers can test their own and our cookbooks locally. Automate setup, configuration, re-configuration of services, everything!
EC2 is not a traditional datacenter
It doesn't look like this.
It still looks like this underneath, but it's transparent to you. You have no operational access.
Multi-tenant. Shared resources (CPU, memory, network): your instance likely shares resources with several other EC2 customers on the same physical host.
High likelihood of failure. It's still hardware that's running your servers. Hardware fails.
Faulty instances. They fail, not all the time, but if you have high turnover scaling up and down, they will fail. Discard, boot a new instance, done.
Datacenter outage
Network partition
More instances = Higher chance of failure
MTBF: Mean Time Between Failures. On EC2 as a whole it's pretty small, but it's not an important metric: just because something fails doesn't mean you have to be affected.
21/04/2011: US East data centers became unavailable. Cascading failures in EBS storage. Recovery took four days.
The big chance for naysayers and cloud haters.
07/08/2011: downtime in the EU data centers. A lightning strike caused a power outage; again a cascading failure in the EBS storage layer. More than three days until full recovery.
Failure is a good thing
You can ignore it
Learn from it
Design for it
Don't fear failure
Plan for failure. Failure becomes a part of your app's lifecycle. Deploying in the cloud has a bigger effect on culture than it does on your application.
Design for resilience. In case of failures, your app should handle them gracefully, not break entirely along the way. Serve static content instead of failure notices to the user.
Plan for recovery. Can you re-deploy your app into a different region quickly? If not, why not?
MTTR: Mean Time To Recovery.
Disaster recovery plan. What do you do when your site goes down? What do you do when you need to restore data? Plan, verify, one click.
Multi-datacenter deployments. Deploy the app across multiple data centers/availability zones. Make deploying to different data centers part of the deployment process. Staged deployments: bring up a new set of instances, flip the load balancer.
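A sketch of what spreading one role across availability zones could look like with boto (AMI ID, key name, instance type, and zone names are placeholders, not Scalarium's actual setup):

    # Sketch: launch the same role into two availability zones of one region,
    # so a single zone failure only takes out part of the fleet.
    # Assumes boto; all identifiers below are made-up placeholders.
    import boto.ec2

    conn = boto.ec2.connect_to_region("us-east-1")

    for zone in ("us-east-1a", "us-east-1b"):
        conn.run_instances(
            "ami-12345678",
            instance_type="m1.large",
            placement=zone,          # pin this reservation to one availability zone
            key_name="deploy-key",
        )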
Replication. Storage becomes a key part of handling failure; everything else is usually much easier to scale and distribute. Replicate data across availability zones, across regions.
Availability zones are geographically distributed, so reading across them means increased latency. Replication ensures data is in multiple geographic locations and allows you to recover quickly by moving to a different data center. Not all databases do this well, but they do it.
Multi-region deployments. If you need to be very highly available.
$$$. Deploying highly distributed is expensive. How distributed you go is up to your budget, and to how much availability is worth to your business.
Relax consistency requirements. Strong consistency increases the need for full availability. Distribute and partition data, on different instances and in different datacenters.
Latency. It immediately became an issue when we scaled out: network latency adds to EBS latency and made for higher response times from the database. All network traffic on EC2 is firewalled, even internal traffic.
Keep data local
Keep data in memory
Cache is king. Disk is expensive because it touches the network.
Use RAID for EBS. Increased performance, better network utilization. EBS performance is okay, but not great: don't expect SATA- or SAS-like performance. RAID 0, 5, or 10. EBS is redundant, but extra redundancy with striping doesn't hurt and makes recovery more likely when one EBS volume fails. RAID won't save you from EBS unavailability.
Use local storage. Don't use EBS at all. Local storage requires redundancy: instance storage is lost when the instance is lost, so RAID across local storage. It's more reliable in terms of I/O than EBS. Services that used local storage were mostly unaffected by the EC2 outages.
Use bigger instances. The bigger your instance, the less it is shared on the host. Bigger instances have higher I/O throughput.
What would I do differently?
Small services. Small services can run independently of each other, and are easy to deploy and easy to reconfigure (Chef). They don't have to know about all the other services upfront; leave that to configuration management tools. A layered system of small services allows failure handling on every layer, so a failure in one layer doesn't have to drag down the rest.
Frontend vs. small APIs
Fail fast. When components fail, don't block waiting for them; time out quickly. Circuit breaker: track failures and fail operations immediately if you know they're likely to fail, then recover when it's safe again.
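A minimal sketch of the circuit breaker idea in Python (thresholds and names are illustrative, not from the talk):

    import time

    class CircuitBreaker:
        """Fail fast: after repeated failures, reject calls for a while instead of blocking."""

        def __init__(self, max_failures=5, reset_timeout=30.0):
            self.max_failures = max_failures
            self.reset_timeout = reset_timeout
            self.failures = 0
            self.opened_at = None  # None means the circuit is closed (calls go through)

        def call(self, func, *args, **kwargs):
            if self.opened_at is not None:
                if time.time() - self.opened_at < self.reset_timeout:
                    raise RuntimeError("circuit open, failing fast")
                self.opened_at = None  # cool-down over, allow a trial call

            try:
                result = func(*args, **kwargs)
            except Exception:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.opened_at = time.time()  # trip the breaker
                raise
            else:
                self.failures = 0  # a success closes the circuit again
                return result

Wrap calls to a flaky backend in such a breaker, combine it with short timeouts, and the caller fails in milliseconds instead of hanging.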
Retry. Always assume failure. Retry with an exponential backoff.
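And a matching sketch of retrying with exponential backoff and jitter (again purely illustrative):

    import random
    import time

    def retry_with_backoff(func, attempts=5, base_delay=0.5, max_delay=30.0):
        """Call func(), retrying failures with exponentially growing, jittered delays."""
        for attempt in range(attempts):
            try:
                return func()
            except Exception:
                if attempt == attempts - 1:
                    raise  # out of attempts, give up
                delay = min(max_delay, base_delay * (2 ** attempt))
                time.sleep(delay + random.uniform(0, delay))  # jitter avoids thundering herds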
Don't just assume failure
Test failure. Shut off instances randomly and see what happens. Turn on the firewall, which adds network timeouts, and see what happens. The cloud makes it easy to bring up test environments, and to move resources quickly when necessary.
Netflix's Chaos Monkey randomly kills instances.
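A toy version of that idea, not Netflix's actual tool (assumes boto; the opt-in tag is a made-up convention):

    # Sketch: randomly terminate one running instance that is explicitly opted in
    # to the chaos experiment. Assumes boto; the "chaos" tag is a made-up convention.
    import random
    import boto.ec2

    conn = boto.ec2.connect_to_region("us-east-1")

    reservations = conn.get_all_instances(filters={
        "tag:chaos": "allowed",
        "instance-state-name": "running",
    })
    instances = [i for r in reservations for i in r.instances]

    if instances:
        victim = random.choice(instances)
        conn.terminate_instances(instance_ids=[victim.id])
        print("terminated", victim.id)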
"Think about your software running." (Theo Schlossnagle, OmniTI)
Understand your code's breaking points. Use patterns like circuit breakers and bulkheads to reduce failure points. Think about outcomes and implications, not just features. Understand your code's breaking points and how it handles unavailability, timeouts, and the like; all of these are so much more likely in a cloud environment.
Isn't all that what you do at large scale? It's what you do at any scale where availability is a factor.
Cloud == Large scale
You're a part of it. Prepare for the worst, plan for the worst. The cloud made failure at large scale obvious even when you're working at a small scale.
Scalarium today
Scalarium runs on Scalarium
Easy to boot up somewhere else, switch over DNS, done.
Lack of visibility. It still takes up to an hour for problems to be acknowledged. Amazon is not good at admitting that failure happens a lot on EC2. There is not enough education on how to build apps for EC2 and their web services, especially on how to deal with failure.
Don't fall for SLAs
Amazon only handles infrastructure
How you build on it is up to you
Fun fact
amazon.com is served off EC2. Since 21/10/2010. Yes, they were down too, at least during the EU outage.
It's not the cloud that matters, it's how you use it. EC2 doesn't make everything harder; quite the opposite, it makes things easier: adding capacity, automation, responding to failures.
Thank you!