NetflixOSS A Cloud Native Architecture

Size: px
Start display at page:

Download "NetflixOSS A Cloud Native Architecture"

Transcription

1 NetflixOSS A Cloud Native Architecture LASER Session 5 Availability September

2 Failure Modes and Effects Failure Mode Probability Current Mitigation Plan Application Failure High Automatic degraded response AWS Region Failure Low Active-Active multi-region deployment AWS Zone Failure Medium Continue to run on 2 out of 3 zones Datacenter Failure Medium Migrate more functions to cloud Data store failure Low Restore from S3 backups S3 failure Low Restore from remote archive Until we got really good at mitigating high and medium probability failures, the ROI for mitigating regional failures didn t make sense. Getting there

3 Application Resilience Run what you wrote Rapid detection Rapid Response

4 Chaos Monkey Computers (Datacenter or AWS) randomly die Fact of life, but too infrequent to test resiliency Test to make sure systems are resilient Kill individual instances without customer impact Latency Monkey (coming soon) Inject extra latency and error return codes

5 Edda Configuration History AWS Instances, ASGs, etc. Eureka Services metadata AppDynamics Request flow Edda Monkeys

6 Edda Query Examples Find any instances that have ever had a specific public IP address $ curl " ["i ","i a","i b ] Show the most recent change to a security group $ curl " --- /api/v2/aws.securitygroups/sg ;_pp;_at= ,33 { "ipranges" : [ " /32", " /32", + " /32", - " /32" }

7 Apache Scalable and Stable in large deployments No additional license cost for large scale! Optimized for OLTP vs. Hbase optimized for DSS Available during Partition (AP from CAP) Hinted handoff repairs most transient issues Read-repair and periodic repair keep it clean Quorum and Client Generated Timestamp Read after write consistency with 2 of 3 copies Latest version includes Paxos for stronger transactions

8 Astyanax Client for Java Available at Features Abstraction of connection pool from RPC protocol Fluent Style API Operation retry with backoff Token aware Batch manager Many useful recipes New: Entity Mapper based on JPA annotations

9 Astyanax Query Example Paginate through all columns in a row ColumnList<String> columns; int pageize = 10; try { RowQuery<String, String> query = keyspace.preparequery(cf_standard1).getkey("a").setispaginating().withcolumnrange(new RangeBuilder().setMaxSize(pageize).build()); while (!(columns = query.execute().getresult()).isempty()) { for (Column<String> c : columns) { } } } catch (ConnectionException e) { }

10 C* Astyanax Recipes Distributed row lock (without needing zookeeper) Multi-region row lock Uniqueness constraint Multi-row uniqueness constraint Chunked and multi-threaded large file storage Reverse index search All rows query Durable message queue Contributed: High cardinality reverse index

11 Astyanax Futures Maintain backwards compatibility Wrapper for C* 1.2 Netty driver More CQL support NetflixOSS Cloud Prize Ideas DynamoDB Backend? More recipes?

12 Astyanax - Write Data Flows Single Region, Multiple Availability Zone, Token Aware Disks Zone A 1. Client Writes to local coordinator 2. Coodinator writes to other zones 3. Nodes return ack 4. Data written to internal commit log disks (no more than 10 seconds later) 4 Disks Zone C Disks Zone B 3 1 Token Aware Clients Disks Zone A Disks Zone B 2 Disks Zone C If a node goes offline, hinted handoff completes the write when the node comes back up. Requests can choose to wait for one node, a quorum, or all nodes to ack the write SSTable disk writes and compactions occur asynchronously

13 Data Flows for Multi-Region Writes Token Aware, Consistency Level = Local Quorum 1. Client writes to local replicas 2. Local write acks returned to Client which continues when 2 of 3 local nodes are committed 3. Local coordinator writes to remote coordinator. 4. When data arrives, remote coordinator node acks and copies to other remote zones 5. Remote nodes ack to local coordinator 6. Data flushed to internal commit log disks (no more than 10 seconds later) 6 Disks Zone C Disks Zone B 2 If a node or region goes offline, hinted handoff completes the write when the node comes back up. Nightly global compare and repair jobs ensure everything stays consistent. Disks Zone A 1 US Clients Disks Zone A 2 6 Disks Zone B 2 6 Disks Zone C 100+ms latency 3 5 Disks Zone C Disks Zone B Disks Zone A 4 6 Disks 4 6 Zone B 4 EU Clients 6 Disks Zone A 5 Disks Zone C

14 Platform Outage Taxonomy Classify and name the different types of things that can go wrong

15 YOLO

16 Zone Failure Modes Power Outage Instances lost, ephemeral state lost Clean break and recovery, fail fast, no route to host Network Outage Instances isolated, state inconsistent More complex symptoms, recovery issues, transients Dependent Service Outage Cascading failures, misbehaving instances, human errors Confusing symptoms, recovery issues, byzantine effects

17 Zone Power Failure June 29, 2012 AWS US-East - The Big Storm Highlights One of 10+ US-East datacenters failed generator startup UPS depleted -> 10min power outage for 7% of instances Result Netflix lost power to most of a zone, evacuated the zone Small/brief user impact due to errors and retries

18 Zone Failure Modes Zone Network Outage US-East Load Balancers EU-West Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Zone Power Outage Zone Dependent Service Outage

19 Regional Failure Modes Network Failure Takes Region Offline DNS configuration errors Bugs and configuration errors in routers Network capacity overload Control Plane Overload Affecting Entire Region Consequence of other outages Lose control of remaining zones infrastructure Cascading service failure, hard to diagnose

20 Regional Control Plane Overload April 2011 The big EBS Outage Human error during network upgrade triggered cascading failure Zone level failure, with brief regional control plane overload Netflix Infrastructure Impact Instances in one zone hung and could not launch replacements Overload prevented other zones from launching instances Some MySQL slaves offline for a few days Netflix Customer Visible Impact Higher latencies for a short time Higher error rates for a short time Outage was at a low traffic level time, so no capacity issues

21 Dependent Services Failure June 29, 2012 AWS US-East - The Big Storm Power failure recovery overloaded EBS storage service Backlog of instance startups using EBS root volumes ELB (Load Balancer) Impacted ELB instances couldn t scale because EBS was backlogged ELB control plane also became backlogged Mitigation Plans Mentioned Multiple control plane request queues to isolate backlog Rapid DNS based traffic shifting between zones

22 Regional Failure Modes Regional Network Outage US-East Load Balancers EU-West Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Control Plane Overload

23 Application Routing Failure June 29, 2012 AWS US-East - The Big Storm Eureka service directory failed to mark down dead instances due to a configuration error US-East Load Balancers EU-West Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Zone Power Outage Applications not using Zone-aware routing kept trying to talk to dead instances and timing out Effect: higher latency and errors Mitigation: Fixed config, and made zone aware routing the default

24 Partial Regional ELB Outage Dec 24 th 2012 US-East Load Balancers EU-West Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C ELB (Load Balancer) Impacted ELB control plane database state accidentally corrupted Hours to detect, hours to restore from backups Mitigation Plans Mentioned Tighter process for access to control plane Better zone isolation

25 Global Failure Modes Software Bugs Externally triggered (e.g. leap year/leap second) Memory leaks and other delayed action failures Global configuration errors Usually human error Both infrastructure and application level Cascading capacity overload Customers migrating away from a failure Lack of cross region service isolation

26 Global Software Bug Outages AWS S3 Global Outage in 2008 Gossip protocol propagated errors worldwide No data loss, but service offline for up to 9hrs Extra error detection fixes, no big issues since Microsoft Azure Leap Day Outage in 2012 Bug failed to generate certificates ending 2/29/13 Failure to launch new instances for up to 13hrs One line code fix. Netflix Configuration Error in 2012 Global property updated to broken value Streaming stopped worldwide for ~1hr until we changed back Fix planned to keep history of properties for quick rollback

27 Global Failure Modes Cascading Capacity Overload US-East Load Balancers EU-West Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Capacity Demand Migrates Software Bugs and Global Configuration Errors Oops

28 Slideshare.net/Netflix Details Meetup S1E3 July Featuring Contributors Eucalyptus, IBM, Paypal, Riot Games Lightning Talks March S1E2 Lightning Talks Feb S1E1 Asgard In Depth Feb S1E1 Security Architecture Cost Aware Cloud Architectures with Jinesh Varia of AWS

29 Takeaways Cloud Native Manages Scale and Complexity at Speed NetflixOSS makes it easier for everyone to become Cloud Native

Lessons Learned from the Movies

Lessons Learned from the Movies Lessons Learned from the Movies October 2013 Adrian Cockcroft @adrianco @NetflixOSS http://www.linkedin.com/in/adriancockcroft Where time to market wins big Making a land-grab Disrupting competitors (OODA)

More information

Design For Availability. October 2013 Stevan Vlaovic svlaovic@netflix.com http://www.linkedin.com/in/stevanvlaovic

Design For Availability. October 2013 Stevan Vlaovic svlaovic@netflix.com http://www.linkedin.com/in/stevanvlaovic Design For Availability October 2013 Stevan Vlaovic svlaovic@netflix.com http://www.linkedin.com/in/stevanvlaovic Stevan Vlaovic Director, Membership Infrastructure, Netflix Performance Architect, Display

More information

Netflix and Open Source. April 2013 Adrian Cockcroft @adrianco #netflixcloud @NetflixOSS http://www.linkedin.com/in/adriancockcroft

Netflix and Open Source. April 2013 Adrian Cockcroft @adrianco #netflixcloud @NetflixOSS http://www.linkedin.com/in/adriancockcroft Netflix and Open Source April 2013 Adrian Cockcroft @adrianco #netflixcloud @NetflixOSS http://www.linkedin.com/in/adriancockcroft Cloud Native NetflixOSS Cloud Native On-Ramp Netflix Open Source Cloud

More information

Designing Apps for Amazon Web Services

Designing Apps for Amazon Web Services Designing Apps for Amazon Web Services Mathias Meyer, GOTO Aarhus 2011 Montag, 10. Oktober 11 Montag, 10. Oktober 11 Me infrastructure code databases @roidrage www.paperplanes.de Montag, 10. Oktober 11

More information

High-Availability in the Cloud Architectural Best Practices

High-Availability in the Cloud Architectural Best Practices 1 High-Availability in the Cloud Architectural Best Practices Josh Fraser, VP Business Development, RightScale Brian Adler, Sr. Professional Services Architect 2 # RightScale World s #1 cloud management

More information

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2 DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing Slide 1 Slide 3 A style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

More information

Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF

Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF Non-Stop for Apache HBase: -active region server clusters TECHNICAL BRIEF Technical Brief: -active region server clusters -active region server clusters HBase is a non-relational database that provides

More information

Distributed Systems. Tutorial 12 Cassandra

Distributed Systems. Tutorial 12 Cassandra Distributed Systems Tutorial 12 Cassandra written by Alex Libov Based on FOSDEM 2010 presentation winter semester, 2013-2014 Cassandra In Greek mythology, Cassandra had the power of prophecy and the curse

More information

Practical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00

Practical Cassandra. Vitalii Tymchyshyn tivv00@gmail.com @tivv00 Practical Cassandra NoSQL key-value vs RDBMS why and when Cassandra architecture Cassandra data model Life without joins or HDD space is cheap today Hardware requirements & deployment hints Vitalii Tymchyshyn

More information

Multi-Datacenter Replication

Multi-Datacenter Replication www.basho.com Multi-Datacenter Replication A Technical Overview & Use Cases Table of Contents Table of Contents... 1 Introduction... 1 How It Works... 1 Default Mode...1 Advanced Mode...2 Architectural

More information

Web Application Deployment in the Cloud Using Amazon Web Services From Infancy to Maturity

Web Application Deployment in the Cloud Using Amazon Web Services From Infancy to Maturity P3 InfoTech Solutions Pvt. Ltd http://www.p3infotech.in July 2013 Created by P3 InfoTech Solutions Pvt. Ltd., http://p3infotech.in 1 Web Application Deployment in the Cloud Using Amazon Web Services From

More information

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Facebook: Cassandra. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation Facebook: Cassandra Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/24 Outline 1 2 3 Smruti R. Sarangi Leader Election

More information

Introduction to Cassandra

Introduction to Cassandra Introduction to Cassandra DuyHai DOAN, Technical Advocate Agenda! Architecture cluster replication Data model last write win (LWW), CQL basics (CRUD, DDL, collections, clustering column) lightweight transactions

More information

Architecting For Failure Why Cloud Architecture is Different! Michael Stiefel www.reliablesoftware.com development@reliablesoftware.

Architecting For Failure Why Cloud Architecture is Different! Michael Stiefel www.reliablesoftware.com development@reliablesoftware. Architecting For Failure Why Cloud Architecture is Different! Michael Stiefel www.reliablesoftware.com development@reliablesoftware.com Outsource Infrastructure? Traditional Web Application Web Site Virtual

More information

Netflix: Building Up and Scaling Out on Open Source

Netflix: Building Up and Scaling Out on Open Source Netflix: Building Up and Scaling Out on Open Source Black Duck 2013 Presenters Adrian Cockcroft is the director of architecture for the Cloud Systems team at Netflix. He is focused on availability, resilience,

More information

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform Page 1 of 16 Table of Contents Table of Contents... 2 Introduction... 3 NoSQL Databases... 3 CumuLogic NoSQL Database Service...

More information

Amazon EC2 Product Details Page 1 of 5

Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Product Details Page 1 of 5 Amazon EC2 Functionality Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of

More information

Exchange Data Protection: To the DAG and Beyond. Whitepaper by Brien Posey

Exchange Data Protection: To the DAG and Beyond. Whitepaper by Brien Posey Exchange Data Protection: To the DAG and Beyond Whitepaper by Brien Posey Exchange is Mission Critical Ask a network administrator to name their most mission critical applications and Exchange Server is

More information

GoGrid Implement.com Configuring a SQL Server 2012 AlwaysOn Cluster

GoGrid Implement.com Configuring a SQL Server 2012 AlwaysOn Cluster GoGrid Implement.com Configuring a SQL Server 2012 AlwaysOn Cluster Overview This documents the SQL Server 2012 Disaster Recovery design and deployment, calling out best practices and concerns from the

More information

Highly available, scalable and secure data with Cassandra and DataStax Enterprise. GOTO Berlin 27 th February 2014

Highly available, scalable and secure data with Cassandra and DataStax Enterprise. GOTO Berlin 27 th February 2014 Highly available, scalable and secure data with Cassandra and DataStax Enterprise GOTO Berlin 27 th February 2014 About Us Steve van den Berg Johnny Miller Solutions Architect Regional Director Western

More information

LARGE-SCALE DATA STORAGE APPLICATIONS

LARGE-SCALE DATA STORAGE APPLICATIONS BENCHMARKING AVAILABILITY AND FAILOVER PERFORMANCE OF LARGE-SCALE DATA STORAGE APPLICATIONS Wei Sun and Alexander Pokluda December 2, 2013 Outline Goal and Motivation Overview of Cassandra and Voldemort

More information

AMAZON S3: ARCHITECTING FOR RESILIENCY IN THE FACE OF FAILURES Jason McHugh

AMAZON S3: ARCHITECTING FOR RESILIENCY IN THE FACE OF FAILURES Jason McHugh AMAZON S3: ARCHITECTING FOR RESILIENCY IN THE FACE OF FAILURES Jason McHugh CAN YOUR S ERVICE S URVIVE? CAN YOUR S ERVICE S URVIVE? CAN YOUR SERVICE SURVIVE? Datacenter loss of connectivity Flood Tornado

More information

Migrating to Microservices. Adrian Cockcroft @adrianco QCon London 6 th March 2014

Migrating to Microservices. Adrian Cockcroft @adrianco QCon London 6 th March 2014 Migrating to Microservices Adrian Cockcroft @adrianco QCon London 6 th March 2014 What I learned from my time at Netflix Speed wins in the marketplace Remove friction from product development High trust,

More information

Tushar Joshi Turtle Networks Ltd

Tushar Joshi Turtle Networks Ltd MySQL Database for High Availability Web Applications Tushar Joshi Turtle Networks Ltd www.turtle.net Overview What is High Availability? Web/Network Architecture Applications MySQL Replication MySQL Clustering

More information

Design for Failure High Availability Architectures using AWS

Design for Failure High Availability Architectures using AWS Design for Failure High Availability Architectures using AWS Harish Ganesan Co founder & CTO 8KMiles www.twitter.com/harish11g http://www.linkedin.com/in/harishganesan Sample Use Case Multi tiered LAMP/LAMJ

More information

Distributed Storage Systems part 2. Marko Vukolić Distributed Systems and Cloud Computing

Distributed Storage Systems part 2. Marko Vukolić Distributed Systems and Cloud Computing Distributed Storage Systems part 2 Marko Vukolić Distributed Systems and Cloud Computing Distributed storage systems Part I CAP Theorem Amazon Dynamo Part II Cassandra 2 Cassandra in a nutshell Distributed

More information

Apache Hadoop. Alexandru Costan

Apache Hadoop. Alexandru Costan 1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

More information

Transactions and ACID in MongoDB

Transactions and ACID in MongoDB Transactions and ACID in MongoDB Kevin Swingler Contents Recap of ACID transactions in RDBMSs Transactions and ACID in MongoDB 1 Concurrency Databases are almost always accessed by multiple users concurrently

More information

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013

F1: A Distributed SQL Database That Scales. Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 F1: A Distributed SQL Database That Scales Presentation by: Alex Degtiar (adegtiar@cmu.edu) 15-799 10/21/2013 What is F1? Distributed relational database Built to replace sharded MySQL back-end of AdWords

More information

Fault-Tolerant Computer System Design ECE 695/CS 590. Putting it All Together

Fault-Tolerant Computer System Design ECE 695/CS 590. Putting it All Together Fault-Tolerant Computer System Design ECE 695/CS 590 Putting it All Together Saurabh Bagchi ECE/CS Purdue University ECE 695/CS 590 1 Outline Looking at some practical systems that integrate multiple techniques

More information

Introduction to Apache Cassandra

Introduction to Apache Cassandra Introduction to Apache Cassandra White Paper BY DATASTAX CORPORATION JULY 2013 1 Table of Contents Abstract 3 Introduction 3 Built by Necessity 3 The Architecture of Cassandra 4 Distributing and Replicating

More information

Apache HBase. Crazy dances on the elephant back

Apache HBase. Crazy dances on the elephant back Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage

More information

MySQL: Cloud vs Bare Metal, Performance and Reliability

MySQL: Cloud vs Bare Metal, Performance and Reliability MySQL: Cloud vs Bare Metal, Performance and Reliability Los Angeles MySQL Meetup Vladimir Fedorkov, March 31, 2014 Let s meet each other Performance geek All kinds MySQL and some Sphinx Working for Blackbird

More information

GlobalSCAPE DMZ Gateway, v1. User Guide

GlobalSCAPE DMZ Gateway, v1. User Guide GlobalSCAPE DMZ Gateway, v1 User Guide GlobalSCAPE, Inc. (GSB) Address: 4500 Lockhill-Selma Road, Suite 150 San Antonio, TX (USA) 78249 Sales: (210) 308-8267 Sales (Toll Free): (800) 290-5054 Technical

More information

Migration and Disaster Recovery Underground in the NEC / Iron Mountain National Data Center with the RackWare Management Module

Migration and Disaster Recovery Underground in the NEC / Iron Mountain National Data Center with the RackWare Management Module Migration and Disaster Recovery Underground in the NEC / Iron Mountain National Data Center with the RackWare Management Module WHITE PAPER May 2015 Contents Advantages of NEC / Iron Mountain National

More information

HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg

HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg HDB++: HIGH AVAILABILITY WITH Page 1 OVERVIEW What is Cassandra (C*)? Who is using C*? CQL C* architecture Request Coordination Consistency Monitoring tool HDB++ Page 2 OVERVIEW What is Cassandra (C*)?

More information

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014 Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ Cloudera World Japan November 2014 WANdisco Background WANdisco: Wide Area Network Distributed Computing Enterprise ready, high availability

More information

Real-time Data Replication

Real-time Data Replication Real-time Data Replication from Oracle to other databases using DataCurrents WHITEPAPER Contents Data Replication Concepts... 2 Real time Data Replication... 3 Heterogeneous Data Replication... 4 Different

More information

High Performance MySQL Choices in Amazon Web Services: Beyond RDS. Andrew Shieh, SmugMug Operations shandrew @ smugmug.

High Performance MySQL Choices in Amazon Web Services: Beyond RDS. Andrew Shieh, SmugMug Operations shandrew @ smugmug. High Performance MySQL Choices in Amazon Web Services: Beyond RDS Andrew Shieh, SmugMug Operations shandrew @ smugmug.com April 15, 2015 Agenda 2 All about AWS Current RDS alternatives Cloud failures ->

More information

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module June, 2015 WHITE PAPER Contents Advantages of IBM SoftLayer and RackWare Together... 4 Relationship between

More information

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module June, 2015 WHITE PAPER Contents Advantages of IBM SoftLayer and RackWare Together... 4 Relationship between

More information

Conventionally, software testing has aimed at verifying functionality but the testing paradigm has changed for software services.

Conventionally, software testing has aimed at verifying functionality but the testing paradigm has changed for software services. 1 Conventionally, software testing has aimed at verifying functionality but the testing paradigm has changed for software services. Developing a full-featured and functioning software service is necessary;

More information

Cloud Service Model. Selecting a cloud service model. Different cloud service models within the enterprise

Cloud Service Model. Selecting a cloud service model. Different cloud service models within the enterprise Cloud Service Model Selecting a cloud service model Different cloud service models within the enterprise Single cloud provider AWS for IaaS Azure for PaaS Force fit all solutions into the cloud service

More information

NetflixOSS A Cloud Native Architecture

NetflixOSS A Cloud Native Architecture NetflixOSS A Cloud Native Architecture LASER Sessions 2&3 Overview September 2013 Adrian Cockcroft @adrianco @NetflixOSS http://www.linkedin.com/in/adriancockcroft Presentation vs. Tutorial Presentation

More information

Velocity and Volume (or Speed Wins)

Velocity and Volume (or Speed Wins) Velocity and Volume (or Speed Wins) Flowcon November 2013 Adrian CockcroB @adrianco @NeDlixOSS hhp://www.linkedin.com/in/adriancockcrob "This is the IT swamp draining manual for anyone who is neck deep

More information

SCALABILITY AND AVAILABILITY

SCALABILITY AND AVAILABILITY SCALABILITY AND AVAILABILITY Real Systems must be Scalable fast enough to handle the expected load and grow easily when the load grows Available available enough of the time Scalable Scale-up increase

More information

Companies are moving more and more IT services and

Companies are moving more and more IT services and Adding High Availability to the Cloud Paul J. Holenstein Executive Vice President Gravic, Inc. Companies are moving more and more IT services and utility applications to public clouds to take advantage

More information

Architecting Distributed Databases for Failure A Case Study with Druid

Architecting Distributed Databases for Failure A Case Study with Druid Architecting Distributed Databases for Failure A Case Study with Druid Fangjin Yang Cofounder @ Imply The Bad The Really Bad Overview The Catastrophic Best Practices: Operations Everything is going to

More information

Availability Digest. MySQL Clusters Go Active/Active. December 2006

Availability Digest. MySQL Clusters Go Active/Active. December 2006 the Availability Digest MySQL Clusters Go Active/Active December 2006 Introduction MySQL (www.mysql.com) is without a doubt the most popular open source database in use today. Developed by MySQL AB of

More information

ArcGIS 10.3 Server on Amazon Web Services

ArcGIS 10.3 Server on Amazon Web Services ArcGIS 10.3 Server on Amazon Web Services Copyright 1995-2015 Esri. All rights reserved. Table of Contents Introduction What is ArcGIS Server on Amazon Web Services?............................... 5 Quick

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

The Hadoop Distributed File System

The Hadoop Distributed File System The Hadoop Distributed File System Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu HDFS

More information

CONNECTRIA MANAGED AMAZON WEB SERVICES (AWS)

CONNECTRIA MANAGED AMAZON WEB SERVICES (AWS) CONNECTRIA MANAGED AMAZON WEB SERVICES (AWS) Maximize the benefits of using AWS. With Connectria s Managed AWS, you can purchase and implement 100% secure, highly available, managed AWS solutions all backed

More information

High Availability Solutions for the MariaDB and MySQL Database

High Availability Solutions for the MariaDB and MySQL Database High Availability Solutions for the MariaDB and MySQL Database 1 Introduction This paper introduces recommendations and some of the solutions used to create an availability or high availability environment

More information

SQL Databases Course. by Applied Technology Research Center. This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases.

SQL Databases Course. by Applied Technology Research Center. This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases. SQL Databases Course by Applied Technology Research Center. 23 September 2015 This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases. Oracle Topics This Oracle Database: SQL

More information

Migrating a running service to AWS

Migrating a running service to AWS Migrating a running service to AWS Nick Veenhof Ricardo Amaro DevOps Track https://events.drupal.org/barcelona2015/sessions/migrating-runningservice-mollom-aws-without-service-interruptions-and-reduce

More information

YouTube Vitess. Cloud-Native MySQL. Oracle OpenWorld Conference October 26, 2015. Anthony Yeh, Software Engineer, YouTube. http://vitess.

YouTube Vitess. Cloud-Native MySQL. Oracle OpenWorld Conference October 26, 2015. Anthony Yeh, Software Engineer, YouTube. http://vitess. YouTube Vitess Cloud-Native MySQL Oracle OpenWorld Conference October 26, 2015 Anthony Yeh, Software Engineer, YouTube http://vitess.io/ Spoiler Alert Spoilers 1. History of Vitess 2. What is Cloud-Native

More information

Connectivity. Alliance Access 7.0. Database Recovery. Information Paper

Connectivity. Alliance Access 7.0. Database Recovery. Information Paper Connectivity Alliance Access 7.0 Database Recovery Information Paper Table of Contents Preface... 3 1 Overview... 4 2 Resiliency Concepts... 6 2.1 Database Loss Business Impact... 6 2.2 Database Recovery

More information

DISASTER RECOVERY WITH AWS

DISASTER RECOVERY WITH AWS DISASTER RECOVERY WITH AWS Every company is vulnerable to a range of outages and disasters. From a common computer virus or network outage to a fire or flood these interruptions can wreak havoc on your

More information

A programming model in Cloud: MapReduce

A programming model in Cloud: MapReduce A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Database Resilience at ISPs. High-Availability. White Paper

Database Resilience at ISPs. High-Availability. White Paper Database Resilience at ISPs High-Availability White Paper Internet Service Providers (ISPs) generally do their job very well. The commercial hosting market is segmented in a number of different ways but

More information

Cloud Computing with Microsoft Azure

Cloud Computing with Microsoft Azure Cloud Computing with Microsoft Azure Michael Stiefel www.reliablesoftware.com development@reliablesoftware.com http://www.reliablesoftware.com/dasblog/default.aspx Azure's Three Flavors Azure Operating

More information

High Availability and Disaster Recovery for Exchange Servers Through a Mailbox Replication Approach

High Availability and Disaster Recovery for Exchange Servers Through a Mailbox Replication Approach High Availability and Disaster Recovery for Exchange Servers Through a Mailbox Replication Approach Introduction Email is becoming ubiquitous and has become the standard tool for communication in many

More information

HDFS Users Guide. Table of contents

HDFS Users Guide. Table of contents Table of contents 1 Purpose...2 2 Overview...2 3 Prerequisites...3 4 Web Interface...3 5 Shell Commands... 3 5.1 DFSAdmin Command...4 6 Secondary NameNode...4 7 Checkpoint Node...5 8 Backup Node...6 9

More information

ArcGIS for Server in the Amazon Cloud. Michele Lundeen Esri

ArcGIS for Server in the Amazon Cloud. Michele Lundeen Esri ArcGIS for Server in the Amazon Cloud Michele Lundeen Esri What we will cover ArcGIS for Server in the Amazon Cloud Why How Extras Why do you need ArcGIS Server? Some examples Publish - Dynamic Map Services

More information

Introduction to Windows Azure Cloud Computing Futures Group, Microsoft Research Roger Barga, Jared Jackson,Nelson Araujo, Dennis Gannon, Wei Lu, and

Introduction to Windows Azure Cloud Computing Futures Group, Microsoft Research Roger Barga, Jared Jackson,Nelson Araujo, Dennis Gannon, Wei Lu, and Introduction to Windows Azure Cloud Computing Futures Group, Microsoft Research Roger Barga, Jared Jackson,Nelson Araujo, Dennis Gannon, Wei Lu, and Jaliya Ekanayake Range in size from edge facilities

More information

Assignment # 1 (Cloud Computing Security)

Assignment # 1 (Cloud Computing Security) Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual

More information

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming by Dibyendu Bhattacharya Pearson : What We Do? We are building a scalable, reliable cloud-based learning platform providing services

More information

Module 14: Scalability and High Availability

Module 14: Scalability and High Availability Module 14: Scalability and High Availability Overview Key high availability features available in Oracle and SQL Server Key scalability features available in Oracle and SQL Server High Availability High

More information

Testing Cloud Application System Resiliency by Wrecking the System

Testing Cloud Application System Resiliency by Wrecking the System Volume 3, No.5, May 2014 International Journal of Advances in Computer Science and Technology Tanvi Dharmarha, International Journal of Advances in Computer Science and Technology, 3(5), May 2014, 357-363

More information

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &

More information

Building Fault-Tolerant Applications on AWS October 2011

Building Fault-Tolerant Applications on AWS October 2011 Building Fault-Tolerant Applications on AWS October 2011 Jeff Barr, Attila Narin, and Jinesh Varia 1 Contents Introduction... 3 Failures Shouldn t be THAT Interesting... 3 Amazon Machine Images... 4 Elastic

More information

This talk is mostly about Data Center Replication, but along the way we'll have to talk about why you'd want transactionality arnd the Low-Level API.

This talk is mostly about Data Center Replication, but along the way we'll have to talk about why you'd want transactionality arnd the Low-Level API. This talk is mostly about Data Center Replication, but along the way we'll have to talk about why you'd want transactionality arnd the Low-Level API. Roughly speaking, the yellow boxes here represenet

More information

Drupal in the Cloud. by Azhan Founder/Director S & A Solutions

Drupal in the Cloud. by Azhan Founder/Director S & A Solutions by Azhan Founder/Director S & A Solutions > Drupal and S & A Solutions S & A Solutions who? doing it with Drupal since 2007 Over 70 projects in 5 years More than 20 clients 99% Drupal projects We love

More information

Distributed Scheduling with Apache Mesos in the Cloud. PhillyETE - April, 2015 Diptanu Gon Choudhury @diptanu

Distributed Scheduling with Apache Mesos in the Cloud. PhillyETE - April, 2015 Diptanu Gon Choudhury @diptanu Distributed Scheduling with Apache Mesos in the Cloud PhillyETE - April, 2015 Diptanu Gon Choudhury @diptanu Who am I? Distributed Systems/Infrastructure Engineer in the Platform Engineering Group Design

More information

Distributed storage for structured data

Distributed storage for structured data Distributed storage for structured data Dennis Kafura CS5204 Operating Systems 1 Overview Goals scalability petabytes of data thousands of machines applicability to Google applications Google Analytics

More information

Feature Comparison. Windows Server 2008 R2 Hyper-V and Windows Server 2012 Hyper-V

Feature Comparison. Windows Server 2008 R2 Hyper-V and Windows Server 2012 Hyper-V Comparison and Contents Introduction... 4 More Secure Multitenancy... 5 Flexible Infrastructure... 9 Scale, Performance, and Density... 13 High Availability... 18 Processor and Memory Support... 24 Network...

More information

Be Very Afraid. Christophe Pettus PostgreSQL Experts Logical Decoding & Backup Conference Europe 2014

Be Very Afraid. Christophe Pettus PostgreSQL Experts Logical Decoding & Backup Conference Europe 2014 Be Very Afraid Christophe Pettus PostgreSQL Experts Logical Decoding & Backup Conference Europe 2014 You possess only whatever will not be lost in a shipwreck. Al-Ghazali Hi. Christophe Pettus Consultant

More information

Guideline for stresstest Page 1 of 6. Stress test

Guideline for stresstest Page 1 of 6. Stress test Guideline for stresstest Page 1 of 6 Stress test Objective: Show unacceptable problems with high parallel load. Crash, wrong processing, slow processing. Test Procedure: Run test cases with maximum number

More information

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

Scalability of web applications. CSCI 470: Web Science Keith Vertanen Scalability of web applications CSCI 470: Web Science Keith Vertanen Scalability questions Overview What's important in order to build scalable web sites? High availability vs. load balancing Approaches

More information

TABLE OF CONTENTS THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY FOR SHAREPOINT DATA. Introduction. Examining Third-Party Replication Models

TABLE OF CONTENTS THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY FOR SHAREPOINT DATA. Introduction. Examining Third-Party Replication Models 1 THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY TABLE OF CONTENTS 3 Introduction 14 Examining Third-Party Replication Models 4 Understanding Sharepoint High Availability Challenges With Sharepoint

More information

Cloud Computing Is In Your Future

Cloud Computing Is In Your Future Cloud Computing Is In Your Future Michael Stiefel www.reliablesoftware.com development@reliablesoftware.com http://www.reliablesoftware.com/dasblog/default.aspx Cloud Computing is Utility Computing Illusion

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

TECHNOLOGY WHITE PAPER Jan 2016

TECHNOLOGY WHITE PAPER Jan 2016 TECHNOLOGY WHITE PAPER Jan 2016 Technology Stack C# PHP Amazon Web Services (AWS) Route 53 Elastic Load Balancing (ELB) Elastic Compute Cloud (EC2) Amazon RDS Amazon S3 Elasticache CloudWatch Paypal Overview

More information

CSE-E5430 Scalable Cloud Computing Lecture 11

CSE-E5430 Scalable Cloud Computing Lecture 11 CSE-E5430 Scalable Cloud Computing Lecture 11 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 30.11-2015 1/24 Distributed Coordination Systems Consensus

More information

Best practices for operational excellence (SharePoint Server 2010)

Best practices for operational excellence (SharePoint Server 2010) Best practices for operational excellence (SharePoint Server 2010) Published: May 12, 2011 Microsoft SharePoint Server 2010 is used for a broad set of applications and solutions, either stand-alone or

More information

When talking about hosting

When talking about hosting d o s Cloud Hosting - Amazon Web Services Thomas Floracks When talking about hosting for web applications most companies think about renting servers or buying their own servers. The servers and the network

More information

SQL Server 2014 New Features/In- Memory Store. Juergen Thomas Microsoft Corporation

SQL Server 2014 New Features/In- Memory Store. Juergen Thomas Microsoft Corporation SQL Server 2014 New Features/In- Memory Store Juergen Thomas Microsoft Corporation AGENDA 1. SQL Server 2014 what and when 2. SQL Server 2014 In-Memory 3. SQL Server 2014 in IaaS scenarios 2 SQL Server

More information

In Memory Accelerator for MongoDB

In Memory Accelerator for MongoDB In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000

More information

Release Notes LS Retail Data Director 3.01.04 August 2011

Release Notes LS Retail Data Director 3.01.04 August 2011 Release Notes LS Retail Data Director 3.01.04 August 2011 Copyright 2010-2011, LS Retail. All rights reserved. All trademarks belong to their respective holders. Contents 1 Introduction... 1 1.1 What s

More information

HDFS Under the Hood. Sanjay Radia. Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc.

HDFS Under the Hood. Sanjay Radia. Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc. HDFS Under the Hood Sanjay Radia Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc. 1 Outline Overview of Hadoop, an open source project Design of HDFS On going work 2 Hadoop Hadoop provides a framework

More information

Simba Apache Cassandra ODBC Driver

Simba Apache Cassandra ODBC Driver Simba Apache Cassandra ODBC Driver with SQL Connector 2.2.0 Released 2015-11-13 These release notes provide details of enhancements, features, and known issues in Simba Apache Cassandra ODBC Driver with

More information

המרכז ללימודי חוץ המכללה האקדמית ספיר. ד.נ חוף אשקלון 79165 טל'- 08-6801535 פקס- 08-6801543 בשיתוף עם מכללת הנגב ע"ש ספיר

המרכז ללימודי חוץ המכללה האקדמית ספיר. ד.נ חוף אשקלון 79165 טל'- 08-6801535 פקס- 08-6801543 בשיתוף עם מכללת הנגב עש ספיר מודולות הלימוד של מייקרוסופט הקורס מחולק ל 4 מודולות כמפורט:.1Configuring Microsoft Windows Vista Client 70-620 Installing and upgrading Windows Vista Identify hardware requirements. Perform a clean installation.

More information

19.10.11. Amazon Elastic Beanstalk

19.10.11. Amazon Elastic Beanstalk 19.10.11 Amazon Elastic Beanstalk A Short History of AWS Amazon started as an ECommerce startup Original architecture was restructured to be more scalable and easier to maintain Competitive pressure for

More information

Connectivity. Alliance Access 7.0. Database Recovery. Information Paper

Connectivity. Alliance Access 7.0. Database Recovery. Information Paper Connectivity Alliance 7.0 Recovery Information Paper Table of Contents Preface... 3 1 Overview... 4 2 Resiliency Concepts... 6 2.1 Loss Business Impact... 6 2.2 Recovery Tools... 8 3 Manual Recovery Method...

More information