Eugene Ciurana - geecon@ciurana.eu - pr3d4t0r, ##java, irc.freenode.net
High-Availability, Fault Tolerance, and Resource Oriented Computing
This presentation is available from: http://ciurana.eu/geecon-2010
About Eugene...
15+ years building mission-critical, high-availability systems
14+ years of Java work
Open source evangelist: drove the official adoption of open source/Linux at Walmart worldwide
State-of-the-art, main-line-of-business systems at some of the largest companies in the world - not a web guy!
What You'll Learn...
Decoupled, event-driven, resource-oriented systems are more flexible
Avoid tight, point-to-point integration
Enhance JVM-based apps with better domain-specific languages
How to move away from monolithic app servers and architectures
How to implement event-driven systems by leveraging existing infrastructure and SOA investment
Treat computational resources as addressable entities
Balance open source vs. commercial products
Very Important! Please Ask Questions! (don't be shy)
What is Scalability?
Scalability is the property of a system to handle bigger amounts of work, or to be easily expanded in response to increased demand for network, processing, database, or file resources.
Types of scalability:
Horizontal (out): add more nodes with identical functionality to existing ones and redistribute the load
Vertical (up): expand by adding more cores, main memory, storage, or network interfaces
Horizontal Scalability
Diagram: a load balancer fronts a growing pool of identical nodes; adding nodes scales out - clustering!
Vertical Scalability
Diagram: a single host scales up - a dual-core, single-processor machine with 16 GB RAM hosting virtual machines 0-2 grows into a dual-core, dual-processor machine with 32 GB RAM hosting virtual machines 0-3.
What is Availability?
How well a system provides useful resources over a set period of time.
High availability guarantees an absolute degree of functional continuity within a time window.
Expressed as a relationship between uptime and unplanned downtime:
A = 100 - (100 * D / U), with D (downtime) and U (uptime) expressed in minutes
Beware: uptime != available
The Nines Game

Availability %   Downtime (minutes/year)   Downtime/year   Vendor jargon
90               52,560.00                 36.5 days       one nine
99               5,256.00                  3.7 days        two nines
99.9             525.60                    8.8 hours       three nines
99.99            52.56                     53 minutes      four nines
99.999           5.26                      5.3 minutes     five nines
99.9999          0.53                      32 seconds      six nines
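The downtime column follows directly from the availability formula on the previous slide; a minimal Java sketch that reproduces it (the class and method names are mine, not from the talk):

    // Computes allowed downtime per year for a given availability percentage.
    public class Nines {
        static final double MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600

        static double downtimeMinutesPerYear(double availabilityPercent) {
            // The unavailable fraction of the year, in minutes.
            return MINUTES_PER_YEAR * (100.0 - availabilityPercent) / 100.0;
        }

        public static void main(String[] args) {
            for (double a : new double[] { 90, 99, 99.9, 99.99, 99.999, 99.9999 })
                System.out.printf("%-8s -> %10.2f minutes/year%n",
                                  a, downtimeMinutesPerYear(a));
        }
    }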
Service Level Agreements
SLAs are negotiated terms that outline the obligations of the two parties delivering and using a system:
System type - not all systems require the same SLA
Levels of availability: minimum, target
Uptime: network, power, maintenance windows
Serviceability
Performance and metrics
Billing
SLAs help determine whether you scale up or out
Load Balancers
They work by spreading requests among two or more resources.
Implemented in hardware or in software: multiple machines, multiple processes, multiple threads.
Resources appear as a single device to consumers.
Can be stateless (web services) or stateful (applications that require session management).
Algorithms determine the distribution:
1/n - all systems equally likely to service a request
Special requests (e.g. a music store) - some servers get hit more than others; see the sketch below.
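A minimal sketch of the 1/n (round-robin) case; the class and the backend addresses are illustrative, not from the presentation:

    import java.util.List;
    import java.util.concurrent.atomic.AtomicLong;

    // Round-robin balancer: every backend is equally likely (1/n).
    public class RoundRobinBalancer {
        private final List<String> backends;
        private final AtomicLong counter = new AtomicLong();

        public RoundRobinBalancer(List<String> backends) {
            this.backends = backends;
        }

        public String next() {
            // A monotonic counter modulo pool size spreads requests evenly.
            int index = (int) (counter.getAndIncrement() % backends.size());
            return backends.get(index);
        }
    }

A weighted variant (picking some backends proportionally more often) covers the "special requests" case.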
Load Balancers
Diagram: a consumer sends requests R1, R2, R3, ... (Rn: request, sequence number n) to the load balancer at 74.0.125.28, which distributes them across 192.168.202.55, .66, .67, and .69.
Persistent Load Balancers
Diagram: several consumers connect through a sticky load balancer at 74.0.125.28; each consumer's requests are pinned to the same backend (192.168.202.55, .66, .67, or .69).
Load Balancing and Databases
Diagram: the load balancer at 74.0.125.28 fronts backends 192.168.202.55-.69, which share session data through a common database.
Caching Strategies
Stateful load balancing requires data sharing.
Caching distributes popular, shared read-only data - think of a cache as a giant hash map.
If the data isn't in the cache, fetch it from the database.
Write policies (see the sketch below):
write-through: write to the cache AND the database
write-behind: the cache entry is marked dirty, and the database is updated only when a dirty datum is requested
no-write allocation: only read requests are cached; assumes data never changes
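The first two write policies differ only in when the database sees the write; a minimal sketch, assuming hypothetical cache and database stand-ins:

    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical store illustrating the two write policies above.
    public class WritePolicyCache {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final Set<String> dirty = ConcurrentHashMap.newKeySet();

        // write-through: the cache AND the database see every write.
        void writeThrough(String key, String value) {
            cache.put(key, value);
            writeToDatabase(key, value);      // synchronous database write
        }

        // write-behind: only the cache is updated; the entry is marked
        // dirty and flushed to the database when the datum is requested.
        void writeBehind(String key, String value) {
            cache.put(key, value);
            dirty.add(key);
        }

        String read(String key) {
            if (dirty.remove(key))            // flush a dirty datum on demand
                writeToDatabase(key, cache.get(key));
            return cache.get(key);
        }

        void writeToDatabase(String key, String value) { /* JDBC call, etc. */ }
    }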
Caching Usage Pattern
Application caching:
little or no programmer participation (e.g. Terracotta)
explicit API calls (memcached, Coherence, etc.)
Web caching - stores full documents or fragments ("particles") on the server or client; invisible to the client
Web accelerators - distribute the load (e.g. a CDN such as S3 or Akamai)
Proxy caches - serve repeated requests for the same resources and may provide filtering/querying (e.g. Squid, Apache, ISA servers)
Caching Usage Pattern
Flow chart:
Query path: fetch the datum from the cache; if the datum is None, query it from the database and add it to the cache; then use the datum in the app.
Update path: update the datum in the database, then either invalidate the cache entry or add/update the datum in the cache.
A sketch of this flow follows.
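The same flow in code - a cache-aside sketch with the database calls stubbed out (names are illustrative):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Cache-aside flow from the slide; database access is stubbed out.
    public class CacheAside {
        private final Map<String, String> cache = new ConcurrentHashMap<>();

        // Query path: cache first, fall back to the database, populate the cache.
        String query(String key) {
            String datum = cache.get(key);
            if (datum == null) {                    // "datum is None"
                datum = queryDatabase(key);
                cache.put(key, datum);              // add datum to cache
            }
            return datum;                           // use datum in app
        }

        // Update path: write the database, then refresh (or invalidate) the cache.
        void update(String key, String value) {
            updateDatabase(key, value);
            cache.put(key, value);                  // or cache.remove(key) to invalidate
        }

        String queryDatabase(String key) { return "..."; }   // stub
        void updateDatabase(String key, String value) { }    // stub
    }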
Distributed Caching
Diagram: backends 192.168.202.55-.69 behind the load balancer at 74.0.125.28 reach caches 0-3 (in a load-balanced configuration or via datagram), which front the database.
Clustering
Cluster - two or more systems that appear to users as a single system.
A clustered (horizontally scalable) system is more cost-effective than a monolithic, single (vertically scalable) system with the same performance characteristics.
Systems in the cluster are connected over high-speed LANs like Gb Ethernet, FDDI, InfiniBand, Myrinet, etc.
A/A Clustering
A/A == Active/Active
Distribute the load evenly among multiple nodes.
All nodes offer the same capabilities.
All nodes are active at the same time.
Diagram: consumer -> load balancer (74.0.125.28) -> active nodes 192.168.202.55, .66, .67, .69.
High-Availability A/P Cluster
A/P == Active/Passive
Provides uninterrupted service through redundant nodes.
Eliminates single points of failure.
Two nodes minimum, with heartbeat detection (sketched below).
Automatic traffic switch for fail-over.
Diagram: consumer -> router (74.0.125.28) -> active node 192.168.202.55, with a heartbeat to the fail-over node 192.168.202.69; state data lives in a cache, and the database replicates (or is clustered) to a fail-over database.
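Heartbeat detection can be as simple as a periodic probe with a timeout; a hedged sketch - the port, interval, and threshold are illustrative, not from the talk:

    import java.net.InetSocketAddress;
    import java.net.Socket;

    // The passive node probes the active node and triggers fail-over
    // after MAX_MISSED consecutive missed beats. All values illustrative.
    public class HeartbeatMonitor {
        static final int MAX_MISSED = 3;
        static final int TIMEOUT_MS = 2000;

        public static void main(String[] args) throws InterruptedException {
            int missed = 0;
            while (true) {
                try (Socket probe = new Socket()) {
                    probe.connect(new InetSocketAddress("192.168.202.55", 7), TIMEOUT_MS);
                    missed = 0;                  // active node answered
                } catch (Exception e) {
                    if (++missed >= MAX_MISSED) {
                        System.out.println("Active node down - promoting passive node");
                        break;                   // hand off to the fail-over logic
                    }
                }
                Thread.sleep(TIMEOUT_MS);        // wait for the next beat
            }
        }
    }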
Grid
Diagram: consumer -> master -> load balancers -> worker nodes.
Processes loads as independent jobs; jobs don't require data sharing.
Storage and network may be shared by all nodes.
Intermediate results have no bearing on other jobs' progress.
Each node is independent.
Map/Reduce (Hadoop) - see the sketch below.
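Because the jobs are independent, a grid maps naturally onto scatter/gather; a minimal Java sketch with word counting standing in for a real job:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    // Independent jobs submitted to a pool; intermediate results never
    // interact until the final reduce, mirroring the grid model above.
    public class GridSketch {
        public static void main(String[] args) throws Exception {
            String[] chunks = { "alpha beta", "gamma delta epsilon", "zeta" };
            ExecutorService grid = Executors.newFixedThreadPool(4);

            List<Callable<Integer>> jobs = new ArrayList<Callable<Integer>>();
            for (final String chunk : chunks)
                jobs.add(new Callable<Integer>() {           // map: one job per chunk
                    public Integer call() { return chunk.split("\\s+").length; }
                });

            int total = 0;
            for (Future<Integer> partial : grid.invokeAll(jobs)) // jobs run independently
                total += partial.get();                          // reduce: combine results
            System.out.println("Total words: " + total);
            grid.shutdown();
        }
    }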
Computational Cluster
Used for operations that require raw computational power.
Not good for transactional operations (web, database).
Tightly coupled, homogeneous nodes in close proximity.
Meant to replace supercomputers.
Diagram: consumer -> master node -> compute nodes.
Redundancy and Fault Tolerance
Redundancy - the expectation that any system component failure is independent of failures in other components.
Fault tolerance - the system continues to operate in the event of component failure, possibly with decreased throughput.
Fault tolerance requirements derive from SLAs.
Fault Tolerance SLA Requirements
No single point of failure - redundant components ensure continuous operation.
Allow repairs without disruption of service.
Fault isolation - problem detection must pinpoint the specific faulty component.
Fault propagation containment - problems in one component must not cascade to others (one common containment technique is sketched below).
Reversion mode - the system can be set back to a known state on command.
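One common way to contain fault propagation - not something these slides prescribe - is a circuit breaker; a minimal sketch (real implementations add a timed half-open retry):

    import java.util.concurrent.Callable;

    // After MAX_FAILURES consecutive failures the breaker opens and calls
    // fail fast, containing the fault instead of letting it cascade.
    // The threshold and the wrapped operation are illustrative.
    public class CircuitBreaker {
        private static final int MAX_FAILURES = 5;
        private int consecutiveFailures = 0;

        public <T> T call(Callable<T> operation) throws Exception {
            if (consecutiveFailures >= MAX_FAILURES)
                throw new IllegalStateException("circuit open - failing fast");
            try {
                T result = operation.call();
                consecutiveFailures = 0;     // success closes the breaker
                return result;
            } catch (Exception e) {
                consecutiveFailures++;       // count the fault before rethrowing
                throw e;
            }
        }
    }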
A/A Cluster Fault Tolerance
Diagram: consumer -> load balancer (74.0.125.28) -> nodes 192.168.202.55-.69, with a replacement node (192.168.202.53) ready to join.
Uninterruptible, scalable service (stateless, web services).
Failure transparency - though service may be degraded.
Ideal for event-based web services (SOAP, REST, JMS, etc.).
No dependencies between nodes.
A/P Cluster Fault Tolerance
Diagram: consumer -> router (74.0.125.28) -> active node 192.168.202.55, heartbeat to fail-over node 192.168.202.69; state data in a cache; database with a fail-over database.
High availability through redundancy and failure detection.
Higher cost - used for stateful systems.
May require active sysadmin or netadmin participation.
More moving parts - more things to coordinate.
Putting It All Together
ROC Architecture
ROC = Resource-Oriented Computing: everything is a resource (computational, data, or other).
Diagram: web browsers, GUI apps, and web apps reach the business logic over the Internet (HTTP/XML, JMS, SOAP, etc.); a Mule ESB with transformers mediates between service providers (UPS, FedEx), a Remedy system behind a dedicated API, a CRM, product catalogue and product support pages (SOAP, JDBC), TCP pass-through services, and single sign-on (LDAP, SOAP) against a mainframe/RACF, Active Directory, and a legacy auth system.
SOA and Computational Network
Real-Life Example - LeapFrog
Diagram: end-user systems (Mac, Windows) run LeapFrog Connect over USB with connected products and a web browser; third-party partner sites and www.leapfrog.com (LearningPath) reach, through the firewall, the Mule ESB "backbone" (HTTP, SOAP via CXF, REST; routing, filtering, and dispatching; ActiveMQ JMS broker; dedicated LeapFrog services). A Mule ESB "tailbone" exposes SOAP/REST web services for connected products, and a Mule ESB "funnybone" handles device log upload and processing in a servlet container. Behind them: an S3 content repository, a content management system (REST, JCR; content authoring), Crowd SSO (user credentials), customer data, game-play data, servlets/app logic, and device logs.
Real-Life Example - LeapFrog
Diagram: Internet traffic hits a load balancer in front of Tomcat 6 application servers and a services proxy; a backbone load balancer fronts four Mule ESB 1.6.2 instances (message filtering, routing, dispatching, queuing, events); separate load balancers front the tailbone (Mule ESB, SOAP/REST), the funnybone (Mule ESB, servlet, MTOM), and the message broker (an ActiveMQ pair); shared state lives in a database and NFS shares.
Mule SOA Applied Clustering
* Two or more Mule instances can provide services, for scalability when demand is high
* A load-balanced configuration has built-in fail-over
* External apps see a single point of entry: the service endpoint name
* The load balancer or proxy sends the request to any available Mule server
* Increased demand - add another Mule server without interrupting the existing ones
* Decreased demand - remove Mule servers without interrupting the other servers
* This is an active/active configuration - any server can handle a request at any time
* Assumes that the service application components are stateless (a client-side sketch follows)
Diagram: external applications call http://server.mycompany.com/service_call; the load balancer forwards to http://mule_server_1/service_call or http://mule_server_2/service_call, each a Mule ESB application container hosting services 1-3.
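From the external application's point of view there is only the one endpoint; a hedged client sketch - the retry count and timeout are illustrative, the URL is the one from the diagram:

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // External client view: one endpoint, retried on failure; the load
    // balancer picks whichever Mule server is available behind it.
    public class ServiceClient {
        public static InputStream callService() throws Exception {
            Exception last = null;
            for (int attempt = 0; attempt < 3; attempt++) {
                try {
                    URL endpoint = new URL("http://server.mycompany.com/service_call");
                    HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
                    conn.setConnectTimeout(2000);
                    if (conn.getResponseCode() == 200)
                        return conn.getInputStream();   // served by either Mule container
                } catch (Exception e) {
                    last = e;    // the balancer should route the retry elsewhere
                }
            }
            throw last != null ? last : new IllegalStateException("service unavailable");
        }
    }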
Mule SOA - ESB App Failover
* The A/A configuration uses the load balancer to dispatch service calls
* The load balancer takes a failing service out of rotation automatically (a health-check sketch follows)
* Failure reason no. 1: network connectivity
* Failure reason no. 2: the Mule container
* Failure reason no. 3: a service application bug
Diagram: same topology as the previous slide - external applications -> http://server.mycompany.com/service_call -> load balancer -> mule_server_1 / mule_server_2, each hosting services 1-3.
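The out-of-rotation decision is typically driven by a periodic health check; a hedged sketch of such a probe (paths, timeouts, and backend names are illustrative):

    import java.net.HttpURLConnection;
    import java.net.URL;

    // A probe a load balancer might run against each Mule container.
    // A non-200 answer (container down, app bug) or a connect failure
    // (network) takes the backend out of rotation.
    public class HealthCheck {
        static boolean isHealthy(String backend) {
            try {
                HttpURLConnection conn = (HttpURLConnection)
                    new URL(backend + "/service_call").openConnection();
                conn.setConnectTimeout(1000);
                conn.setReadTimeout(1000);
                return conn.getResponseCode() == 200;
            } catch (Exception e) {
                return false;    // covers all three failure reasons above
            }
        }

        public static void main(String[] args) {
            for (String backend : new String[] { "http://mule_server_1", "http://mule_server_2" })
                System.out.println(backend + " in rotation: " + isHealthy(backend));
        }
    }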
Uninterrupted Application Updates
* Allow stopping and deploying new application functionality without stopping services
* Allow upgrades to a country's configuration without affecting other countries or stopping services
Diagram (rolling upgrade over time): both Mule ESB application containers start at version 1.4; one is upgraded to version 2.0 while the other keeps serving at 1.4 behind the load balancer; finally both run version 2.0.
Database Replication
Diagram: a primary cluster (nodes 0 and 1) runs the ESB as the app services provider over partitions 0 and 1; each partition's database (DB 0, DB 1) replicates to a backup (DB 0b, DB 1b).
Application Deployment
Diagram: two load balancers front Mule servers 1-5; JMS queuing runs as two active instances with fail-over between them.
Application Deployment
This architecture has a lower cost of operation and reduces power consumption and administration overhead.
Simplify the architecture by having a common platform for all systems; this platform can be replicated across multiple data centers.
Stack diagram: Application 1, Application 2, Web Service 1, and Web Service 2 run on JBoss, a Mule ESB container, and MQ, each on Java 6 / Linux / a virtual machine, over multi-core Intel or AMD processors.
* Virtual machine: VMware or Xen hosted on Windows; consider Amazon EC2 as a viable, low-cost alternative
* Linux: Ubuntu Server
* PowerBuilder (end-user) applications migrate to JBoss + Wicket or a similar configuration
* All web services are hosted by Mule ESB
* The Mule ESB and JBoss servers are separate from one another
* MQ clusters have a similar architecture; JBoss Messaging and WebSphere MQ
* Java 6 as a minimum
Application Deployment
App and service requests may come from the open Internet.
Each data center will have a cluster of two or more physical systems.
Each system will virtually host two or more applications/environments deployed as described in the previous diagram.
The system is designed for horizontal scalability (more traffic, more virtual or physical servers).
The system has inherent fail-over built in.
Use physical load balancers; these can be Linux systems or dedicated F5 balancers, separate from the cluster.
Diagram: an app balancer and a services balancer front two virtual hosts (Intel, AMD), each running an active application, active web services, a distributed cache, and an MQ instance (master on one host, slave on the other); each host's disk sits on a SAN.
Application Deployment
Diagram: data centers in Europe, Japan, and the US, each with app clusters reachable from the Internet; the US data center also hosts the Claims Management hub, backed by Informix and legacy systems.
Each data center has an application cluster.
The app clusters have identical configurations; only the app itself may vary by locale.
A designated data center also functions as the global services processing hub; all applications talk to this cluster (e.g. Claims Management) regardless of where the calling app runs.
The global services clusters are physically and logically separate from the application clusters, which may include locale-specific web services and data stores.
Application Deployment
Diagram: a primary cluster and a secondary cluster (nodes 0 and 1 each) run the ESB as the app services provider over partitions 0 and 1, with databases DB 0/DB 1 replicating to DB 0b/DB 1b in each cluster; a queue links the two clusters through the Enterprise Service Bus (routing, queuing, transformation, transactions, dispatching).
Eugene Ciurana - geecon@ciurana.eu - pr3d4t0r, ##java, irc.freenode.net
http://ciurana.eu/scalablesystems
Q&A - Comments? Anything else?
This presentation is available from: http://ciurana.eu/geecon-2010
Twitter: ciurana