SCALABILITY AND AVAILABILITY




Robert Whitaker

Real systems must be:
- Scalable: fast enough to handle the expected load, and able to grow easily when the load grows
- Available: available enough of the time

Scalable

- Scale-up: increase the hardware so it is bigger and faster
    o Increases the capacity of the system
    o No need for load balancing; just use a bigger box
    o Runs into limits eventually
    o Could provide less availability: what happens if there is a failure and there is no redundancy?
    o Could be easier to manage
    o Does not guarantee increased performance
    o Easiest solution, if it works and you have lots of money
- Scale-out: systems working together to handle the load (server farms, clusters)
    o Multiple systems work together, and programmers must be able to handle this
    o Add more boxes at every level, for the critical parts of the system
        - Web servers for handling the user interface
        - Application servers for running business logic
        - Database servers (tricky to implement)
    o Spread load across the boxes
        - Load balancing at every level
        - Partitioning or replication for the database
        - Impact on application design
        - Impact on system management
    o Another constraint may sit directly behind the current constraint, and it will then need fixing as well
    o The database may be able to add more machines, but this is complex
    o Major impacts on application design and administration; management complexity increases considerably
- Implications for application design, especially in the management of state

Availability

- Includes maintenance; the key is redundancy
- Goal is 100% availability: 24x7 operations, including time for maintenance
- Redundancy is the key to availability: no single point of failure, with a spare of everything on hand
- How much downtime each availability level allows:
    o 99% - 87.6 hours per year
    o 99.9% - 8.76 hours per year
    o 99.99% - 0.876 hours per year
- Need to consider operations as well
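The downtime figures listed for each availability level follow from multiplying the unavailable fraction of the year by the 8,760 hours in a non-leap year. A minimal Python sketch of that arithmetic (the function and constant names are illustrative, not from the notes):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a system may be down at the given availability level."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


for level in (99.0, 99.9, 99.99):
    print(f"{level}% -> {downtime_hours_per_year(level):.3f} hours per year")
```

Each extra "nine" divides the permitted downtime by ten, which is why planned maintenance quickly becomes the dominant part of the budget.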

Operational activities to account for:
    o Maintenance, software upgrades, backups, application changes
    o Not just faults and recovery time

Scalability

- Growth in performance
- Response time: instant is good
- Need to specify an acceptable response time, and response times need to be consistent
- Response times usually vary between transaction types; different classes of transaction have different times
- Typically, if the response time is constant, users accept it and don't notice it; if it fluctuates widely, users will be unhappy

Performance

- How fast is the system?
    o Not the same as scalability, but related
    o Measured by response time and throughput
- How scalable is the system?
    o Concerned with the upper limits of the system
    o How big can it grow, and how does it grow?

Response Time

- What delay does the user see?
- Response times vary with the complexity of a transaction: read-only transactions are fast, update transactions are slower, and any transaction that has to open a connection to the database is slow

Throughput

- How many transactions can be handled in some period of time?
    o Transactions per second
    o A measure of overall capacity
    o Inverse of response time, for requests handled one at a time
- There are standard benchmarks for measuring this

Capacity of the system

- Will increase until some resource limit is hit
    o Adding more clients then just increases the response time
    o Run out of processor, disk bandwidth, or network bandwidth
    o Some resources overload badly
- Contention for shared resources
    o Ethernet: network performance degrades
    o Log file: you want only the one file open on the disk, so that the heads move as little as possible and the disk gives maximum performance

System Capacity

- How many clients can you support?

    o Need to specify an acceptable response time
    o Plot response time v number of clients
- Great if you can run benchmarks; this is the reason for prototyping and proving proposed architectures before leaping into full-scale implementation
- Every system has a constraining resource, and can be extended until you reach that constraining resource

Load Balancing

- Balancing client bindings across servers or processes
    o Needed for stateful systems
    o Static allocation of client to server
- Balancing requests across server systems or processes
    o Dynamically allocating requests to servers
    o Normally only done for stateless systems
- CORBA implementation
    o Client calls on the name server to find the location of a suitable server (name server is the terminology for object directory)
    o Name server can spread client objects across multiple servers, often round robin
    o Client is bound to a server and stays bound forever; this can lead to performance problems if server loads become unbalanced
- Dynamically balance load across servers
    o Requests from a client can go to any server
    o Requests are dynamically routed; often used for web server farms
    o Routing decisions have to be fast, as the router is in the main processing path
    o Applications are normally stateless

Static: bind a client to a server, or to a process on a server. Stateful systems need that binding.

Dynamic: balance requests across a number of servers to spread the load uniformly across them. If you push the button twice, the two requests are likely to go to different servers. Each server may still have a static binding to an application server.

The CORBA name server's job is to distribute client requests across different server instances, using something similar to round robin. Once a client is bound to a server, it is bound forever.
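The difference between static binding and dynamic routing can be sketched in a few lines of Python. This is an illustration only: `NameServer` and `dynamic_route` are hypothetical names, not the CORBA naming service API, and the request mix is invented to show the effect.

```python
from collections import Counter
from itertools import cycle


class NameServer:
    """Hypothetical directory that binds each client to one server, round robin."""

    def __init__(self, servers):
        self._next_server = cycle(servers)
        self._bindings = {}  # client id -> server name

    def resolve(self, client_id):
        # The first call binds the client; later calls return the same server.
        if client_id not in self._bindings:
            self._bindings[client_id] = next(self._next_server)
        return self._bindings[client_id]


def dynamic_route(servers, load):
    """Dynamic alternative: send each request to the least-loaded server."""
    return min(servers, key=lambda s: load[s])


servers = ["A", "B"]
name_server = NameServer(servers)

# Assumed workload: clients c1 and c3 are busy, c2 and c4 send one request each.
requests = ["c1"] * 10 + ["c2"] + ["c3"] * 10 + ["c4"]

# Static binding: every request follows its client's permanent binding.
static_load = Counter(name_server.resolve(c) for c in requests)

# Dynamic routing: each request is placed independently of its client.
dynamic_load = Counter({s: 0 for s in servers})
for _ in requests:
    dynamic_load[dynamic_route(servers, dynamic_load)] += 1

print("static: ", dict(static_load))
print("dynamic:", dict(dynamic_load))
```

Round robin spreads the clients evenly (two per server), but because both busy clients happen to land on server A, the request load ends up very uneven. Dynamic routing evens it out, at the cost of requiring stateless requests.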
The problem with binding a client to a server forever is that we may under-utilise servers: many clients can be bound to one busy server while other, less busy servers sit idle.

Name Server

- Server processes call the name server when they come up, advertising what services they offer

- Clients call the name server to find the location of a server process; it is up to the name server to match clients to servers
- Clients call the server process to create objects
- Can perform dynamic load balancing with stateful servers
    o Clients can throw away server objects and get new ones every now and then; this is implemented in the application code or middleware
    o Or you can perform object replication in the middleware
        - Have copies of the same object on all servers
        - Replicate changes across all servers
        - Clients have references to all copies of the object

Dynamic Stateful

- Save the state somewhere and restore it if needed
- Replicate the stateful object over different servers
- WebLogic example: two servers, A and B, with full replication of the stateful object across A and B
    o On a client request, the object is created on both servers, and the client can choose which server to use
    o On commit of changes, both server objects are updated
    o The reason for this is that if machine B dies, machine A takes over

Dynamic Load Balancing

- Spread load equally across all servers
- Requests can go to any server
- Web servers are built this way
- Need to route requests
    o IP sprayer: one IP address that pushes connections out over N ports. It must be reliable, because all connections go through it, so it controls the reliability of your system and application
    o Network load balancers split requests based on IP
- The request routing needs to be fast and reliable, as it is in the main request path
- Stateless

Web Server Farms

- Are highly scalable
- Are a type of cluster
- Web applications are normally stateless
    o Next request can go to any web server
    o State comes from the client or the database
- Just need to be able to spread the requests across the machines

Clusters

- A group of independent computers acting like a single system
    o Shared disks: each server shares access to the disks in the system

    o Single IP address
    o Single set of services
    o Fail over to other members of the cluster
    o Each server in the cluster knows the status of the other servers
- A group of independent, autonomous computers acting as a single machine
- The machines inside the cluster share some resources
- Machines inside the cluster take over from failed machines (transparent failover)
- Some do load sharing within the cluster
- Improves scalability by adding more resources/boxes
- Addresses scalability: add more boxes to the cluster, with replication or sharing of storage
- Addresses availability: allows you to add or remove boxes from the cluster for maintenance and upgrades
- Can be used as one element of a highly available system
- Heartbeats between machines allow them to monitor one another; if A sees that B has died, it can take over
- Availability: a combination of how often the system fails and how much of the time it is available
- Clusters allow you to take down a machine while the cluster continues to work, especially for maintenance purposes
- Harder to scale state stores

Threaded Servers

- Allow the load to be spread across individual processors
- One process may have a lot of requests while an identical one has none; it would be better to spread the load evenly
- We can have process load balancing or processor load balancing. CORBA uses process instance load balancing to be system independent: all systems use processes, so its implementation is portable
- Modern approaches use thread pools. A client can be bound to one process and have its requests handled by a thread, so even busy processes can handle the requests
- No need for load balancing within a single system
    o Multithreaded server process: a thread pool services requests
    o All objects live in a single process space
    o Any request can be picked up by any thread

Scaling Data Stores

- Much harder to do because of ACID
- The data stores hold state
- Solution: buy more hardware
- Replication: make multiple copies. Useful for high-contention data that is shared a lot but not updated often
With replication, the trick is to change something and replicate the change to all the other copies.

- Partitioning: if one database contains all the customers of a business, it might be useful to partition these customers and distribute the partitions across different physical locations. Distributing the partitions and redirecting requests to different databases helps to reduce contention. The problem is what happens when we want to get all the customers: we can use a partitioned view, specifying all the database partitions and asking for a view of all customers. To obtain scalability at each partition's physical location, we can use a cluster.

Availability through Redundancy

- Redundancy through the addition of spare equipment
- Active standby: the redundant system monitors everything that it is redundant for, and ensures that it can jump in immediately when needed. It must always be up to date
- Passive: the system sits in wait and jumps in if needed. It has to play catch-up, getting back to the state the failed system was in before it died
- For an active standby system, we have a copy of the database. All requests to the active system are sent to the standby as well

Fragility

- Large distributed synchronous systems are not robust. With such tight coupling, if a remote system suddenly dies, you have to wait for a response, which may take forever
- Asynchronous is better: if the remote system is down, the message will eventually get there
- We rely on guarantees, and the middleware usually makes these. The problem is with committed transactions: what do we do when we find out later that the transaction failed?

Availability and Scalability

- Often a question of application design
- Stateful v stateless
    o What happens if a server fails?
    o Can requests go to any server?
- Synchronous method calls or asynchronous messaging
    o Reduce dependency between components
    o Failure-tolerant designs
- Manageability decisions to consider
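The customer partitioning described under Scaling Data Stores, together with its partitioned view, can be sketched as follows. This is a toy model under stated assumptions: the class name, the region keys and the in-memory dictionaries standing in for separate databases are all hypothetical, and a real system would issue queries to each partition's database instead.

```python
class PartitionedCustomers:
    """Hypothetical customer store split across several databases by region."""

    def __init__(self, regions):
        # One dict per partition stands in for a database at one location.
        self._partitions = {region: {} for region in regions}

    def add(self, region, customer_id, name):
        # A write touches only the partition that owns the customer.
        self._partitions[region][customer_id] = name

    def get(self, region, customer_id):
        # A lookup is redirected to a single partition, reducing contention.
        return self._partitions[region].get(customer_id)

    def all_customers(self):
        # "Partitioned view": the union of every partition, built on demand.
        for region, rows in self._partitions.items():
            for customer_id, name in rows.items():
                yield region, customer_id, name


store = PartitionedCustomers(["north", "south"])
store.add("north", 1, "Ada")
store.add("south", 2, "Grace")

everyone = sorted(store.all_customers())
print(everyone)
```

Single-customer reads and writes stay local to one partition, while the view has to visit every partition, which is why "get all the customers" is the expensive case.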