High Availability and Clustering
AdvOSS-HA is a software application that enables High Availability and Clustering, a critical requirement for any carrier-grade solution. It implements multiple redundancy models and maintains the system's availability through redundancy, automated IP takeover, and database replication.
Note: If a solution deploys SAN-based storage, database replication is not required; it is needed only in non-SAN storage scenarios. In SAN deployments, only application-level clustering and redundancy are required.
Clustering
Each physical machine or server running application instances can have its own dedicated hot-standby server. That standby can either be active itself, providing services while also acting as the standby for some other primary server, or be configured as a passive standby for an active server. Together, these pairs form a clustered environment.
For example, if a service is spread over three components, each running on its own server, each server may have its own standby, forming a six-machine cluster. Each machine in the cluster can be active, or a passive standby that waits for its master to fail and then takes over automatically.
Automatic IP Takeover
The core technique is a clustered HA solution with automated monitoring of services, systems and network interfaces. The system provides 1+1 redundancy, with heartbeat messages continuously exchanged between a primary Application server and its redundant peer. The redundant peer is a complete, working application instance with its own database (in non-SAN deployments) and its own set of processes and modules. It may remain active or dormant while the primary server is available. The primary server's IP address is a Virtual IP address rather than a physical one. If the heartbeat mechanism detects that the primary server or its network interface is unavailable, the Virtual IP is moved immediately to the redundant peer.
The redundant peer then takes over the Virtual IP and brings the services up on its own hardware, automatically and seamlessly. The IP takeover time is configurable and is usually on the order of a few seconds.
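The heartbeat-driven takeover can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the AdvOSS-HA implementation. The class names HeartbeatMonitor and Standby, the timeout_s parameter (standing in for the configurable takeover time) and the documentation-range address 192.0.2.10 are all hypothetical. Timestamps are passed in explicitly, so the logic runs without real clocks or sockets.

```python
class HeartbeatMonitor:
    """Tracks heartbeats received from a primary server (hypothetical sketch).

    timeout_s models the configurable takeover time: if no heartbeat has
    arrived for longer than this, the primary is considered unavailable.
    """

    def __init__(self, timeout_s: float, start: float) -> None:
        self.timeout_s = timeout_s
        # Treat the start of monitoring as the last sign of life from the primary.
        self.last_seen = start

    def record_heartbeat(self, now: float) -> None:
        # Called whenever a heartbeat message arrives from the primary.
        self.last_seen = now

    def primary_alive(self, now: float) -> bool:
        return now - self.last_seen <= self.timeout_s


class Standby:
    """Redundant peer that claims the primary's Virtual IP on heartbeat loss."""

    def __init__(self, vip: str, monitor: HeartbeatMonitor) -> None:
        self.vip = vip
        self.monitor = monitor
        self.owns_vip = False

    def tick(self, now: float) -> bool:
        """Run one monitoring cycle; return True once this peer owns the VIP."""
        if not self.owns_vip and not self.monitor.primary_alive(now):
            # A real deployment would bind self.vip to a local interface and
            # bring the application services up here.
            self.owns_vip = True
        return self.owns_vip


monitor = HeartbeatMonitor(timeout_s=3.0, start=0.0)
standby = Standby("192.0.2.10", monitor)
monitor.record_heartbeat(1.0)
print(standby.tick(3.5))  # False: last heartbeat was 2.5 s ago
print(standby.tick(4.5))  # True: 3.5 s without a heartbeat exceeds the timeout
```

Making the timeout a parameter mirrors the configurable takeover time described above. A shorter timeout fails over faster but is more sensitive to brief network hiccups. Once the sketch has taken over, it keeps the VIP, because failback is outside what the text describes.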
Additionally, a local monitoring application on each server continuously monitors the service's component processes. If any of them crashes, it immediately restarts that process and sends an alert to the Network Management System.
Redundancy Models
AdvOSS-HA supports the three redundancy configurations described below.
1 + 1 Redundancy
This is the basic redundancy model. It provides full redundancy: each primary server has exactly one backup server, and each backup server acts as a hot-standby for exactly one primary server. The advantage of this model is complete redundancy, giving the highest probability of availability under any kind of physical machine failure, as long as the hot-standby peer itself remains available while its primary is down. The disadvantage is cost, since it requires as many hot-standby machines as there are primary machines.
Load sharing
The standby machines can also handle live traffic or perform application processing on their own separate IP addresses, and then assume the IP address of the primary machine when the primary goes down. In this way they operate in a combined load-sharing and hot-standby mode. To realize this scenario, however, clients must be able to distribute traffic between the two machines using some load-balancing algorithm. For web-services based applications, this can easily be achieved with DNS round robin.
This Document is Property Of AdvOSS. Page 2
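The load-sharing mode can be illustrated with a small simulation. It is a sketch under simplifying assumptions, not AdvOSS code. The names LoadSharingPair, fail_primary and dispatch, and the documentation-range addresses, are hypothetical. DNS round robin is reduced to a client that cycles through the two advertised addresses. The point it shows is this: once the standby assumes the primary's IP, requests sent to either address still reach a live machine.

```python
from itertools import cycle
from typing import Dict, Iterator


class LoadSharingPair:
    """Hypothetical model of a primary/standby pair in load-sharing mode."""

    def __init__(self, primary_ip: str, standby_ip: str) -> None:
        self.primary_ip = primary_ip
        # While both machines are healthy, each serves its own IP address.
        self.owner: Dict[str, str] = {primary_ip: "primary", standby_ip: "standby"}

    def fail_primary(self) -> None:
        # The standby assumes the primary's IP, so it now answers on both.
        self.owner[self.primary_ip] = "standby"


def dispatch(pair: LoadSharingPair, addresses: Iterator[str], n: int) -> Dict[str, int]:
    """Send n requests, rotating through the advertised addresses."""
    counts: Dict[str, int] = {}
    for _ in range(n):
        machine = pair.owner[next(addresses)]
        counts[machine] = counts.get(machine, 0) + 1
    return counts


pair = LoadSharingPair("192.0.2.10", "192.0.2.11")
# Simplified round robin: the client cycles through the two addresses in turn.
addresses = cycle(["192.0.2.10", "192.0.2.11"])
print(dispatch(pair, addresses, 4))  # {'primary': 2, 'standby': 2}
pair.fail_primary()
print(dispatch(pair, addresses, 4))  # {'standby': 4}
```

Real DNS round robin rotates the order of records in resolver responses rather than letting a single client cycle explicitly. The effect on how load is spread across the two addresses is similar, which is all this sketch aims to show.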
N + 1 Redundancy
This is a low-cost redundancy model in which a single server acts as the hot-standby node for multiple primary machines. If any of the primary machines goes down, the hot-standby automatically takes over that machine's IP address and traffic continues. If multiple machines go down simultaneously, the standby becomes a multi-homed machine that carries the traffic of all the failed machines. The advantage of this model is low cost, since a single physical machine backs up several servers. The disadvantage is that, when multiple machines go down simultaneously, the single standby carries all of their traffic and becomes a single point of failure in the network. Also, if the backup machine is not powerful enough, it may be overloaded when multiple primaries fail at the same time.
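The following minimal sketch shows the N + 1 behaviour, assuming a single standby that receives an up/down status for each primary's IP. The class NPlusOneStandby, its update method, the capacity parameter (an assumed count of failed primaries the machine can carry) and the addresses are hypothetical. The sketch shows the standby accumulating several IPs (multi-homing) and flags the overload risk described above. It does not model failback.

```python
from typing import Dict, List, Set


class NPlusOneStandby:
    """Single hot-standby backing up several primaries (hypothetical sketch).

    capacity is an assumed number of failed primaries whose traffic this
    machine can carry at once; beyond that it is treated as overloaded.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.hosted_vips: Set[str] = set()

    def update(self, primary_status: Dict[str, bool]) -> List[str]:
        """Take over the IP of every primary reported down (False)."""
        for vip, alive in primary_status.items():
            if not alive:
                # Multi-homed: the standby may hold several primaries' IPs at once.
                self.hosted_vips.add(vip)
        return sorted(self.hosted_vips)

    @property
    def overloaded(self) -> bool:
        return len(self.hosted_vips) > self.capacity


standby = NPlusOneStandby(capacity=1)
print(standby.update({"192.0.2.1": True, "192.0.2.2": False, "192.0.2.3": True}))
# ['192.0.2.2']
print(standby.overloaded)  # False
print(standby.update({"192.0.2.1": True, "192.0.2.2": False, "192.0.2.3": False}))
# ['192.0.2.2', '192.0.2.3']
print(standby.overloaded)  # True: two simultaneous failures exceed capacity 1
```

Sizing the standby is the key design decision in this model. A capacity check like the one above is one simple way to reason about how many simultaneous primary failures a given backup machine can absorb.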
M + N Redundancy
This is the most advanced model, in which N servers act as hot-standbys for M primary servers. Its advantage is that it strikes a balance between full 1 + 1 redundancy and the single point of failure of the N + 1 model, making it possible to weigh economic considerations against the amount of redundancy required. Whenever a primary server goes down, one of the standby servers takes over as its hot-standby. All standby peers run the heartbeat protocol with all primary servers. When a primary goes down, one of the standby peers decides dynamically to take over, and then informs all the others that it is now acting as the hot-standby for that primary. The other standby servers note this and do not attempt to take over that primary server's IP automatically.
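The claim-and-announce coordination among standby peers can be sketched as follows. This is an illustrative single-process simulation, not the AdvOSS protocol. The class StandbyPool, its methods, the standby names and the addresses are hypothetical. The text does not specify the decision rule, so "least-loaded standby wins, first in list order on ties" is an assumption. A shared claims dictionary stands in for the announcements that real peers would exchange over the network.

```python
from typing import Dict, List


class StandbyPool:
    """Hypothetical simulation of M + N takeover coordination.

    claims records which standby has announced that it is acting as the
    hot-standby for a failed primary's IP. In a real cluster this view would
    be built from messages exchanged between standby peers.
    """

    def __init__(self, standbys: List[str]) -> None:
        self.standbys = list(standbys)
        self.claims: Dict[str, str] = {}

    def load(self, standby: str) -> int:
        # Number of failed primaries this standby is currently covering.
        return sum(1 for owner in self.claims.values() if owner == standby)

    def on_primary_down(self, vip: str) -> str:
        """Return the standby covering vip, claiming it if nobody has yet."""
        if vip in self.claims:
            # Another standby already announced a claim; do not take over.
            return self.claims[vip]
        # Assumed decision rule: least-loaded standby, first in list on ties.
        chosen = min(self.standbys, key=self.load)
        self.claims[vip] = chosen  # "announce" the claim to all peers
        return chosen


pool = StandbyPool(["standby-a", "standby-b"])
print(pool.on_primary_down("192.0.2.1"))  # standby-a
print(pool.on_primary_down("192.0.2.1"))  # standby-a (already claimed)
print(pool.on_primary_down("192.0.2.2"))  # standby-b
```

Spreading claims across standbys by load keeps any single backup from absorbing every failure, which is the balance the M + N model aims for. A distributed version would also need to resolve two standbys claiming the same primary at the same moment, which this sketch sidesteps with shared state.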