University of Duisburg-Essen, Institute for Experimental Mathematics

Takeover Suggestion
A Registrar Redundancy Handling Optimization for Reliable Server Pooling Systems

Institute for Experimental Mathematics, University of Duisburg-Essen, Germany
dreibh@iem.uni-due.de

A Cooperation Project between...
  - University of Duisburg-Essen, Essen, Germany
  - Hainan University, Haikou, China
Table of Contents
  - What is Reliable Server Pooling?
      - Motivation and Application Scenarios
      - Terminology and Protocols
      - Server Selection Procedures
  - Registrar Redundancy
      - Our PlanetLab Setup
      - Handling Registrar Failures
  - Registrar Workload Balancing
      - Challenge
      - Our Solution: Takeover Suggestion
      - Evaluation
  - Conclusion and Outlook

Thomas Dreibholz's Reliable Server Pooling Page: http://tdrwww.iem.uni-due.de/dreibholz/rserpool/
Motivation
  - Motivation of Reliable Server Pooling (RSerPool; RFCs 5351 to 5356):
      - Unified, application-independent solution for service availability
      - Not available before => Foundation of the IETF RSerPool Working Group
  - Application Scenarios for RSerPool:
      - Main motivation: Telephone Signalling (SS7) over IP
      - Under discussion by the IETF:
          - Load Balancing
          - Voice over IP (VoIP) with SIP
          - IP Flow Information Export (IPFIX)
          - ... and many more!
  - Requirements for RSerPool:
      - Lightweight (low resource requirements, e.g. embedded devices!)
      - Real-Time (quick failover)
      - Scalability (e.g. to large (corporate) networks, but not indefinitely!)
      - Extensibility (e.g. by new server selection rules)
      - Simple (automatic configuration: just turn on, and it works!)
Reliable Server Pooling Overview (RFC 5351)
  - Terminology:
      - Pool Element (PE): Server
      - Pool: Set of PEs
      - PE ID: ID of a PE in a pool
      - Pool Handle: Unique pool ID
      - Handlespace: Set of pools
      - Pool Registrar (PR)
      - PR ID: ID of a PR
      - Pool User (PU): Client
  - Support for Existing Applications:
      - Proxy Pool User (PPU)
      - Proxy Pool Element (PPE)
  - Protocols:
      - ASAP (Aggregate Server Access Protocol)
      - ENRP (Endpoint Handlespace Redundancy Protocol)
Server Selection Rules (Pool Policies)
  - What is a Pool Policy?
      - A rule for the selection of the PEs
      - Defined in our RFC 5356
  - Application of Policies
      - Registrar: Creates PE list upon request by PU
      - Pool User: Selection of a PE from the list
      - Both according to the pool policies (pool-specific!)
  - Non-Adaptive Policies
      - Stateless: Random (RAND)
      - Stateful: Round Robin (RR) (default policy, must be supported)
  - Adaptive Policies
      - Least Used (LU)
          - Load definition is application-specific!
          - Round robin among multiple least-loaded PEs
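
The three policies named on this slide can be illustrated with a minimal Python sketch. It is not the RSPLIB implementation: the `PoolElement` and `Pool` classes, the method names, and the load scale from 0.0 to 1.0 are hypothetical and only chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class PoolElement:
    pe_id: int
    load: float = 0.0  # assumed scale: 0.0 = idle, 1.0 = fully loaded


class Pool:
    """Hypothetical pool holding PEs and the state needed by stateful policies."""

    def __init__(self, elements):
        self.elements = list(elements)
        self._rr_next = 0  # position for Round Robin
        self._lu_next = 0  # position for round robin among least-loaded PEs

    def select_random(self, rng=random):
        # RAND: stateless, every PE is equally likely.
        return rng.choice(self.elements)

    def select_round_robin(self):
        # RR: stateful, walk through the PE list in turn.
        pe = self.elements[self._rr_next % len(self.elements)]
        self._rr_next += 1
        return pe

    def select_least_used(self):
        # LU: adaptive, pick a PE with the lowest reported load;
        # ties are resolved by round robin among the least-loaded PEs.
        lowest = min(pe.load for pe in self.elements)
        candidates = [pe for pe in self.elements if pe.load == lowest]
        pe = candidates[self._lu_next % len(candidates)]
        self._lu_next += 1
        return pe


pool = Pool([PoolElement(1, 0.5), PoolElement(2, 0.1), PoolElement(3, 0.1)])
rr_ids = [pool.select_round_robin().pe_id for _ in range(4)]
lu_ids = [pool.select_least_used().pe_id for _ in range(3)]
```

In this example, Round Robin cycles through all three PEs, while Least Used alternates between the two PEs that report the lowest load.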
Registrar Redundancy
  - PEs can use an arbitrary PR as Home-PR (PR-H)
      - PR-H monitors PE availability using ASAP Endpoint Keep-Alives
      - Keep-Alives must be acknowledged, otherwise the PE is removed from the handlespace
      - PR-H distributes registration information to other PRs
      - If PR-H fails, its PEs simply choose another PR-H
  - PUs can use an arbitrary PR for handle resolution (i.e. PE selection)
      - If a PR is not available, just use another one...
  - Questions:
      - Are PR failures handled efficiently?
      - Are there extreme cases causing problems?
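
The "just use another one" behaviour of a PU can be sketched as follows. This is a simplified illustration, not the ASAP protocol: each registrar is modelled as a hypothetical Python callable that returns a PE list or raises `ConnectionError` when the PR is unreachable.

```python
def resolve_handle(registrars, pool_handle):
    """Ask the known PRs in turn and return the first successful answer.

    `registrars` is a list of hypothetical callables, one per PR.
    """
    for query_registrar in registrars:
        try:
            return query_registrar(pool_handle)
        except ConnectionError:
            continue  # this PR is not available, try the next one
    raise ConnectionError("no registrar available")


def failed_pr(pool_handle):
    # Stand-in for a PR that is currently down.
    raise ConnectionError("PR unreachable")


def working_pr(pool_handle):
    # Stand-in for a reachable PR; the PE IDs are made up.
    return [101, 102, 103]


pe_list = resolve_handle([failed_pr, working_pr], "ExamplePool")
```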
The Application Model
  - Server (PE)
      - Capacity
      - Shared among sessions (multi-tasking principle)
  - Client (PU)
      - Requests are generated
          - Request Size (effort)
          - Request Interval (frequency)
      - Waiting queue for requests
      - Sequential processing
  - System Utilization

    \[ \text{systemUtilization} = \text{puToPERatio} \cdot \frac{\text{RequestSize} / \text{RequestInterval}}{\text{AvgCapacity}} \]

    where puToPERatio is the PU:PE ratio
  - Provisioning for a certain Target Utilization, e.g. 80%
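
As a worked example of provisioning, the sketch below evaluates the utilization relation as read from this slide (PU:PE ratio times request size per request interval, divided by average capacity) and solves it for the request interval that yields a target utilization. The function names and the numeric values (a capacity of 10^6 units/s and a request size of 10^7 units) are assumptions for illustration only.

```python
def system_utilization(pu_to_pe_ratio, request_size, request_interval, avg_capacity):
    # Offered load per PE divided by what a PE can process per second.
    return pu_to_pe_ratio * (request_size / request_interval) / avg_capacity


def request_interval_for(target_utilization, pu_to_pe_ratio, request_size, avg_capacity):
    # The same relation, solved for the request interval.
    return pu_to_pe_ratio * request_size / (avg_capacity * target_utilization)


# Assumed example values: 3 PUs per PE, 80% target utilization.
interval = request_interval_for(0.8, pu_to_pe_ratio=3, request_size=1e7, avg_capacity=1e6)
utilization = system_utilization(3, 1e7, interval, 1e6)
```

With these assumed values, each PU has to issue one request every 37.5 s on average to reach the 80% target.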
Performance Metrics
  - Provider's Perspective
      - Does my server capacity gain revenue?
      - Average Utilization of server resources [%]
  - User's Perspective
      - How much time is needed to process my requests?
      - Avg. Handling Speed [% of average server capacity]
      - Depends on: Queuing, Startup, Server

Our Simulation Setup
  - Components:
      - NumPRs PRs
      - NumPEs PEs
      - 3*NumPEs PUs
The Impact of Registrar Redundancy
  [Figure: two panels, "Non-Adaptive Policies" and "Adaptive Policy"]
  - Request handling speed only slightly decreases with increasing number of PRs
  - Significant change only in case of inappropriate setups:
      - Adaptive policy and high network latency (in relation to request duration)
Handling Registrar Failures
  - Varying PR MTBF using 5 PRs:
      - Avg. uptime: M (neg. exp.)
      - Avg. downtime: 100s (neg. exp.)
  - For reasonable MTBF: pool performance is hardly affected by PR failures
  - PR failures only become a problem when all PRs are down for some time periods
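
How rarely all PRs are down at once can be estimated with a short calculation. The sketch below assumes that PRs fail independently of each other and uses the long-run fraction of downtime, downtime / (uptime + downtime), for each PR; the function names are hypothetical and the result is a rough estimate, not a measurement from the evaluation.

```python
def pr_unavailability(mtbf_s, mean_downtime_s=100.0):
    # Long-run fraction of time a single PR is down.
    return mean_downtime_s / (mtbf_s + mean_downtime_s)


def all_prs_down_probability(mtbf_s, num_prs=5, mean_downtime_s=100.0):
    # Assumption: PR failures are independent, so the probabilities multiply.
    return pr_unavailability(mtbf_s, mean_downtime_s) ** num_prs


p_extreme = all_prs_down_probability(100.0)     # uptime equals downtime
p_moderate = all_prs_down_probability(900.0)    # each PR is down 10% of the time
```

Under these assumptions, even an MTBF of 900 s leaves all five PRs down simultaneously only about 0.001% of the time, which is in line with the slide's statement that reasonable MTBFs hardly affect pool performance.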
The Challenge of Registrar Workload Balancing
  - Scenario: One PR remains working at all times, the other PRs regularly become unavailable and start again (e.g. due to network problems)
  - Result: after some time, the single reliable PR manages all PEs
      - The situation persists, even when all PRs become reliable again!
  - Solutions:
      - Idea: using Chord (P2P algorithm, [27]) to distribute PEs among PRs => far too complex for a lightweight framework!
      - Our idea:
          - Upon registration, the PR calculates an XOR metric: PRn-ID XOR PE-ID for each PR n
          - If there is another PR with a better metric for this PE, this PR is suggested to take over this PE ("Takeover Suggestion")
  - Effort:
      - Simple computation for a few (< 10) PRs
      - A single bit in an ENRP Update message
      - => very easy to realize (just a few lines of code...)
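
The Takeover Suggestion decision can be sketched in a few lines of Python. The slide does not state which metric value counts as "better"; this sketch assumes that the smallest XOR value wins, and the function names and example IDs are hypothetical.

```python
def xor_metric(pr_id, pe_id):
    # Metric from the slide: PRn-ID XOR PE-ID.
    return pr_id ^ pe_id


def suggest_takeover(own_pr_id, other_pr_ids, pe_id):
    """Return the ID of the PR that should take over the PE, or None.

    Assumption (not stated on the slide): the smallest XOR value is the
    best metric. Distinct PR IDs always yield distinct metrics for the
    same PE ID, so no tie-breaking is needed.
    """
    best_pr = min([own_pr_id, *other_pr_ids], key=lambda pr_id: xor_metric(pr_id, pe_id))
    return None if best_pr == own_pr_id else best_pr


# Hypothetical IDs: this PR is 0x1000, two other PRs are known.
suggestion = suggest_takeover(0x1000, [0x2000, 0x3000], pe_id=0x2001)
no_suggestion = suggest_takeover(0x1000, [0x2000, 0x3000], pe_id=0x1005)
```

In the first call, PR 0x2000 has the smallest metric for PE 0x2001, so it would be suggested to take over the PE; in the second call the registering PR itself is the best match and no suggestion is made. Because every PR evaluates the same metric, all PRs arrive at the same assignment for a given PE.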
Our PlanetLab Setup
  - Components: 5 PRs, 25 PEs, 75 PUs
  - Using RSPLIB RSerPool implementation and SCTPLIB userland SCTP
Evaluation Results (1)
  [Figure: two panels, "Request Handling Speed" and "Endpoint Keep-Alives"; curves labelled PR #1 and PR #5; dotted lines: with Takeover Suggestion]
  - No handling speed impact (except for extremely small PR MTBFs)
  - Endpoint Keep-Alive traffic distributes among PRs
Evaluation Results (2)
  [Figure: two panels, "Registrations from PEs" and "Updates from other PRs"; curves labelled PR #1 and PR #5; dotted lines: with Takeover Suggestion]
  - Registration handling effort distributes among PRs
  - Correspondingly, the number of updates received from other PRs increases
Conclusion and Outlook
  - Conclusion
      - RSerPool is the IETF's new standard for service availability and load distribution
      - PR redundancy works well, but may result in uneven PE-to-PR distribution
      - "Takeover Suggestion": Using XOR metric to distribute PEs among PRs
      - Very simple and easy to implement... but also very effective
  - Ongoing and Future Work
      - Contribution to standardization: Internet Draft draft-dreibholz-rserpool-enrp-takeover
Thank You for Your Attention! Any Questions?
To be continued...
Visit Our Project Homepage: http://tdrwww.iem.uni-due.de/dreibholz/rserpool/
dreibh@iem.uni-due.de