White Paper
Achieving Zero Downtime for Apps in SQL Environments
© 2015 ScaleArc. All Rights Reserved.
Introduction

Whether unplanned or planned, downtime disrupts business continuity. The cost of downtime comes in many forms, including:

- Poor user experience
- Decreased productivity
- Wasted time and IT resources
- Lost revenue

With an increasing number of servers requiring ever longer maintenance windows, planned updates, such as patching software, can take hours. For software-as-a-service (SaaS) providers and businesses delivering online services, unplanned downtime brings even higher costs through broken service level agreements (SLAs) and lost revenue. In some cases, payments for SLA violations caused by system downtime can skyrocket to millions of dollars.

For ecommerce companies, system downtime results not only in lost revenue but also in lost consumer confidence. Downtime often occurs at business peaks such as Black Friday or Cyber Monday, when companies anticipate their highest revenue and heavier loads strain systems. At these times, every second of downtime means lost orders, as customers quickly abandon down or slow sites to shop elsewhere.

For unplanned downtime, businesses need a strong failover architecture that can be easily deployed and implemented. This architecture must be able to instantly identify a failed database server and automatically reroute traffic away from it, even across data centers. Without this capability, database uptime takes a hit, resulting not only in application errors but also in decreased user satisfaction.

For planned downtime, businesses need to reevaluate how they implement planned database maintenance. Existing approaches actually create more downtime because most organizations bring servers completely offline for maintenance, which also takes down the applications the database supports.
To achieve zero downtime and keep application error messages from affecting users, ScaleArc has:

- Automated and accelerated the failover process for unplanned server failure across multiple data centers
- Developed an innovative, multi-tier approach for taking a server out of rotation for planned or emergency maintenance

ScaleArc Unique Traffic Management

With applications connected directly to a database server, the moment a connection is dropped, the application gets errors. ScaleArc changes that scenario by delivering an abstraction layer for the database tier that optimizes traffic flows and provides a seamless and highly reliable method to ensure availability.

By creating an abstraction layer between the client-side and the server-side connections, ScaleArc shields an application from failure, regardless of what's happening in the background. Even if a database server dies, ScaleArc will automatically and intelligently route traffic to where it needs to go. ScaleArc enables auto-failover that avoids application errors, eliminating user disruption. In the case of unplanned or planned downtime, ScaleArc's ability to handle client transactions and
route queries to appropriate server connections offers tremendous value. By separating the direct relationship between clients and server connections, ScaleArc can instantly and automatically redirect traffic flows as needed and ensure appropriate load distribution.

How ScaleArc Prevents Downtime During Unplanned Failure

Failover capability that is limited to servers within a single data center leaves organizations significantly more exposed to service interruptions and outages caused by unexpected system failure. Yet most high availability (HA) solutions today, such as SQL Server clustering and Linux Heartbeat, cannot easily implement failover across multiple data centers. These solutions rely on IP address migration between the nodes and the cluster and must operate in the same Layer 2/3 domain, which prevents them from providing cross-data center failover. Any cross-data center failover must be done at either the DNS or the TCP layer, which would involve making DNS cluster-aware or modifying the application to include failover capability.

Because ScaleArc is a Layer 7 routing solution and can route traffic at the SQL layer, it completely eliminates this problem and minimizes downtime through seamless and automated cross-data center failover. ScaleArc understands where a query needs to go and can determine which server it should be routed to. With a comprehensive view into the topology of servers across multiple data centers, ScaleArc can identify which is the right server for reads and which is the right server for writes.

In SQL Server 2012, ScaleArc replaces the Virtual Network Name (VNN)/Availability Group Listener (AGL) as the primary destination for SQL connections. VNN has significant limitations since it redirects read-intent connections directly to the secondary server and does not act as an aggregation end point.
This redirection can lead to increased downtime and application errors during failover, as both the primary being failed and the secondary being promoted will end up dropping their connections. ScaleArc's ability to abstract all SQL connections, including read-intent connections, makes failover across data centers seamless to the application stack in SQL Server 2012 environments.

How ScaleArc Allows Planned Maintenance With No Downtime

Businesses have planned and emergency maintenance requirements that require them to take servers out of rotation. Yet because maintenance windows can operationally affect users, organizations need to do maintenance very quickly. By enabling planned or emergency maintenance with no downtime, ScaleArc allows organizations to avoid system interruptions and protect critical business operations.

The strategy for taking a server out of rotation during maintenance varies depending on the use case being implemented. Top use cases for planned maintenance windows include:
- Applying software/security updates
- Performing storage maintenance
- Performing a backup
- Implementing master/slave failover
- Diagnosing potential performance issues

Each use case has its own requirements, and the strategy for taking the server out of rotation varies for each. In every instance, businesses must consider the right approach to ensure HA, while simultaneously shielding application users from errors and minimizing the impact of system downtime. ScaleArc lets organizations choose how gradually or abruptly they want to reduce server load by picking the right method for each kind of failover. ScaleArc offers strategies for both online and offline servers to help organizations minimize downtime depending on the particular use case.

1. Online Server Maintenance

ScaleArc gives organizations two ways to perform maintenance on a server while the server is still online: load balancing bias and reduced server connections. Both allow organizations to work on a server while it's still online and simultaneously reduce its load. For example, if a RAID failure has degraded storage performance, organizations can reduce the amount of load going to the server while concurrently supporting storage maintenance.

Load Balancing Bias: A Slow Bleed-Off

Load balancing bias is the slowest, simplest, and smoothest way of reducing traffic to a server. With load balancing bias, organizations can gradually reduce load to a particular server to perform maintenance that can be done while the server remains online. Organizations can use the server's spare capacity to perform operational processes, such as backups or performance diagnostics, gracefully slowing down a database to identify problems such as a read failure.
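The gradual bleed-off described above can be pictured as weighted server selection in which the weight of the server under maintenance is lowered step by step. The following is only a minimal sketch under that assumption, not ScaleArc's implementation: the names `pick_server`, `bleed_off`, and the `weights` mapping are hypothetical and chosen purely for illustration.

```python
import random
from collections import Counter


def pick_server(weights, rng=random):
    """Pick a server name with probability proportional to its weight.

    `weights` is a hypothetical mapping of server name -> non-negative bias.
    Servers with a weight of 0 receive no new traffic.
    """
    candidates = [(name, w) for name, w in weights.items() if w > 0]
    if not candidates:
        raise RuntimeError("no server has a positive weight")
    names, positive_weights = zip(*candidates)
    return rng.choices(names, weights=positive_weights, k=1)[0]


def bleed_off(weights, server, step):
    """Return a copy of `weights` with `server`'s weight lowered by `step`.

    The weight never drops below 0, and the input mapping is left untouched.
    """
    updated = dict(weights)
    updated[server] = max(0, updated[server] - step)
    return updated


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs behave the same way
    weights = {"db1": 100, "db2": 100, "db3": 100}
    for _ in range(4):
        # Lower the bias for db3 a little at a time instead of all at once.
        weights = bleed_off(weights, "db3", 25)
        sample = Counter(pick_server(weights, rng) for _ in range(10_000))
        print(f"db3 weight={weights['db3']:3d} share={sample['db3'] / 10_000:.2%}")
```

Each call to `bleed_off` shifts a little more traffic to the remaining servers, so the server being maintained keeps serving a shrinking share of queries rather than dropping out abruptly.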
Reduced Server Connections: A More Abrupt Bleed-Off

With reduced server connections, organizations can immediately and significantly decrease load to a particular server. While similar to load balancing bias, this technique creates a much quicker bleed-off and a faster, more dramatic load reduction to a server. Without impacting traffic, this approach can be used for storage maintenance or other back-end maintenance that doesn't require the server to be taken offline but does require a significant and fast load reduction. Using reduced server connections, organizations can instantly lower the number of queries being sent to a server because the number of connections available to execute those queries has been minimized.
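One way to picture reduced server connections is a per-server cap on concurrent connections: lowering the cap immediately limits how many queries can be dispatched to that server. The sketch below assumes exactly that model and is not taken from ScaleArc's software. The `ServerPool` class and its methods are hypothetical, and the sketch does not close connections that are already open when a limit is lowered.

```python
class ServerPool:
    """Hypothetical connection tracker with a per-server connection cap."""

    def __init__(self, max_connections):
        # max_connections: mapping of server name -> allowed concurrent connections
        self.max_connections = dict(max_connections)
        self.active = {name: 0 for name in self.max_connections}

    def set_limit(self, server, limit):
        """Change the cap for one server; takes effect on the next acquire()."""
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.max_connections[server] = limit

    def acquire(self):
        """Reserve a connection on the server with the most free slots.

        Returns the server name, or None when every server is at its cap.
        """
        free = [n for n in self.active if self.active[n] < self.max_connections[n]]
        if not free:
            return None
        chosen = max(free, key=lambda n: self.max_connections[n] - self.active[n])
        self.active[chosen] += 1
        return chosen

    def release(self, server):
        """Give back a connection previously returned by acquire()."""
        if self.active[server] == 0:
            raise ValueError(f"no active connections on {server}")
        self.active[server] -= 1


if __name__ == "__main__":
    pool = ServerPool({"db1": 4, "db2": 4})
    pool.set_limit("db2", 1)  # abrupt reduction: db2 now accepts one connection
    print([pool.acquire() for _ in range(6)])
```

Compared with the weighted approach, the cap takes effect as soon as it is changed, which matches the more abrupt bleed-off described above.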
2. Offline Server Maintenance

Marking a Server Offline

Marking a server offline immediately shifts load from a particular server to another server available in the cluster, triggering a complete and temporary removal of a server. When organizations have operations that require changes to a server, marking a server offline will quickly and easily take the server down so maintenance can be performed. Operations requiring offline server maintenance might include hardware replacements, software updates, or upgrading RAM in a server, as well as master/slave failover, in which the primary becomes the secondary and the secondary becomes the primary.

Together with its ability to mark a server offline, ScaleArc has a unique queuing system to keep failover interruption to a minimum. ScaleArc's queue, integrated within the ScaleArc software, automatically and temporarily holds traffic during switchover to a secondary server, instantly releasing the traffic as soon as the second server becomes available.

ScaleArc Keeps Business Data Flowing

Database downtime can account for hours, even days, of lost productivity, impacting business profitability as well as user confidence. By minimizing downtime from unexpected failure or planned maintenance, ScaleArc helps organizations keep their critical data flowing. Businesses realize more value by maintaining productivity, ensuring user satisfaction, and maximizing revenue potential.

To learn more about how ScaleArc can help your organization reduce unplanned or planned downtime, visit our failover overview.

2901 Tasman Drive, Suite 205
Santa Clara, CA 95054
Phone: 1-408-780-2040
Fax: 1-408-427-3748
www.scalearc.com

ScaleArc is the leading provider of database load balancing software.
The ScaleArc software inserts transparently between applications and databases, creating an agile data tier that provides continuous availability and increased performance for all apps. With ScaleArc, enterprises also gain instant database scalability and a new level of real-time visibility for their application environments, both on premises and in the cloud. Learn more about ScaleArc, our customers, and our partners at www.scalearc.com.

© 2015 ScaleArc. All Rights Reserved. ScaleArc and the ScaleArc logo are trademarks or registered trademarks of ScaleArc in the United States and other countries. All brand names, product names, or trademarks belong to their respective holders. 01/08/15