ARC VIEW
OCTOBER 23, 2014

Stratus High-Availability Solutions Designed to Virtually Eliminate Downtime

By Craig Resnick

Keywords
Virtualization, High Availability, Fault Tolerant, Critical Process, Real-time

Summary
Virtualization, a computing approach that decouples hardware and software, is rapidly gaining traction in the traditionally conservative automation and control industry. With its roots in the information technology (IT) world, virtualization was initially met with skepticism for industrial applications. But this has changed, largely driven by end user demands to reduce costs and make more efficient use of their existing computing resources.

Today, most major automation suppliers support virtualization in one form or another, predominantly for PC and/or server virtualization. With virtualization, a single computer can host multiple instances of the same or different software applications as if each were running on its own dedicated computer, regardless of the specific operating systems employed. This offers significant benefits over the lifecycle of an automation system, including reduced hardware and associated support costs, reduced space requirements, reduced electricity requirements (both to operate the computers and for the associated HVAC), and increased scalability. Since these benefits clearly outweigh any perceived or actual risks associated with the technology, ARC Advisory Group believes that the trend will continue and, in fact, will accelerate as control system architectures evolve.

Stratus Technologies is one company that offers high-availability, fault-tolerant infrastructure-based solutions that support virtualization.
VISION, EXPERIENCE, ANSWERS FOR INDUSTRY
Virtualization Rapidly Gaining Traction in Automation

Virtualization of computing devices started in corporate IT departments and has now moved into plant control system architectures. Historically, control and automation departments were slow to adopt virtualization, which both requires an additional layer of specialized software and concentrates, rather than distributes, processing power and applications. This creates acute concerns over the impact of failures in the underlying hardware on critical plant applications, since companies are putting more eggs in one basket as they consolidate the number of servers. As acceptance of virtualization technology increases, more applications are running on fewer servers, making it more critical that the server does not fail.

In many cases, virtualization was thrust upon automation departments and automation suppliers alike through corporate-level pressures to save costs at the operating companies. Initially, virtualization was used solely at the operator interface level, where one server with several thin client terminals replaced several dedicated PCs. Once this approach proved itself, many virtualized applications were implemented successfully at the operations management level. Today, it is common to see historians, engineering functions, and optimization packages all running on one physical device (server) rather than on dedicated devices.
Virtualization Facilitates Upgrades and Expansions

Reducing the number of PCs and servers used for automation and supervisory applications can help decrease both capital and operating costs: fewer computers need to be maintained and managed, and overall control room space and energy requirements shrink. Virtualization also saves costs during upgrades and expansions because the existing server can usually accommodate additional virtual machines (VMs). This allows new applications to be added without incremental hardware costs and without taking the hardware offline, which could otherwise cause significant and costly production interruptions.

VMs are highly portable, allowing software maintenance personnel to migrate them to different physical machines. This enables maintenance to be performed on the hardware without impacting production operations, and allows loads to be balanced more efficiently across the physical infrastructure. Also, if a physical server fails, its VMs can be restarted, but data may be lost, and restart times vary depending on the applications hosted on each VM.

Overcoming Concerns about Availability

Virtualization has become standard practice due to the advantages it offers over traditional physical infrastructures, such as agility, efficiency, and scalability. As more business-critical applications are virtualized, however, concerns about availability increase. Outages are costing companies more money each year, with the average cost of an hour of downtime for large companies in the hundreds of thousands of dollars. While some applications can tolerate brief outages, downtime of critical processes, with the associated risk of data, transaction, or production losses, is simply unacceptable. And with multiple VMs running on a single physical host, one hardware failure can have widespread business impact. In today's always-on world, ensuring the availability of virtualized critical processes is essential.

Most approaches to minimizing downtime employ server clusters and failover mechanisms that restart VMs on another host in the event of a hardware or operating system fault. However, the recovery process not only takes time but also implies that damage has already been incurred. Ideally, the systems on which virtualized critical processes run should prevent downtime in the first place by working through system faults, thus avoiding process disruption entirely.

Initial concerns about virtualized applications, specifically availability, have dissipated due to advances provided by technologies such as Stratus ftserver Fault Tolerant (FT) systems. This has given many companies the confidence to virtualize business-critical applications, such as Microsoft Exchange, Oracle Database, and SAP Enterprise Resource Planning.
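To make the cost of failover-based recovery concrete, the following minimal Python sketch estimates expected annual downtime and its cost for one physical host running several VMs. All inputs (fault rate, recovery time per fault, hourly cost per VM, number of VMs) are hypothetical placeholders, not figures from ARC or Stratus, and the model charges each outage hour once per affected VM. A recovery time of zero stands in, as an idealization, for a system that rides through the fault.

```python
# Minimal sketch: expected annual downtime and cost for one physical host
# running several VMs. All inputs below are hypothetical placeholders.
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    host_faults_per_year: float  # hardware/OS faults on the physical host
    recovery_minutes: float      # outage per fault; 0 idealizes riding through the fault


def annual_downtime_hours(s: Scenario) -> float:
    """Expected outage hours per year = faults per year x minutes per fault / 60."""
    return s.host_faults_per_year * s.recovery_minutes / 60.0


def annual_downtime_cost(s: Scenario, cost_per_vm_hour: float, vms_on_host: int) -> float:
    """A host fault takes down every VM on that host, so each outage hour is
    charged once per affected VM (a deliberately simple model)."""
    return annual_downtime_hours(s) * cost_per_vm_hour * vms_on_host


if __name__ == "__main__":
    failover = Scenario("failover cluster", host_faults_per_year=2, recovery_minutes=30)
    ride_through = Scenario("ride-through", host_faults_per_year=2, recovery_minutes=0)
    for s in (failover, ride_through):
        hours = annual_downtime_hours(s)
        cost = annual_downtime_cost(s, cost_per_vm_hour=100_000, vms_on_host=4)
        print(f"{s.name}: {hours:.2f} h/yr, ${cost:,.0f}/yr")
    # failover cluster: 1.00 h/yr, $400,000/yr
    # ride-through: 0.00 h/yr, $0/yr
```

With these made-up inputs, two host faults a year at 30 minutes of restart each already cost $400,000. The point of the sketch is that cost scales with both the number of VMs consolidated on a host and the recovery time per fault, which is why avoiding the restart matters more as consolidation increases.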
Solution Implementation: Complexity vs. Simplicity

Establishing a high-availability environment based on clustering is a complex and costly endeavor. Any such environment requires at least a two-node cluster to enable failover protection. In an environment composed of many hosts, each node belonging to the cluster must be identified and appropriate network connectivity between the nodes established. Clustering also requires an external storage array. It is then necessary to configure a number of settings to determine how the cluster will behave and to ensure that adequate resources are available in the event of a failover.

In contrast, Stratus ftserver systems are designed to provide out-of-the-box fault tolerance with no additional hardware, network, or software requirements. According to the company, Stratus ftserver systems are built on cost-effective, industry-standard hardware powered by Intel multi-core processors. Installation is faster and easier, with full support for all standard VM products and no additional configuration work or system modifications. A typical high-availability (HA) or FT cluster could take several days to install, configure, and validate, considering the requirements for multiple servers, dedicated cluster networking, and shared storage. A Stratus ftserver system, in contrast, can typically be installed in a few hours.

[Image: Stratus ftserver Systems]

Total Cost Evaluation

When evaluating the cost of ensuring availability, it's important to look at the whole picture. If an HA/FT cluster requires multiple servers, a management console, a high-availability network, and external storage, lifecycle costs rise. With the average server life at three to five years, recurring hardware expenses are likely to rise, as is the downtime needed to perform migrations and upgrades. In addition, the cluster may require multiple software licenses for each server and for the management console.
According to the company, Stratus ftserver systems are designed to offer a complete, integrated solution for ensuring continuous application availability. They require no additional servers, networking, or storage. Only one license is needed for the operating system and application compared to multiple licenses in a cluster scenario, lowering total costs. In addition, ftserver systems average eight years in service, enabling IT organizations to stretch budgets and minimize upgrade cycles.
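The lifecycle argument comes down to how often hardware must be bought and replaced. The minimal Python sketch below counts system purchases over a planning horizon, under the simplifying assumption that each generation is replaced exactly at the end of its service life. The 16-year horizon, the 4-year midpoint of the three-to-five-year server life, and counting the ftserver as one system per generation are illustrative assumptions; prices, storage arrays, management consoles, and licenses are deliberately left out.

```python
# Minimal sketch: counting hardware purchases over a planning horizon.
# Horizon, service lives, and systems per generation are illustrative assumptions.
import math


def hardware_generations(horizon_years: float, service_life_years: float) -> int:
    """Generations bought over the horizon, assuming each one is replaced
    exactly at the end of its service life."""
    if service_life_years <= 0:
        raise ValueError("service life must be positive")
    return math.ceil(horizon_years / service_life_years)


def system_purchases(horizon_years: float, service_life_years: float,
                     systems_per_generation: int) -> int:
    """Total systems bought = generations x systems needed per generation."""
    return hardware_generations(horizon_years, service_life_years) * systems_per_generation


if __name__ == "__main__":
    horizon = 16  # years (hypothetical planning horizon)
    cluster = system_purchases(horizon, service_life_years=4, systems_per_generation=2)
    ftserver = system_purchases(horizon, service_life_years=8, systems_per_generation=1)
    print(f"two-node cluster: {cluster} servers, ftserver: {ftserver} systems")
    # two-node cluster: 8 servers, ftserver: 2 systems
```

Under the same assumptions, any per-server software licenses would scale with the same purchase count, which is the mechanism behind the license comparison above.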
Management and support are also important cost considerations. Clusters require extensive hands-on care, potentially necessitating the expense of full-time administrative staff. Cluster capacity, policies, resources, and software changes must all be managed and tested to validate proper operation.

Administration of ftserver systems is minimal thanks to their resilient hardware and built-in service technology. Stratus automated service technology makes ftserver systems self-managing, requiring minimal hands-on attention and lower cost. When service is required, Stratus performs remote software updates and proactively delivers hardware components on-site that can be plugged into the ftserver system while it runs, with no specialized IT skills required. According to the company, the longer life cycle of an ftserver (usually seven to eight years) also reduces business disruptions and helps avoid the IT lifecycle "rip and replace" every few years. Ease of servicing avoids thousands of dollars in upgrade expenses associated with clustered infrastructures.

Availability Report Card

To maximize the availability of process-critical virtualized applications, ftserver systems offer a different approach. According to Stratus, clusters often rely on system failover, while ftserver systems help prevent downtime by riding through the fault with no impact on the running applications. Clusters do not protect the host server or hypervisor against downtime or performance degradation. The restart that a cluster would initiate could take minutes, or even hours for large systems, and any uncommitted data is lost during application or server crashes and restarts. Stratus ftserver systems are designed to deliver fault tolerance and integrate with standard manufacturing applications to avoid downtime instead of recovering from it.
According to the company, this provides 99.999+ percent availability to protect process applications from host faults that would otherwise result in VM failures. In-flight data for process and batch applications is fully protected from loss. In addition, alternative fault-tolerant solutions have limitations, as some support only a single virtual processor core, while critical process applications typically require multicore symmetric multiprocessing (SMP). For example, Microsoft recommends four to twelve dual-socket cores for Exchange and four to eight cores for SharePoint and SQL Server. Oracle recommends six to twelve cores. Stratus ftserver systems provide full support for multicore SMP, with no performance penalty for achieving fault tolerance.
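Availability percentages are easier to compare when converted into a downtime budget. The minimal Python sketch below converts an availability figure into the maximum downtime per year it allows, assuming an average year of 365.25 days. It shows what a 99.999 percent figure implies arithmetically (about 5.26 minutes per year); it is not a measurement of any particular system.

```python
# Minimal sketch: converting an availability percentage into a yearly downtime budget.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes in an average year (assumption)


def max_downtime_minutes_per_year(availability_percent: float) -> float:
    """Unavailability fraction times the minutes in an average year."""
    unavailability = 1.0 - availability_percent / 100.0
    return unavailability * MINUTES_PER_YEAR


if __name__ == "__main__":
    for a in (99.9, 99.99, 99.999):
        print(f"{a}% availability -> {max_downtime_minutes_per_year(a):.2f} min/year")
    # 99.9% availability -> 525.96 min/year
    # 99.99% availability -> 52.60 min/year
    # 99.999% availability -> 5.26 min/year
```

Each additional "nine" cuts the yearly downtime budget by a factor of ten, so the step from 99.99 to 99.999 percent takes the allowance from under an hour to a few minutes.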
Conclusion

By now it should be apparent that much of the hardware in a control system environment has been gradually replaced by software through virtualization and other techniques. The question is: will the trend continue, and what hardware will be the next to go?

ARC believes the virtualization trend will continue. It is part of the larger trend in the automation and control industry in which less physical hardware is required, simply because software functionality is replacing physical devices across the control architecture. Virtualization appears to be here to stay in control and automation architectures, and its use will only increase over time. This is because, in most plant applications, the significant benefits across an automation system's lifecycle and throughout the organization outweigh the risks, real or perceived.

With organizations relying more on virtualized critical process applications, ensuring availability is a top priority. To strengthen availability, Stratus Technologies offers solutions designed to keep applications running continuously, enabling rapid deployment of 24/7 infrastructures without changes to applications. These software, platform, and service solutions are designed to prevent downtime before it occurs and help ensure uninterrupted performance of critical process operations. Its ftserver systems offer fault-tolerant solutions, continuously taking proactive measures that help prevent downtime and protect against data loss. Stratus ftserver is an out-of-the-box turnkey solution that achieves reliable downtime prevention automatically, with no external dependencies. The company stresses that its ftserver systems offer plug-and-play simplicity, a single software image, and zero data loss to provide the highest possible availability for a lower total investment in time and money.
For additional details on how Stratus ftserver systems prevent downtime and ensure continuous availability of critical process virtualized applications, visit: www.stratus.com/products/platforms/ftserversystems

For further information or to provide feedback on this article, please contact your account manager or the author at cresnick@arcweb.com. ARC Views are published and copyrighted by ARC Advisory Group. The information is proprietary to ARC and no part of it may be reproduced without prior permission from ARC.