Top Ten Private Cloud Risks. Potential downtime and data loss causes



Similar documents
Maximizing Business Continuity Success

How Routine Data Center Operations Put Your HA/DR Plans at Risk

Veritas Storage Foundation High Availability for Windows by Symantec

Symantec Storage Foundation High Availability for Windows

MAKING YOUR VIRTUAL INFRASTUCTURE NON-STOP Making availability efficient with Veritas products

Solution Brief Availability and Recovery Options: Microsoft Exchange Solutions on VMware

Veritas Cluster Server from Symantec

Building Private Cloud Architectures

Confidently Virtualize Business-Critical Applications in Microsoft

SAP Solutions on VMware Business Continuance Protecting Against Unplanned Downtime

IMPROVING VMWARE DISASTER RECOVERY WITH EMC RECOVERPOINT Applied Technology

Drobo How-To Guide. Topics Drobo and vcenter SRM Basics Configuring an SRM solution Testing and executing recovery plans

Symantec and VMware: Virtualizing Business Critical Applications with Confidence WHITE PAPER

Symantec Cluster Server powered by Veritas

CHEAP DISASTER RECOVERY

EMC VPLEX FAMILY. Continuous Availability and data Mobility Within and Across Data Centers

Downtime, whether planned or unplanned,

Implementing disaster recovery solutions with IBM Storwize V7000 and VMware Site Recovery Manager

Veritas CommandCentral Disaster Recovery Advisor Release Notes 5.1

VMware Virtual Machine File System: Technical Overview and Best Practices

WHITE PAPER: HIGH CUSTOMIZE AVAILABILITY AND DISASTER RECOVERY

Mastering Disaster Recovery: Business Continuity and Virtualization Best Practices W H I T E P A P E R

VMware vsphere 5.1 Advanced Administration

Veritas InfoScale Availability

Continuous Data Protection for any Point-in-Time Recovery: Product Options for Protecting Virtual Machines or Storage Array LUNs

Using EonStor FC-host Storage Systems in VMware Infrastructure 3 and vsphere 4

VMWARE VSPHERE 5.0 WITH ESXI AND VCENTER

Optimization, Business Continuity & Disaster Recovery in Virtual Environments. Darius Spaičys, Partner Business manager Baltic s

Introduction. Setup of Exchange in a VM. VMware Infrastructure

Evolving Datacenter Architectures

How To Understand And Understand The Risks Of Configuration Drift

Citrix XenApp Server Deployment on VMware ESX at a Large Multi-National Insurance Company

Virtualizing Business-Critical Applications with Confidence

Monitoring Databases on VMware

Availability Guide for Deploying SQL Server on VMware vsphere. August 2009

VMware System, Application and Data Availability With CA ARCserve High Availability

Microsoft SharePoint 2010 on VMware Availability and Recovery Options. Microsoft SharePoint 2010 on VMware Availability and Recovery Options

How To Fix A Fault Fault Fault Management In A Vsphere 5 Vsphe5 Vsphee5 V2.5.5 (Vmfs) Vspheron 5 (Vsphere5) (Vmf5) V

Unitt Zero Data Loss Service (ZDLS) The ultimate weapon against data loss

VMware vsphere 5.0 Boot Camp

EMC SMARTS SERVER MANAGER

Virtual DR: Disaster Recovery Planning for VMware Virtualized Environments

IA L06 No-fail Failover with Disaster Recovery Advisor

Implementing a Holistic BC/DR Strategy with VMware

DR-to-the- Cloud Best Practices

Oracle Databases on VMware High Availability

WHITE PAPER Optimizing Virtual Platform Disk Performance

VMware vcenter Site Recovery Manager 5 Technical

Reducing the Cost and Complexity of Business Continuity and Disaster Recovery for

VMware Site Recovery Manager Overview Q2 2008

Part2 Hyper-V Replica and Hyper-V Recovery Manager. Datacenter Specialist

TECHNICAL PAPER. Veeam Backup & Replication with Nimble Storage

Oracle Database Solutions on VMware High Availability. Business Continuance of SAP Solutions on Vmware vsphere

Quorum DR Report. Top 4 Types of Disasters: 55% Hardware Failure 22% Human Error 18% Software Failure 5% Natural Disasters

EMC VPLEX FAMILY. Continuous Availability and Data Mobility Within and Across Data Centers

EMC Business Continuity for VMware View Enabled by EMC SRDF/S and VMware vcenter Site Recovery Manager

How To Use Vcenter Site Recovery Manager 5 With Netapp Fas/Vfs Storage System On A Vcenter Vcenter 5 Vcenter 4.5 Vcenter (Vmware Vcenter) Vcenter 2.

Five Fundamentals for Modern Data Center Availability

Contingency Planning and Disaster Recovery

Whitepaper. NexentaConnect for VMware Virtual SAN. Full Featured File services for Virtual SAN

Data Sheet: Disaster Recovery Veritas Volume Replicator by Symantec Data replication for disaster recovery

High Availability with Postgres Plus Advanced Server. An EnterpriseDB White Paper

VMware Best Practice and Integration Guide

SERVER VIRTUALIZATION IN MANUFACTURING

WHITE PAPER. Best Practices to Ensure SAP Availability. Software for Innovative Open Solutions. Abstract. What is high availability?

Increasing Storage Performance, Reducing Cost and Simplifying Management for VDI Deployments

INSIGHT. Symantec Optimizes Veritas Cluster Server for Use in VMware Environments IDC OPINION IN THIS INSIGHT SITUATION OVERVIEW. Jean S.

Dell PowerVault MD32xx Deployment Guide for VMware ESX4.1 Server

Neverfail Solutions for VMware: Continuous Availability for Mission-Critical Applications throughout the Virtual Lifecycle

High Availability with Windows Server 2012 Release Candidate

Virtual Infrastructure Implementation of High Availability and Business Continuation Solutions

Using Live Sync to Support Disaster Recovery

Technology Insight Series

Top 10 Reasons to Virtualize VMware Zimbra Collaboration Server with VMware vsphere. white PAPER

Oracle E-Business Suite Disaster Recovery Solution with VMware Site Recovery Manager and EMC CLARiiON

Veritas Cluster Server by Symantec

DeltaV Virtualization High Availability and Disaster Recovery

Virtual Infrastructure Security

vsphere Availability ESXi 5.5 vcenter Server 5.5 EN

Protecting Miscrosoft Hyper-V Environments

VMware vsphere 4.1 with ESXi and vcenter

SAN Conceptual and Design Basics

QNAP in vsphere Environment

WHITE PAPER. The Double-Edged Sword of Virtualization:

VirtualclientTechnology 2011 July

Site Recovery Manager Administration Guide

High availability and disaster recovery with Microsoft, Citrix and HP

SQL Server High Availability: After Virtualization SQL PASS Virtualization Virtual Chapter September 11, 2013

VERITAS Business Solutions. for DB2

High Availability Infrastructure for Cloud Computing

How To Create A Hypervisor Based Replication In Zerto

VERITAS Volume Replicator in an Oracle Environment

VMware Business Continuity & Disaster Recovery Solution VMware Inc. All rights reserved

Softverski definirani data centri - 2. dio

Red Hat Enterprise linux 5 Continuous Availability

Server Virtualization in Manufacturing

High-Availability Fault Tolerant Computing for Remote and Branch Offices HA/FT solutions for Cisco UCS E-Series servers and VMware vsphere

Taking the Disaster out of Disaster Recovery

Transcription:

Top Ten Private Cloud Risks Potential downtime and data loss causes

Introduction: Risk sources Enterprises routinely build Disaster Recovery and High Availability measures into their private cloud environments. So why do downtime and data loss risks still exist? The reality is that even the most robust Disaster Recovery and High Availability plans are only as good as your ability to monitor, test and maintain the environment against the plans. IT environments are dynamic in nature. Over time, changes made to the protected environment may not be reflected adequately in the recovery environment. Since manually testing for all the risks in an ongoing fashion is virtually impossible, such discrepancies can accumulate over time leading to configuration drift and associated increased risk of downtime and data loss. The risks outlined in this document are drawn from a database containing more than 5,000 known sources of risk gathered from a wide range of enterprise customers. 2

Risk framework Downtime and data loss risks can be found across three layers of a private cloud environment: The Virtual Infrastructure Layer Storage devices Networking Virtualization servers Virtualization management The Virtual Machine Layer Storage allocation Server image High availability provisioning The Disaster Recovery Layer Replication settings Storage mapping Recovery management 3

1. Infrastructure layer Storage The typical ESX cluster spans several nodes. The problem shown in this example is that one of the nodes has no SAN path redundancy like we would expect from all nodes serving the cluster. What it means is that all virtual machines currently running on this particular node have a single point of failure and may suffer reduced performance. Production This could also cause certain virtual machines to exhibit intermittent performance degradation as they move to the under-provisioned servers. The Virtual Infrastructure Layer Storage devices Networking Virtualization servers Virtualization management LUN re-signature Unmapped RDM Different multi-path configurations 4

2. Infrastructure layer Network What we see here is that one of the cluster nodes has only a single network connection to the shared or public network. The result is a single-point-of-failure and most likely poor performance for all the machines that are currently running on top of that particular server. Like in the previous example, the fact that virtual machines continually change locations makes it very difficult to pinpoint such performance fluctuation. Production The Virtual Infrastructure Layer Storage devices Networking Virtualization servers Virtualization management DNS discrepancies Incorrect routing configurations Time out of sync between nodes Different default gateway Inconsistent ESX network options 5

3. Infrastructure layer Servers Certain discrepancies in the configuration of virtualized infrastructure cluster nodes present risks to the data residing on the virtual machines running on the cluster. Production For example, when running multiple ESX versions, some nodes might be using advanced VMFS options unsupported by earlier versions, leading to potential data loss or extended downtime. The Virtual Infrastructure Layer Storage devices Networking Virtualization servers Virtualization management Hardware discrepancies Different firmware versions Inconsistent configuration options 6

4. Infrastructure layer Virtualization management A recommended best practice is to run the virtualization management application (e.g. vcenter) inside a virtual machine. Production A common mistake is configuring this virtual machine with fully automated Distributed Resource Scheduling (DRS), which means that we can t tell in advance on which particular physical node vcenter will run at any given time. If vcenter stops functioning or exhibits any performance issues, we would not know where to go in order to restart it. In a very large virtualized environment, it can take an unpleasantly long time to figure out how to revive the application. The Virtual Infrastructure Layer Storage devices Networking Virtualization servers Virtualization management Cluster nodes missing NFS datastore access Virtual datacenters misaligned with physical server/ storage licenses Missing/ expired/ incompatible licenses 7

5. VM layer Storage allocation To prevent slowing down the entire database, vendors recommend that temporary database files use the highest performance storage. However, the abstraction layer added by virtualization makes it difficult for the database administrator to know what physical storage tier is allocated to the database. In this example we see a virtual machine running an Oracle database on an ESX cluster with multi-tier RAID storage comprising high performance RAID 1 and lower performance RAID 5 datastores. While the intention was for the low-cost, low-performance RAID 5 to be used for archiving and staging, the DBA has no clear way to know which storage tier supports each VM file system. So, the temporary files of a particular database may end up on the lower performance datastore, leading to significant performance degradation. Production The Virtual Machine Layer Storage allocation Server image High availability provisioning Mixed RDM and non-rdm drivers VM data configured with different persistence nodes Mapping remote storage to VMs RDS cluster nodes with very different performance levels 8

6. VM layer Server image Over time, as the virtual environment grows larger and larger, it becomes more of a challenge to make sure that all of the virtual machines running the same application are consistently provisioned based on the same base image. Production For example, a new virtual machine may be provisioned with a different version of the operating system, which may result in security risks, performance issues, and other unexpected behavior. The Virtual Machine Layer Storage allocation Server image High availability provisioning Inconsistent patches on virtual machines Selection of different startup options Application of different locale configs Virtual machines not synced on time 9

7. VM layer High availability To achieve higher availability and performance than offered by ESX HA and DRS, it is common to use more than one virtual machines located on separate physical nodes. Production As a result of routine maintenance or unplanned outage, one of the virtual machines might fail over or relocate to a different node, which could end up being the same one running the second virtual machine for our application. With the two virtual application servers running on a single physical node, we now have a single point of failure. The Virtual Machine Layer Storage allocation Server image High availability provisioning Virtual machines not configured with HA VM team definitions not aligned with supported business applications VM config allows failover to ESX nodes lacking access to required resources 10

8. Disaster recovery layer Replication Incorrect replication settings are a common occurrence. In a simple case, a data store may not be fully replicated. Obviously, all the virtual machines that are dependent on that data store will not be able to recover. A more complex scenario could be one in which everything is replicated, but not using the same storage consistency group for all the devices on the data store. While the copy is now complete, it may very likely be corrupted, a problem which is more difficult to detect. The Disaster Recovery Layer Replication settings Storage mapping Recovery management Replication failure Data store using more than one array Misaligned VM storage 11

9. Disaster recovery layer Storage mapping Even when data is correctly replicated to the disaster recovery environment, it is still not guaranteed to be accessible in case a failover is attempted. Incorrect replica mapping is a common issue. For example, one of the designated disaster recovery hosts may not have a path configured correctly to one of the storage replicas. In this case, all the virtual machines that depend on that particular data store will not function properly in a failover attempt. The Disaster Recovery Layer Replication settings Storage mapping Recovery management Locked devices Mismatched devices SRA misconfiguration 12

10. Disaster recovery layer Recovery management Automated recovery management tools (e.g. VMWare s SRM) help streamline the disaster recovery process and are definitely recommended for any private cloud environment. With that said, it is important to note that such tools are also vulnerable to configuration changes. Keeping the configuration of the recovery manager aligned with the production configuration is an ongoing challenge. For example, we may have created a Data Store group and a Protection group that contains the VMs on that particular Data Store group. Over time, however, new VMs have been added to the Protection Group. Unless we manually refresh the recovery manager configuration, it will not be aware of these additions. In case of a failover, we may experience a partial or failed recovery. The Disaster Recovery Layer Replication settings Storage mapping Recovery management Misaligned recovery plans Missing components Unprotected non-virtualized components (e.g. DNS, NAS) 13

Veritas Risk Advisor Helps you stay on top of your environment Scans the entire infrastructure to find risks Over 5000 checks Scans take place on a scheduled basis defined by the user Agentless technology collects data unobtrusively Collects read-only configuration information from target systems Risks (what and where) are reported with steps to fix the problem Automatic report scheduling defined by the user 14

Find out how Veritas Risk Advisor can help you reduce risk Test your environment against a database of 5,000+ vulnerabilities Discover what risks lurk undetected and what to do about them DRA reports include: - Detailed assessment of risks - Resolution guidelines - Opportunities for optimization Learn more: www.veritas.com/product/business-continuity/disaster-recovery-orchestrator 15

Veritas World Headquarters 500 East Middlefield Road Mountain View, CA 94043 +1 (650) 933 1000 www.veritas.com 2015 Veritas Technologies LLC. All rights reserved. Veritas and the Veritas Logo are trademarks or registered trademarks of Veritas Technologies LLC or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.