Red Hat Enterprise Virtualization: KVM-based infrastructure services at BNL
Presented at NLIT, June 16, 2011, Vail, Colorado
David Cortijo, Brookhaven National Laboratory, dcortijo@bnl.gov
Notice: This presentation was authored by employees of Brookhaven National Laboratory, under Contract No. DE-AC02-98CH10886 with the U.S. Department of Energy. The United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this presentation, or allow others to do so, for United States Government purposes.
- Historical perspective of infrastructure management at BNL
- Decision points for the virtualization platform
- Hardware and software requirements for RHEV implementation
- Brief overview of RHEV features, current and future
- The story so far...
- Path forward at BNL
- Caveats and potential problems
Historical perspective
- Like many datacenters, BNL was using bare-metal servers to provide nearly all services
- Due to power, space, and cooling constraints within the datacenter, potential growth of service offerings was slowed immensely
- Hardware purchase delays meant new service implementations took several months before preliminary testing could even begin
Historical perspective: some details
- Dozens of lightweight services running on bare-metal servers
- Multiple services often shared hardware out of necessity (cost, space, etc.)
- Some examples:
  - 14 DNS servers, several also serving DHCP
  - 3 dedicated DHCP servers
  - Web servers hosting dozens of virtual hosts, some internal only and others with external access on the same machine
- Hardware was found to be underutilized to an untenable degree; in some cases 10 machines were doing the work that one could do
The path to virtualization
- The decision was made to virtualize in order to address the concerns and constraints described above
- Initial work was done with Xen 3.0.3 embedded in Red Hat Enterprise Linux 5, with Linux-HA/Heartbeat and custom scripts providing redundancy
- Few resources beyond hardware and manpower existed; there was no money for licensing at the time
- Many problems, in particular the lack of VLAN support, caused unnecessary physical server sprawl
- The Xen project was scrapped in favor of a better-supported solution
Decision points on virtualization
Various factors contributed to the decision to move to RHEV:
- Cost
- Best support of Linux platforms, in particular RHEL
- Visibility into the host (a bare-metal hypervisor implementation was not a requirement)
- Live migration of VMs
- 802.1q/VLAN tag support
Cost proved to be the largest determinant, but not the only one.
Why RHEV?
- Cost was 1/6 that of VMware in our sample implementation pricing
- RHEL Server acting as the host platform included unlimited guest licensing for RHEL
- 802.1q support worked out of the box without complicated configuration
- Storage live migration (the only VMware feature that RHEV did/does not have) was not viewed as a strict requirement; more on this later
Glossary of RHEV-related terms
- KVM: Kernel-based Virtual Machine
- Host: physical machine that VMs run on
- Data Center: set of Hosts with shared storage and network definitions
- Cluster: subset of a Data Center; Hosts in a Cluster must share identical networks
- LVM: Logical Volume Manager
- Storage Domain: analogous to an LVM Volume Group; a set of disks shared to RHEV Hosts
- Live Migration: moving a running VM from one Host to another without interrupting service
Glossary of RHEV-related terms (continued)
- Bare-metal hypervisor: lightweight OS that allows hardware to run VMs without a full-blown OS installation
- RHEV-H: Red Hat's bare-metal hypervisor, a stripped-down RHEL implementation
- RHEV-M: RHEV Manager software; resides on a separate server
- vdsmd: Virtual Desktop Server Manager daemon, which allows RHEV-M to manage and monitor VMs and send commands to Hosts
- Virtual Guest (or Guest): another term for a Virtual Machine
Data Center configuration (screenshot): note the type constraint
Cluster configuration (screenshot): a Cluster is bound to a Data Center on creation and cannot be changed later
Host configuration (screenshot): Hosts can be moved between Clusters/Data Centers, but must have the appropriate Logical Networks or they will not work properly
Hardware and software requirements
- RHEV requires at least 2 Hosts per Cluster (and Data Center) to operate properly
- Hosts must have AMD-V or Intel VT hardware virtualization support and Intel 64 or AMD64 CPU extensions (a quick check is sketched below)
- All members of the Data Center must have access to the same shared storage via iSCSI, Fibre Channel, or NFS
- Sufficient RAM and CPU to run virtual machines
- Network connectivity to all networks assigned to the Cluster that the Host is a member of
- Dedicated RHEV-M machine (currently required to be a Windows Server platform; RHEL in the next major release)
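As an illustration of the hardware check (my own sketch, not part of RHEV), a minimal Python snippet that inspects /proc/cpuinfo on a prospective Host for the virtualization (vmx/svm) and 64-bit (lm) CPU flags:

    #!/usr/bin/env python
    # Minimal sketch: verify a prospective RHEV Host advertises hardware
    # virtualization (Intel VT = "vmx", AMD-V = "svm") and 64-bit ("lm") flags.
    # Note: firmware can still disable VT even when the CPU flag is present.

    def cpu_flags(path="/proc/cpuinfo"):
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
        return set()

    flags = cpu_flags()
    has_virt = bool(flags & {"vmx", "svm"})
    has_64bit = "lm" in flags

    print("Hardware virtualization (VT/AMD-V):", "yes" if has_virt else "no")
    print("64-bit extensions (Intel 64/AMD64):", "yes" if has_64bit else "no")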
RHEV features
- Hosts can be integrated into RHEV with remote installation directly from the interface
- Storage Domains integrate seamlessly into RHEL Hosts using LVM
- Quick, push-button migration of Guests between Hosts in the Cluster
- Guests can be easily installed/kickstarted through shared ISOs directly from the UI
RHEV features (continued)
- Automatic balancing of load within Clusters based on parameters defined by the admin (a simple illustration of the idea follows below)
- Ability to snapshot or move Guest data of a downed VM to different Storage Domains (cannot be done live)
- Ability to fence unresponsive Hosts and restart VMs automatically
- Data Centers/Clusters scale horizontally very easily
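To make the load-balancing idea concrete, here is a small Python sketch of one possible selection policy: pick the Host that stays under an admin-defined CPU ceiling and has the most free RAM. This is only an illustration of the concept, not RHEV's actual scheduling algorithm; the Host names and thresholds are hypothetical.

    # Hypothetical illustration of admin-parameterized load balancing:
    # choose a migration target below a CPU ceiling with the most free RAM.

    CPU_CEILING_PCT = 80          # admin-defined parameter (hypothetical value)
    MIN_FREE_RAM_GB = 8           # admin-defined parameter (hypothetical value)

    hosts = [
        {"name": "blade01", "cpu_pct": 85, "free_ram_gb": 20},
        {"name": "blade02", "cpu_pct": 40, "free_ram_gb": 12},
        {"name": "blade03", "cpu_pct": 55, "free_ram_gb": 30},
    ]

    def pick_target(hosts):
        candidates = [h for h in hosts
                      if h["cpu_pct"] < CPU_CEILING_PCT
                      and h["free_ram_gb"] >= MIN_FREE_RAM_GB]
        if not candidates:
            return None  # no Host satisfies the policy; leave the Guest in place
        return max(candidates, key=lambda h: h["free_ram_gb"])

    target = pick_target(hosts)
    print("Migrate to:", target["name"] if target else "no suitable Host")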
BNL implementation, the story so far: Clusters/Data Centers
- Two RHEV Data Centers: one within a load-balanced environment and another on the main campus network
- Three Clusters: one within the load balancer, with the main campus split into two
- Each Cluster has 2-3 Hosts
- Access to multiple networks via dual Ethernet and 802.1q trunks
BNL implementation, the story so far: Hosts
- Seven Dell M610 blades:
  - Dual Intel Xeon E5530 with Hyper-Threading enabled (16 simultaneous threads)
  - 48 GB RAM
  - Dual-port QLogic HBA expansion card in each blade
- Blades currently reside in a single M1000e chassis; a second is being prepared for production
- The M1000e contains a Brocade 4424 FC switch for SAN connectivity
BNL implementation, the story so far: Storage
- Four RAID units (two Tier 1 and two Tier 2) connected via a redundant Fibre Channel SAN
- Roughly 7 TB of storage shared out via multiple Storage Domains to the appropriate RHEV Data Centers
- Storage for Guests that provide redundant services (e.g. paired DNS servers) lives in Storage Domains backed by different physical RAID devices (determined and implemented manually; a sketch of the idea follows below)
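Since this pairing is tracked by hand, a small Python sketch may help show the kind of check involved: given which Storage Domain (and underlying RAID unit) each Guest's disk lives on, flag redundant pairs that share a physical device. The Guest, domain, and RAID names here are hypothetical.

    # Hypothetical check: redundant service pairs should not share a RAID device.

    # Guest -> (Storage Domain, underlying RAID unit); illustrative data only
    guest_storage = {
        "dns01": ("sd_tier1_a", "raid_t1_a"),
        "dns02": ("sd_tier1_b", "raid_t1_b"),
        "dhcp01": ("sd_tier2_a", "raid_t2_a"),
        "dhcp02": ("sd_tier2_a", "raid_t2_a"),   # both on the same RAID: a problem
    }

    redundant_pairs = [("dns01", "dns02"), ("dhcp01", "dhcp02")]

    for a, b in redundant_pairs:
        raid_a, raid_b = guest_storage[a][1], guest_storage[b][1]
        if raid_a == raid_b:
            print("WARNING: %s and %s both depend on %s" % (a, b, raid_a))
        else:
            print("OK: %s (%s) / %s (%s)" % (a, raid_a, b, raid_b))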
Brief look at redundant storage connectivity (diagram)
Impact on service offerings
- Many core services are now virtual and provide far better reliability than ever before. For example:
  - DNS
  - DHCP
  - SSH gateways
  - Second-tier mail relays
  - NTP
- Problematic web servers and other services have been properly segregated (internal vs. external access)
- Guests for testing or new builds are now available within minutes, rather than after hunting down hardware
Performance and reliability thus far
- With the exception of a few bugs early on, RHEV has proven to be an extremely stable platform
- In the event of Host failure or loss of connectivity to a Host, all VMs set for High Availability are restarted on other available Hosts in the Cluster in under 5 minutes (measured from time of failure, not time of detection)
- Live migration between Hosts takes moments, even for Guests with significant I/O, allowing manual and automatic load balancing as well as Host maintenance to be transparent to users
- Storage has proven to be our largest vulnerability
Power, space, and cooling impact
- Virtualization has thus far allowed us to eliminate dozens of servers from the datacenter
  - Over 60 bare-metal servers decommissioned
  - Eliminated over 2 full racks of servers and supporting hardware
- Net power and cooling consumption dropped dramatically
  - Peak power change > 30 kW
  - Approximate heat delta > 100,000 BTU/hr reduction (a quick arithmetic check follows below)
  - 20% reduction of peak load on PDUs
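As a sanity check on those figures (my own arithmetic, not part of the original slides): 1 kW of electrical load dissipates roughly 3,412 BTU/hr of heat, so a 30 kW reduction in peak power corresponds to roughly 102,000 BTU/hr, consistent with the >100,000 BTU/hr figure above.

    # Quick check: convert the ~30 kW peak power reduction to BTU/hr of heat.
    KW_TO_BTU_PER_HR = 3412.14   # standard conversion factor

    peak_power_reduction_kw = 30
    heat_reduction_btu_hr = peak_power_reduction_kw * KW_TO_BTU_PER_HR
    print("%.0f BTU/hr" % heat_reduction_btu_hr)   # about 102,000 BTU/hr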
Storage woes
- Two storage failures in recent months
- The first was a critical failure of a RAID volume
  - Unavoidable
  - Recovery from backup followed standard operating procedures
- The second left a RAID unit in a semi-working state
  - Device was unmanageable
  - Forced a choice between availability of VMs and maintenance of the array hardware
- Both issues could have been addressed with better storage management in RHEV
More details on storage woes
- Had live snapshots been available in RHEV, recovery from the initial failure might have been cut to a fraction of the time
- Snapshots in RHEV are written to the same disk as the Guest's storage, which is not helpful when the underlying disk fails
- Even given the problems created by the first failure, time to recover was no more than that of a bare-metal server's reinstall/recovery from backup
Some more woes
- In the second instance, storage live migration would have allowed us to move running VMs off of the array, rather than taking a series of short outages on a large number of systems
- Due to switch resets, the Dell Brocade 4424 took over as the principal switch, which caused path failures
- Lack of hardware redundancy on the blade servers and internal switches made full restoration take much longer than expected
Path forward at BNL
- Additional Dell M1000e chassis to balance blade load
- 4 additional blades to expand existing Clusters to 3+ nodes
- Upgrade storage for increased performance and to spread out the impact of potential failures
- Additional in-chassis Fibre Channel switches to protect against failure and allow for switch maintenance
Path forward at BNL (continued)
- Increase reliability by merging the 2-Cluster Data Center into a single Cluster
  - Longer-term work
  - Requires collaboration with the Networking team to rearchitect service-specific subnets
  - End state of 2 Clusters (1 per Data Center) with 4+ Hosts
- Increase virtualized service offerings to the Lab-wide community
  - Replace bare-metal service offerings and support of disparate machines
  - Move more services into the central datacenter
Path forward at BNL (continued)
- Investigate VDI (Virtual Desktop Infrastructure) solutions
  - Software comes built in to RHEV
  - Additional licensing costs are minimal
- Continue to identify targets for virtual servers, eliminating unnecessary hardware wherever possible
- Make better use of power management and load balancing functionality within RHEV itself (not currently used to its full potential)
Caveats and potential problems: Storage
- It is extremely important to understand how your underlying storage is configured, both physically and logically, in order to identify potential issues
- Redundancy of the storage fabric and physical devices is the largest single concern we've had
Caveats and potential problems: Networking
- Work with your networking experts to ensure that every layer that can be made redundant is fully redundant
- Where possible, connect members of the same Cluster to different physical switches to ensure availability in case of failure
- Even if you think you might not need them at first, start from day one with 802.1q trunks; flipping the switch later is quite the headache (an example of the Host-side configuration involved is sketched below)
- Decide which network you will use for management (the Logical Network called rhevm) before you begin
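For reference, this is roughly what an 802.1q tagged interface looks like underneath on a RHEL Host; RHEV normally generates this kind of configuration itself when a tagged Logical Network is attached to a Host NIC. The Python sketch below just writes an example ifcfg file for illustration; the VLAN ID, NIC, and bridge names are assumptions, not values from our environment.

    # Hypothetical sketch of the ifcfg file behind an 802.1q tagged Logical Network.
    # RHEV/vdsm normally manages these files; shown only to illustrate what a
    # trunked Host-side configuration looks like. Names and IDs are made up.

    vlan_id = 100
    nic = "eth0"
    bridge = "vlan100br"    # hypothetical bridge backing a RHEV Logical Network

    ifcfg = """DEVICE={nic}.{vlan}
    VLAN=yes
    ONBOOT=yes
    BOOTPROTO=none
    BRIDGE={bridge}
    """.format(nic=nic, vlan=vlan_id, bridge=bridge)

    # Write to a scratch location rather than /etc/sysconfig/network-scripts/
    with open("/tmp/ifcfg-%s.%d" % (nic, vlan_id), "w") as f:
        f.write(ifcfg)

    print(ifcfg)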
Caveats and potential problems: Host hardware and software
- Everything relies on your Hosts: make sure they're under warranty, you have spare parts, etc.
- Make sure your Host systems are patched (should go without saying)
- If possible, segregate them physically
  - Avoids the Fibre Channel switch problems we've seen
  - Allows for chassis, switch, or rack maintenance where necessary
In closing...
- RHEV has provided huge benefits in reliability and in reducing administrative overhead on hardware, as well as cleaning up and greening the datacenter
- Storage concerns are our biggest Achilles' heel, though their impact may seem exaggerated here given that the 18 months since implementation have otherwise seen few problems
- If we had to start from scratch, RHEV would still be the product of choice; we'd simply invest more in storage and SAN support from the beginning
Additional Resources
- http://www.redhat.com/virtualization/rhev/server/
  - Red Hat's page on RHEV for servers
  - Contains links to various whitepapers
- https://access.redhat.com/knowledge/docs/red_hat_enterprise_virtualization_for_servers/
  - RHEV documentation
- http://rcritical.blogspot.com/
  - RHEV-related blog with lots of useful notes, tips, and tricks
Q & A (time permitting)
If you have any further questions, feel free to contact me:
David Cortijo
ITD Unix Services, Brookhaven National Laboratory
dcortijo@bnl.gov
631-344-2053