VMware Customer Support Day - November 16, 2010




VMware Customer Support Day - November 16, 2010

Agenda
9:30 AM - Welcome/Kick-Off - Bob Good, Manager, Systems Engineering
9:40 AM - Support Engagement - Laura Ortman, Director, Global Support Services (GSS)
10:00 AM - Storage Best Practices - Ken Kemp, Escalation Engineer
11:00 AM - Keynote: VMware Virtualization and Cloud Management - Doug Huber, Director, Systems Engineering
12:00 PM - Lunch/Q&A with the experts (Group A) / VMware Express Private Viewing (Group B)
1:00 PM - Lunch/Q&A with the experts (Group B) / VMware Express Private Viewing (Group A)
2:00 PM - View 4.5 Overview/Network Best Practices - David Garcia, Release Readiness Manager
3:15 PM - Break / Interactive Session
3:30 PM - vSphere Performance Best Practices - Ken Kemp, Escalation Engineer
4:15 PM - Wrap Up/Raffle Drawing

Storage Best Practices Ken Kemp Escalation Engineer, Global Support Services

Agenda: Performance; SCSI Reservations; Performance Monitoring (esxtop); Common Storage Issues; Snapshot LUNs; Virtual Machine Snapshots; iSCSI Multipathing; All Paths Dead (APD)

Performance Disk subsystem bottlenecks cause more performance problems than CPU or RAM deficiencies Your disk subsystem is considered to be performing poorly if it is experiencing: Average read and write latencies greater than 20 milliseconds Latency spikes greater than 50 milliseconds that last for more than a few seconds 5

Performance vs. Capacity Performance vs. capacity comes into play at two main levels. Physical drive size: hard disk performance doesn't scale with drive size; in most cases, the larger the drive, the lower the performance. LUN size: larger LUNs hold more VMs, which can lead to contention on that particular LUN; LUN size is often related to physical drive size, which can compound performance problems.

Performance Physical Drive Size You need 1 TB of space for an application 2 x 500GB 15K RPM SAS drives = ~300 IOPS Capacity needs satisfied, Performance low 8 x 146GB 15K RPM SAS drives = ~1168 IOPS Capacity needs satisfied, Performance high 7
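A rough spindle math check, assuming a 15K RPM SAS drive delivers on the order of 150 IOPS: 2 drives x ~150 IOPS = ~300 IOPS, while 8 drives x ~146 IOPS = ~1168 IOPS. Both layouts provide roughly 1 TB of raw capacity, but random I/O throughput scales with the number of spindles, not with drive size.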

SCSI Reservations Why? A SCSI reservation occurs when an initiator requests/reserves exclusive use of a target (LUN). VMFS is a clustered file system and uses SCSI reservations to protect metadata and preserve the integrity of VMFS in multi-host deployments. While reserved, one host has exclusive access to the LUN; a reboot or release command will clear the reservation. The virtual machine monitor uses SCSI-2 reservations.

SCSI Reservations What causes SCSI Reservations? When a VMDK is created, deleted, placed in REDO mode, has a snapshot (delta) file, is migrated (reservations from both the source and the target ESX host), or when the VM is suspended (since a suspend file is written). When a VMDK is deployed from a template, we get SCSI reservations on the source and target. When a template is created from a VMDK, a SCSI reservation is generated.

SCSI Reservation Best Practice Simplify/verify deployments so that virtual machines do not span more than one LUN This will ensure SCSI reservations do not impact more than one LUN Determine if any operations are occurring on a LUN on which you want to perform another operation Snapshots VMotion Template Deployment Use a single ESX server as your deployment server to limit/prevent conflicts with other ESX servers attempting to perform similar operations 10

SCSI Reservation Best Practice - Continued Inside vCenter, limit access to actions that initiate reservations to administrators who understand the effects of reservations, to control WHO can perform such operations. Schedule virtual machine reboots so that only one LUN is impacted at any given time; a power-on and a power-off are considered separate operations and both will create a reservation. VMotion. Use care when scheduling backups - consult the backup provider's best practices information. Use care when scheduling anti-virus scans and updates.

SCSI Reservation Monitoring Monitor /var/log/vmkernel for: 24/0 0x0 0x0 0x0 SYNC CR messages. In a shared environment like ESX there will be some SCSI reservations - this is normal. But when you see hundreds of them, it's not normal. Check for virtual machines with snapshots. Check for HP management agents still running the storage agent. Check LUN presentation for Host mode settings. Call VMware Support to dig into it further.
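A quick way to gauge reservation activity is to count these messages from the service console; a minimal sketch, assuming a classic ESX host and the log path and message text shown above:
grep -c "SYNC CR" /var/log/vmkernel
grep "24/0 0x0 0x0 0x0" /var/log/vmkernel | tail -20
A handful of matches is expected in a shared cluster; hundreds in a short window are the cue to work through the checks above.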

Storage Performance Monitoring Ken Kemp Escalation Engineer, Global Support Services

esxtop 14

esxtop - Continued DAVG = raw response time from the device. KAVG = time spent in the VMkernel, i.e. virtualization overhead. GAVG = response time as perceived by the virtual machine. DAVG + KAVG = GAVG.
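To watch these latencies live, a minimal sketch from the service console (interactive keys and batch-mode flags as documented for esxtop; the output path is just an example):
esxtop                                  # press 'd' for the disk adapter view; DAVG/cmd, KAVG/cmd and GAVG/cmd are shown per adapter
esxtop -b -d 10 -n 60 > /tmp/esxtop.csv # batch mode: one sample every 10 seconds for 10 minutes, for offline review of all counters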

esxtop - Continued 16

esxtop - Continued 17

esxtop - Continued What are correct values for these response times? As with all things revolving around performance, it is subjective Obviously the lower these numbers are the better ESX will continue to function with nearly any response time, however how well it functions is another issue Any command that is not acknowledged by the SAN within 5000ms (5 seconds) will be aborted. This is where perceived disk performance takes a sharp dive 18

Common Storage Issues Ken Kemp Escalation Engineer, Global Support Services

Snapshot LUNs How is a LUN detected as a snapshot in ESX? When an ESX 3.x server finds a VMFS-3 LUN, it compares the SCSI_DiskID information returned from the storage array with the SCSI_DiskID information stored in the LVM header. If the two IDs do not match, the VMFS-3 volume is not mounted. A VMFS volume on ESX can be detected as a snapshot for a number of reasons: LUN ID change; SCSI version supported by the array changed (firmware upgrade); identifier type changed (Unit Serial Number vs. NAA ID).

Snapshot LUNs - Continued Resignaturing Methods ESX 3.5: enable LVM resignaturing on the first ESX host - Configuration > Advanced Settings > LVM > set LVM.EnableResignature to 1. ESX 4: single-volume resignaturing - Configuration > Storage > Add Storage > Disk/LUN > select the volume to resignature, then choose Mount or Resignature.
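On ESX 4 the same mount-or-resignature decision can also be made from the command line; a minimal sketch, assuming a volume labeled datastore1 that is being detected as a snapshot (the label is hypothetical):
esxcfg-volume -l               # list volumes detected as snapshots/replicas
esxcfg-volume -m datastore1    # mount the volume and keep its existing signature
esxcfg-volume -r datastore1    # or resignature the volume instead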

Virtual Machine Snapshots What is a Virtual Machine Snapshot? A snapshot captures the entire state of the virtual machine at the time you take the snapshot. This includes: Memory state The contents of the virtual machine s memory. Settings state The virtual machine settings. Disk state The state of all the virtual machine s virtual disks. 22

Virtual Machine Snapshot - Continued Common issues: Snapshots filling up a datastore - commit them offline or clone the VM. Parent has changed - contact VMware Support. No snapshots found - create a new snapshot, then commit.

ESX4 iSCSI Multi-pathing ESX 4: Set Up Multi-pathing for Software iSCSI. Prerequisites: two or more NICs; a separate vSwitch; a supported iSCSI array; ESX 4.0 or higher.

ESX4 iscsi Multi-pathing - Continued Using the vsphere CLI, connect the software iscsi initiator to the iscsi VMkernel ports. Repeat this command for each port. esxcli swiscsi nic add -n <port_name> -d <vmhba> Verify that the ports were added to the software iscsi initiator by running the following command: esxcli swiscsi nic list -d <vmhba> Use the vsphere Client to rescan the software iscsi initiator. 25

ESX4 iscsi Multi-pathing - Continued This example shows how to connect the software iscsi initiator vmhba33 to VMkernel ports vmk1 and vmk2. Connect vmhba33 to vmk1: esxcli swiscsi nic add -n vmk1 -d vmhba33 Connect vmhba33 to vmk2: esxcli swiscsi nic add -n vmk2 -d vmhba33 Verify vmhba33 configuration: esxcli swiscsi nic list -d vmhba33 26
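The rescan in the last step can also be done from the service console instead of the vSphere Client; a short sketch reusing the vmhba33 example:
esxcfg-rescan vmhba33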

All Paths Dead (APD) The Issue You want to remove a LUN from a vSphere 4 cluster. You migrate or Storage VMotion the VMs off the datastore that is being removed (otherwise the VMs would hard crash if you just yanked out the datastore). After removing the LUN, VMs on OTHER datastores become unavailable (not crashing, but periodically unavailable on the network), and the ESX logs show a series of errors starting with NMP.

All Paths Dead - Continued Workaround 1 In the vsphere client, vacate the VMs from the datastore being removed (migrate or Storage vmotion) In the vsphere client, remove the Datastore In the vsphere client, remove the storage device Only then, in your array management tool remove the LUN from the host. In the vsphere client, rescan the bus. Workaround 2 Only available in ESX/ESXi 4 U1 esxcfg-advcfg -s 1 /VMFS3/FailVolumeOpenIfAPD 28
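To confirm the advanced option took effect, and to revert it once the LUN removal is complete, the same setting can be read back or cleared; a short sketch based on the command above:
esxcfg-advcfg -g /VMFS3/FailVolumeOpenIfAPD    # show the current value
esxcfg-advcfg -s 0 /VMFS3/FailVolumeOpenIfAPD  # set it back to the default of 0 afterwards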

4.1 Storage Additions Storage I/O Control, which allows us to prioritize I/O from virtual machines residing on different ESX servers but using the same shared VMFS volume. New I/O statistics, including NFS throughput and latency counters. vStorage APIs for Array Integration (VAAI), which allow the offloading of certain storage operations, such as cloning and zeroing, from the host to the array.

Questions

VMware View 4.5 Overview David Garcia Jr - Global Support Services

Agenda View (Overview) User Experience (Highlights) Performance & Scalability (Tiered Storage, View Composer) Management (View Manager) 32

VDI deployment scope (diagram): end-to-end performance spans the hypervisor, the server and virtualization stack, the network infrastructure, the storage infrastructure, the View Server and remote clients, vCenter Server, and the client itself.

View 4.5 Architecture overview Support for vsphere 4.1 and vcenter 4.1 - Delivers integration with the most widely-deployed desktop virtualization platform in the industry. Takes advantage of optimizations for View virtual desktops. Lowest Cost Reference Architectures - VMware has worked with partners such as Dell, HP, Cisco, NetApp, and EMC to provide prescriptive reference architectures to enable you to deploy a scalable and cost-effective desktop virtualization solution. View Client with Local Mode 34

View 4.5 Product highlights Full Windows 7 Support View Manager Enhancements Increasing Scale and Efficiency System and User Diagnostics Extensibility PCoIP Updates: Smart Card Support View Client with Local Mode (aka Offline Support) Support for vsphere 4.1 35

Flexible client access from multiple devices Native Windows Client Thin- Client Support Native Mac Client (RDP) Thick clients or refurbished PCs Now with Local Mode Broad industry support Mac OS 10.5+ NEW 36

Single sign on to virtual desktop and apps Single Sign On Authentication to Virtual Desktop Simplified Sign-on Windows Username/Password Smart Cards/Proximity Cards Client Based (MAC Address) USB connected biometric devices Integration with MS AD No Domain change, schema change, password change Supports Tap and Go Functionality Integrates with SSO Vendors Imprivata, Sentillion, Juniper, etc Connection Server 37

Web download portal Enhanced capability to manage distribution of full View Windows Client including PCoIP, ThinPrint and USB redirection features Ability to distribute current and legacy versions of View Client Broker URL automatically passed to Windows client upon launch Experimental Java based Mac and Linux Web Access no longer supported (use installable Mac Client in View 4 and View Open Client for Linux) 38

Value propositions of local desktops For IT: extend View benefits to mobile users with laptops; enable Bring Your Own PC (BYOPC) programs for employees & contractors; extend View benefits to remote/branch offices with poor/unreliable networks. For End Users: Mobility - check out a VM to a local laptop for offline usage with View Client with Local Mode; Disaster Recovery - the VM is replicated to the datacenter; Flexibility - BYOPC and personal desktop productivity.

High level features of local desktops in 2010 (View in 2010 details):
Run anywhere - After initial checkout, the desktop can be used at home or on the road without network connectivity.
Broad hardware support - Works with almost any modern laptop today.
Encrypted and secure - AES encryption of the desktop and centrally managed policies to control access and usage.
Data centralization & control - Admin can pull all data back up to the datacenter on demand.
High quality user experience - Support for Win7 Aero Glass effects, DirectX 9 w/3D, distortion-free sound & multimedia.
Reasonable CAPEX costs - Up & running with a single ESX box & local storage!
Disaster recovery options - Can schedule data replication to the server for rapid, seamless recovery from hardware loss or failure.
Single Image Management w/View - Works off the same management infrastructure & images as the rest of the View deployment.

View 4.5 major management feature highlights
Admin Features: high-perf GUI, role-based admin, event DB, dashboard, View PowerCLI extension, up to 10,000 desktops
Composer Enhancements: Sysprep support, fast refresh, persistent disk management
Simplified Sign-on: smart-card/proximity card, client (MAC/device ID), support of kiosk mode
ThinApp Integration: app repo scanning, pool/desktop ThinApp assignment
Storage Optimization: tiered storage, disposable disk/local swap file redirection, VM on local storage

Core broker: Performance & scalability 10,000 VM Pod (5 connection servers + 2 standby) Federated Pool Management Connection server instance in a cluster will be responsible for VM operations on VMs belonging to the same pool Reduced locking/synchronization overhead Enhanced tracker w/ caching Reduced extra reloading from ADAM Datastore Refresh UI with 5,000 objects in seconds! 42

View Composer improvements overview Customization/Provisioning Sysprep support Refresh, Recompose and Rebalance for Floating Pool Storage Performance and Optimization Tiered support Optimization Disposable disk and Local swap file redirect Allow creation of linked-clones on local storage Management Full Management of Persistent Disk (formerly known as UDD) 43

View Composer: Tiered storage Allow master VM replica to reside in a separate datastore Use high performance storage to boost performance (e.g. reboot, virus scan) 44

View Composer: Other storage optimization Local swap file redirect Not reducing storage but allow the use of cheap local storage for individual VM swap file Allow creation of linked-clones using local data stores Wizard will not filter out local data stores for use of VM cloning Allow use of cheap local storage for non-persistent pool VMs 45

View Composer: Customization/provisioning Sysprep support - Sysprep helps resolve the SID management issue: a new SID will be generated for each cloned VM. The Three R's: Refresh, Recompose, Rebalance.

View Composer: Enhanced management functions Persistent Disk (formerly known as UDD) Management Detach/Migrate/Archive/Reattach Managed as first class object Garbage collection scripts Remove one or more linked-clone VM(s) by name(s) from View, SVI, VC, and AD 47

Administration improvements in 2010 Provides Increased Management Efficiency: Monitoring, Diagnostics and Supportability Features Scalable Admin UI in Flex Role-based Administration System and End-User Troubleshooting Monitoring Dashboard Diagnostics Supportability Reporting and Auditing Enablement Events View Management Pack for SCOM 48

Scalable admin UI Based on Adobe Flex Rich application feel Scalability Easy navigation Cross-Platform 49

Role-based administration Delegated administration Flexible Roles Helpdesk, etc Custom roles LDAP-based access control on folders 50

System and end-user troubleshooting: Dashboard Surface key information to administrators Drill-down as needed Locate root cause System health status View components vcenter components Status of desktops Status of client-hosted endpoints Datastore usage VMs on storage LUN 51

Reporting and auditing enablement: Events Formally defined events Events have a unique well defined identifier Standard attributes include module, user, desktop, machine Provides a unified view across View components No more needing to review logs on each broker, agent! Managed with a configurable database Accessible with: VMware View Administrator Direct access (SQL) for other reporting tools Powershell Vdmadmin provides textual reports (csv or xml) 52

View management pack for SCOM 53

Links & Resources
Documentation, Release Notes: http://www.vmware.com/support/pubs/view_pubs.html
VMware View 4.5 Release Notes; VMware View Architecture Planning Guide; VMware View Administrator's Guide; VMware View Installation Guide; VMware View Upgrade Guide; VMware View Integration Guide
Technical Papers: http://www.vmware.com/resources/techresources/cat/91,156
VMware View Optimization Guide for Windows 7 - VMware, Ensynch - 09/27/2010
Vblock Powered Solutions for VMware View - VMware, Cisco, EMC - 09/09/2010
Virtual Desktop Sizing Guide with VMware View 4.0 and VMware vSphere 4.0 Update 1 - Mainline - 05/21/2010
Application Presentation to VMware View Desktops with Citrix XenApp - VMware - 05/20/2010
PCoIP Display Protocol: Information and Scenario-Based Network Sizing Guide - VMware - 05/20/2010
Location Awareness in VMware View 4 - VMware - 06/15/2010
VMware View 4 & VMware ThinApp Integration Guide - VMware - 01/19/2010
Anti-Virus Deployment for VMware View - VMware - 01/13/2010

Questions

vsphere Networking Best Practices David Garcia Jr - Global Support Services

Agenda vSwitches & Port Groups; NIC Teaming; Link Aggregation (802.3ad static mode); Failover Configuration; Spanning Tree Protocol; Network I/O Control; Load-Based Teaming; VMDirectPath, VMXNET3, FCoE CNA & 10GbE; VLAN Trunking (802.1Q); Tips & Tricks; Troubleshooting Tips; Must Read & KB Links

Designing the Network How do you design the virtual network for performance and availability while maintaining isolation between the various traffic types (e.g. VM traffic, VMotion, and management)? The starting point depends on the number of available physical ports on the server and the required traffic types. 2 NICs minimum for availability; 4+ NICs per server preferred. 802.1Q VLAN trunking is highly recommended for logical scaling (particularly with low-NIC-port servers). Examples are meant as guidance and do not represent strict requirements in terms of design. Understand your requirements and resultant traffic types and design accordingly.

ESX Virtual Switch: Capabilities A Layer 2 switch that forwards frames based on the 48-bit destination MAC address in the frame. MAC addresses are assigned to vNICs and known by registration (it knows its VMs!) - no MAC learning required. Can terminate VLAN trunks (VST mode) or pass the trunk through to the VM (VGT mode). Physical NICs are associated with vSwitches. NIC teaming (of uplinks): Availability - uplink to multiple physical switches; Load sharing - spread load over uplinks.

ESX Virtual Switch: Forwarding Rules The vSwitch will forward frames VM to VM and VM to uplink, but will not forward vSwitch to vSwitch or uplink to uplink. The ESX vSwitch will not create loops in the physical network and will not affect Spanning Tree (STP) in the physical network.

Port Group Configuration A Port Group is a template for one or more ports with a common configuration Assigns VLAN to port group members L2 Security select reject to see only frames for VM MAC addr Promiscuous mode/mac address change/forged transmits Traffic Shaping limit egress traffic from VM Load Balancing Origin VPID, Src MAC, IP-Hash, Explicit Failover Policy Link Status & Beacon Probing Notify Switches yes -gratuitously tell switches of mac location Failback yes if no fear of blackholing traffic, or, use Failover Order in Active Adapters Distributed Virtual Port Group (vnetwork Distributed Switch) All above plus: Bidirectional traffic shaping (ingress and egress) Network VMotion network port state migrated upon VMotion 61

NIC Teaming for Load Sharing & Availability NIC teaming aggregates multiple physical uplinks for: Availability - reduce exposure to single points of failure (NIC, uplink, physical switch); Load sharing - distribute load over multiple uplinks (according to the selected NIC teaming algorithm). NIC team requirements: two or more NICs on the same vSwitch; teamed NICs on the same L2 broadcast domain. KB - NIC teaming in ESX Server (1004088) KB - Dedicating specific NICs to portgroups while maintaining NIC teaming and failover for the vswitch (1002722)

NIC Teaming with vDS Teaming policies are applied in DV port groups, to dvUplinks. (Diagram: a DV port group teaming policy on the vDS dvUplinks, with each dvUplink mapping to a vmnic on hosts esx09a, esx09b, esx10a and esx10b.tml.local.) KB - vNetwork Distributed Switch on ESX 4.x - Concepts Overview (1010555)

NIC Teaming Options (name - algorithm for choosing the vmnic - physical network considerations)
Originating Virtual Port ID - vNIC port - teamed ports in same L2 domain (BP: team over two physical switches)
Source MAC Address - MAC seen on vNIC - teamed ports in same L2 domain (BP: team over two physical switches)
IP Hash* - Hash(SrcIP, DstIP) - teamed ports configured in static 802.3ad EtherChannel (no LACP); needs MEC to span 2 switches
Explicit Failover Order - highest-order uplink from the active list - teamed ports in same L2 domain (BP: team over two physical switches)
Best Practice: Use Originating Virtual Port ID for VMs
*KB - ESX Server host requirements for link aggregation (1001938)
*KB - Sample configuration of EtherChannel/Link aggregation with ESX and Cisco/HP switches (1004048)

Link Aggregation 65

Link Aggregation - Continued EtherChannel is a port trunking (link aggregation is Cisco's term) technology used primarily on Cisco switches; it can be created from two to eight active Fast Ethernet, Gigabit Ethernet, or 10 Gigabit Ethernet ports. LACP / IEEE 802.3ad: the Link Aggregation Control Protocol (LACP) is included in the IEEE specification as a method to control the bundling of several physical ports together to form a single logical channel; it is only supported on the Nexus 1000V. EtherChannel vs. 802.3ad: EtherChannel and the IEEE 802.3ad standard are very similar and accomplish the same goal; there are a few differences between the two, beyond EtherChannel being Cisco proprietary and 802.3ad an open standard. EtherChannel best practices: one-IP-to-one-IP connections over multiple NICs are not supported (Host A's connection session to Host B uses only one NIC). Supported Cisco configuration: EtherChannel Mode ON (enable EtherChannel only). Supported HP configuration: Trunk Mode. Supported switch aggregation algorithm: IP-SRC-DST (short for IP-Source-Destination), as a global policy on the switch. The only load-balancing option for a vSwitch or vDistributed Switch that can be used with EtherChannel is IP HASH. Do not use beacon probing with IP HASH load balancing. Do not configure standby uplinks with IP HASH load balancing.

Failover Configurations Link Status Only - relies solely on the link status provided by the network adapter. Detects failures such as cable pulls and physical switch power failures, but cannot detect configuration errors: a switch port being blocked by spanning tree, a switch port configured for the wrong VLAN, or cable pulls on the other side of a physical switch. Beacon Probing - sends out and listens for beacon probes (Ethernet broadcast frames sent by physical adapters to detect upstream network connection failures) on all physical Ethernet adapters in the team, as shown in the figure. It detects many of the failures mentioned above that are not detected by link status alone, but should not be used as a substitute for a redundant Layer 2 network design; it is most useful for detecting failures in the switch closest to the ESX Server hosts. Beacon Probing best practice: use at least 3 NICs for triangulation - if there are only 2 NICs in the team, the probe can't determine which link failed, and "shotgun mode" results. KB - What is beacon probing? (1005577) KB - ESX host network flapping error when Beacon Probing is selected (1012819) KB - Duplicated Packets Occur when Beacon Probing Is Selected Using vmnic and VLAN Type 4095 (1004373) KB - Packets are duplicated when you configure a portgroup or a vswitch to use a route that is based on IP-hash and Beaconing Probing policies simultaneously (1017612) (Figure: using beacons to detect upstream network connection failures.)

Spanning Tree Protocol (STP) Considerations Physical switches send BPDUs every 2s to construct and maintain the Spanning Tree topology; the vSwitch drops BPDUs. Spanning Tree Protocol is used to create loop-free L2 tree topologies in the physical network, with some physical links put in blocking state to construct the loop-free tree. The ESX vSwitch does not participate in Spanning Tree and will not create loops with its uplinks; ESX uplinks will not block and are always active (full use of all links). Recommendations for physical network config: 1. Leave Spanning Tree enabled on the physical network and ESX-facing ports (i.e. leave it as is!) 2. Use portfast or portfast trunk on ESX-facing ports (puts ports in forwarding state immediately) 3. Use bpduguard to enforce the STP boundary. KB - STP may cause temporary loss of network connectivity when a failover or failback event occurs (1003804)

ESX 4.1 Introduces Network I/O Control VMware vsphere 4.1 ( vsphere ) introduces a number of enhancements and new features to virtual networking. Network I/O Control (NetIOC) flexibly partition and assure service for ESX/ESXi traffic types and flows on a vnetwork Distributed Switch (vds) Load-Based Teaming (LBT) an additional and selectable load-balancing policy on the vds to enable dynamic adjustment of the load distribution over a team of NICs Network performance vmkernel TCP/IP stack and guest virtual-machine network performance enhancements Scale enhancements to network scaling with the vds IPv6 NIST Compliance IPv6 enhancements to comply with U.S. National Institute of Standards and Technology (NIST) Host Profile Cisco Nexus 1000V Enhancements support for new features and enhancements on the Cisco Nexus 1000V 69

Network I/O Control Usage 70

Load-Based Teaming (LBT) LBT is another traffic-management feature of the vDS introduced with vSphere 4.1. LBT avoids network congestion on the ESX/ESXi host uplinks caused by imbalances in the mapping of traffic to those uplinks, enabling customers to optimally use and balance network load over the available physical uplinks attached to each ESX/ESXi host. LBT helps avoid situations where one link may be congested while other links are relatively underused. How LBT works: LBT dynamically adjusts the mapping of virtual ports to physical NICs to best balance the network load entering or leaving the ESX/ESXi 4.1 host. When LBT detects an ingress or egress congestion condition on an uplink, signified by a mean utilization of 75% or more over a 30-second period, it will attempt to move one or more of the virtual-port-to-vmnic mapped flows to lesser-used links within the team. Configuring LBT: LBT is an additional load-balancing policy available within the teaming and failover settings of a dvPortGroup on a vDS; it appears as "Route based on physical NIC load". *LBT is not available on the vNetwork Standard Switch (vSS).

VMXNET3 The Para-virtualized VM Virtual NIC Next evolution of Enhanced VMXNET introduced in ESX 3.5 Adds MSI/MSI-X support (subject to guest operating system kernel support) Receive Side Scaling (supported in Windows 2008 when explicitly enabled through the device's Advanced configuration tab) Large TX/RX ring sizes (configured from within the virtual machine) High performance emulation mode (Default) Supports High DMA TSO (TCP Segmentation Offload) over IPv4 and IPv6 TCP/UDP checksum offload over IPv4 and IPv6 Jumbo Frames 802.1Q tag insertion KB - Choosing a network adapter for your virtual machine (1001805) 72
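One quick way to confirm which adapter type a VM is actually using is to check its .vmx file from the service console; a minimal sketch with a hypothetical datastore and VM name:
grep -i "ethernet[0-9].virtualDev" /vmfs/volumes/datastore1/myvm/myvm.vmx
# a VMXNET3 adapter shows up as: ethernet0.virtualDev = "vmxnet3"  (e1000 or vmxnet entries would appear here otherwise)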

VMDirectPath for VMs What is it? Enables direct assignment of PCI devices to a VM - the device driver runs in the guest, bypassing the virtual layer. Types of workloads: I/O appliances, high-performance VMs. Details: the guest controls the physical hardware. Requirements: vSphere 4; an I/O MMU, used for DMA address translation (guest physical to host physical) and protection; generic device reset (FLR, link reset, ...). KB - Configuring VMDirectPath I/O pass-through devices on an ESX host (1010789)

FCoE on ESX VMware ESX support: FCoE has been supported since ESX 3.5u2. Requires Converged Network Adapters - CNAs (see the HCL), e.g. the Emulex LP21000 series and QLogic QLE8000 series. A CNA appears to ESX as a 10GigE NIC plus an FC HBA. The CNA connects to an FCoE switch, which splits out the Ethernet and Fibre Channel traffic. SFP+ pluggable transceivers: copper twin-ax (<10m) or optical.

Using 10GigE 2x 10GigE (CNAs or NICs) is common/expected. Traffic types range from variable/high bandwidth (2Gbps+) for iSCSI/NFS, to high 1-2G bandwidth for VMotion/FT, to low bandwidth for the Service Console(s). Possible deployment method: Active/Standby on all port groups - VMs sticky to one vmnic, SC/vmkernel ports sticky to the other. Use ingress (into switch) traffic shaping policy control on the port group to control each traffic type. If using FCoE, use Priority Group bandwidth reservation (in the CNA configuration utility).

Traffic Types on a Virtual Network Virtual Machine Traffic Traffic sourced and received from virtual machine(s) Isolate from each other based on service level VMotion Traffic Traffic sent when moving a virtual machine from one ESX host to another Should be isolated Management Traffic Should be isolated from VM traffic (one or two Service Consoles) If VMware HA is enabled, includes heartbeats IP Storage Traffic NFS and/or iscsi via vmkernel interface Should be isolated from other traffic types Fault Tolerance (FT) Logging Traffic Low latency, high bandwidth Should be isolated from other traffic types How do we maintain traffic isolation without proliferating NICs? 76

VLAN Trunking to Server IEEE 802.1Q VLAN tagging enables logical network partitioning (traffic separation) - scale traffic types without scaling physical NICs. Virtual machines connect to virtual switch ports (like access ports on a physical switch); virtual switch ports are associated with a particular VLAN (VST mode) defined in the port group (e.g. Port Group Yellow on VLAN 10, Port Group Blue on VLAN 20), and the uplinks are VLAN trunks carrying VLANs 10 and 20. The virtual switch tags packets exiting the host with an 802.1Q header (EtherType 0x8100) containing a 12-bit VLAN ID field (0-4095).

VLAN Tagging Options VST (Virtual Switch Tagging) - the VLAN is assigned in the port group policy and VLAN tags are applied in the vSwitch; this is the best practice and most common method. VGT (Virtual Guest Tagging) - VLAN tags are applied in the guest; the port group is set to VLAN 4095. EST (External Switch Tagging) - the external physical switch applies the VLAN tags.

VLAN Tagging: Further Example Example configuration on the physical switch (trunk port facing an ESX uplink):
interface GigabitEthernet1/2
 description host32-vmnic0
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 999
 switchport trunk allowed vlan 10,20,50,90
 switchport mode trunk
 spanning-tree portfast trunk
Uplinks A, B, and C are connected to trunk ports on the physical switch which carry four VLANs (e.g. VLANs 10, 20, 50, 90). Ports 1-14 emit untagged frames, and only those frames which were tagged with their respective VLAN ID (equivalent to access ports on a physical switch, here on VLANs 10, 20 and 50); the port group VLAN ID is set to one of 1-4094. Port 15 emits tagged frames for all VLANs (all VLANs 10, 20, 50, 90 trunked to the VM); the port group VLAN ID is set to 4095 (for the vSS), or VLAN Trunking is configured on the vDS DV port group. KB - Sample configuration of virtual switch VLAN tagging (VST Mode) and ESX Server (1004074)
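The ESX side of the VST example can be set up from the service console as well; a minimal sketch, assuming a standard vSwitch0 and a hypothetical port group name:
esxcfg-vswitch -A "VM-VLAN10" vSwitch0         # create the port group
esxcfg-vswitch -p "VM-VLAN10" -v 10 vSwitch0   # tag it with VLAN 10 (VST mode)
esxcfg-vswitch -l                              # verify port groups, VLAN IDs and uplinks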

Private VLANs: Traffic Isolation for Every VM Solution: PVLAN Place VMs on the same virtual network but prevent them from communicating directly with each other (saves VLANs!) Private VLAN traffic isolation between guest VMs Avoids scaling issues from assigning one VLAN and IP subnet per VM Details Instead, configure a SINGLE DV port group to have a SINGLE isolated* VLAN (ONLY ONE) Attach all your VMs to this SINGLE isolated VLAN DV port group Distributed Switch with PVLAN Common Primary VLAN on uplinks KB - Private VLAN (PVLAN) on vnetwork Distributed Switch - Concept Overview (1010691) 80

Private VLANs - Continued (Diagram comparison) With one port group and one VLAN per VM on the vNetwork Distributed Switch: TOTAL COST: 12 VLANs (one per VM). With a single DV port group using an isolated PVLAN: TOTAL COST: 1 PVLAN (over 90% savings).

Tips & Tricks KB - Changing a MAC address in a Windows virtual machine (1008473): when a physical machine is converted into a virtual machine, the MAC address of the network adapter changes; this can pose a problem when installed software has licensing tied to the MAC address. KB - Configuring speed and duplex of an ESX Server host network adapter (1004089): the recommended ESX settings for Gigabit Ethernet speed and duplex when connecting to a physical switch port are Auto Negotiate <-> Auto Negotiate; it is not recommended to mix a hard-coded setting with auto-negotiate. KB - Sample Configuration - Network Load Balancing (NLB) Multicast mode over routed subnet - Cisco Switch Static ARP Configuration (1006525): since NLB packets are unconventional (the IP address is unicast while its MAC address is multicast), switches and routers drop NLB packets, so the ARP tables of switches do not get populated with the cluster IP and MAC address. Manual ARP resolution of the NLB cluster address is required on physical switch and router interfaces; the cluster IP and MAC static resolution is set on each switch port that connects to an ESX host.
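Speed and duplex can also be checked and set from the service console; a short sketch using vmnic0 as the example NIC:
esxcfg-nics -l                       # list physical NICs with their current speed and duplex
esxcfg-nics -a vmnic0                # return vmnic0 to auto-negotiate (the recommended setting)
esxcfg-nics -s 1000 -d full vmnic0   # only hard-code speed/duplex if the switch port is hard-coded the same way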

Troubleshooting Tips 83

Troubleshooting with Esxtop 84

Esxtop Traffic 85

Capturing Traffic 86

ESX tcpdump 87
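A minimal capture sketch from the ESX service console, assuming the management interface is vswif0 and writing to a hypothetical file for later analysis in Wireshark (see KB 1004090 in the links below):
tcpdump -i vswif0 -s 0 -w /tmp/capture.pcap        # capture full frames on the service console interface
tcpdump -i vswif0 -n host 10.0.0.50 and port 902   # or filter live output to a specific host and port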

Wireshark in a VM 88

Must Read http://www.vmware.com/technical-resources/virtual-networking/ Technical Papers Conclusion VMXNET3, the newest generation of virtual network adapter from VMware, offers performance on par with or better than its previous generations in both Windows and Linux guests. Both the driver and the device have been highly tuned to perform better on modern systems. Furthermore, VMXNET3 introduces new features and enhancements, such as TSO6 and RSS. TSO6 makes it especially useful for users deploying applications that deal with IPv6 traffic, while RSS is helpful for deployments requiring high scalability. All these features give VMXNET3 advantages that are not possible with previous generations of virtual network adapters. Moving forward, to keep pace with an ever increasing demand for network bandwidth, we recommend customers migrate to VMXNET3 if performance is of top concern to their deployments. Conclusion This study compares performance results for e1000 and vmxnet virtual network devices on 32-bit and 64-bit guest operating systems using the netperf benchmark. The results show that when a virtual machine is running with software virtualization, e1000 is better in some cases and vmxnet is better in others. Vmxnet has lower latency, which sometimes comes at the cost of higher CPU utilization. When hardware virtualization is used, vmxnet clearly provides the best performance. 89

KB Links KB - Cisco Discovery Protocol (CDP) network information via command line and VirtualCenter on an ESX host (1007069) Utilizing Cisco Discovery protocol (CDP) to get switch port configuration information. This command is utilized to troubleshoot network connectivity issues related to VLAN tagging methods on virtual and physical port settings. KB - Troubleshooting network issues with the Cisco show tech-support command (1015437) If you experience networking issues between vswitch and physical switched environment, you can obtain information about the configuration of a Cisco router or switch by running the show tech-support command in privileged EXEC mode. Note: This command does not alter the configuration of the router. KB - ESX host or virtual machines have intermittent or no network connectivity (1004109) KB - Troubleshooting Nexus 1000V vds network issues (1014977) KB - Cisco Nexus 1000V installation and licensing information (1013452) Cisco Nexus 1000V Troubleshooting Guide, Release 4.0(4)SV1(2) 20/Jan/2010 Cisco Nexus 1000V Troubleshooting Guide, Release 4.0(4)SV(1) 21/Jan/2010 KB - Configuring promiscuous mode on a virtual switch or portgroup (1004099) KB - Troubleshooting network issues by capturing and sniffing network traffic via tcpdump (1004090) 90

KB Links - Continued KB - Troubleshooting network connection issues using Address Resolution Protocol (ARP) (1008184) IEEE OUI and Company id Assignments http://standards.ieee.org/regauth/oui/index.shtml KB - Network performance issues (1004087) KB - Low Network Throughput in Windows Guest when Running UDP Application (5298153) KB - Performance of Outgoing UDP Packets Is Poor (10172) KB - Poor Network File Copy performance between local VMFS and shared VMFS (1003554) KB - Cannot connect to ESX 4.0 host for 30-40 minutes after boot (1012942) Ensure that DNS is configured and reachable from the ESX host KB - Identifying issues with and setting up name resolution on ESX Server (1003735) Note: localhost must always be present in the hosts file. Do not modify or remove the entry for localhost The hosts file must be identical on all ESX Servers in the cluster There must be an entry for every ESX Server in the cluster Every host must have an IP address, Fully Qualified Domain Name (FQDN), and short name The hosts file is case sensitive. Be sure to use lowercase throughout the environment 91

Questions

ESXi Readiness Planning your migration to VMware ESXi, the next-generation hypervisor architecture. David Garcia Jr - Global Support Services

The Gartner Group says: The major benefit of ESXi is the fact that it is more lightweight - under 100MB versus 2GB for VMware ESX with the service console. Smaller means fewer patches. It also eliminates the need to manage a separate Linux console (and the Linux skills needed to manage it). As of August 2010: VMware users should put a plan in place to migrate to ESXi during the next 12 to 18 months.

VMware ESXi and ESX hypervisor architectures comparison VMware ESX Hypervisor Architecture: code base disk footprint ~2GB; VMware agents run in the Console OS; nearly all other management functionality is provided by agents running in the Console OS; users must log into the Console OS in order to run commands for configuration and diagnostics. VMware ESXi Hypervisor Architecture: code base disk footprint <100 MB; VMware agents ported to run directly on the VMkernel; authorized 3rd-party modules can also run in the VMkernel to provide hw monitoring and drivers; other capabilities necessary for integration into an enterprise datacenter are provided natively; no other arbitrary code is allowed on the system.

Call to action for customers Start testing ESXi - if you've not already deployed, there's no better time than the present. Ensure your 3rd-party solutions are ESXi Ready (monitoring, backup, management, etc.) - most already are. Bid farewell to agents! Familiarize yourself with ESXi remote management options. Transition any scripts or automation that depended on the COS - powerful off-host scripting and automation using the vCLI, PowerCLI, etc. Plan an ESXi migration as part of your vSphere upgrade; testing of the ESXi architecture can be incorporated into overall vSphere testing.
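As one example of moving a COS-based check off-host, a NIC listing can be run remotely with the vSphere CLI; a sketch assuming vCLI 4.x is installed on a management station and using a hypothetical host name:
vicfg-nics --server esx01.example.com --username root -l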

Visit the ESXi and ESX Info Center today http://vmware.com/go/esxiinfocenter 97

Questions

Break

vsphere 4 - Performance Best Practices Kenneth Kemp, Escalation Engineer

Agenda Technical Guides ESX 4.x Performance & Troubleshooting Memory CPU vcenter Performance & Troubleshooting High Availability Distributed Resource Scheduler Fault Tolerance Resource Pool Designs HW Considerations and Settings 101

Technical Guides 102

Memory

Memory Resource Types When assigning a VM a physical amount of RAM, all you are really doing is telling ESX how much memory a given VM process will maximally consume past the overhead. Whether or not that memory is physical depends on a few factors: Host configuration, DRS shares/limits/reservations and host load. Generally speaking, it is better to OVER-commit than UNDER-commit. 104

Memory Overhead & Reclamation ESX memory space overhead Service Console: 272 MB VMkernel: 100 MB+ Per-VM memory space overhead increases with: Number of VCPUs Size of guest memory 32 or 64 bit guest OS ESX memory space reclamation Page sharing Ballooning 105

Memory Page Tables ESX cannot use the guest page tables directly, so ESX Server maintains shadow page tables that translate memory addresses from virtual to machine (VA to PA to MA), per process, per vCPU. The VMM maintains physical (per-VM) to machine maps. There is no overhead for ordinary memory references; overhead comes from page table initialization and updates and from guest OS context switching.

Memory Over-commitment & Sizing Avoid high active host memory over-commitment. Total memory demand = active working sets of all VMs + memory overhead - page sharing. No ESX swapping occurs while total memory demand < physical memory. Right-size guest memory: define adequate guest memory to avoid guest swapping, but remember that per-VM memory space overhead grows with guest memory.

Memory NUMA considerations Increasing a VM s memory on a NUMA machine Will eventually force some memory to be allocated from a remote node, which will decrease performance Try to size the VM so both CPU and memory fit on one node Node 0 Node 1 108

Memory NUMA considerations continued NUMA scheduling and memory placement policies in ESX manages all VMs transparently No need to manually balance virtual machines between nodes NUMA optimizations available when node interleaving is disabled Manual override controls available Memory placement: 'use memory from nodes' Processor utilization: 'run on processors' Not generally recommended For best performance of VMs on NUMA systems # of VCPUs + 1 <= # of cores per node VM memory <= memory of one node 109

Memory Balancing & Overcommitment ESX must balance memory usage for all worlds Virtual machines, Service Console, and vmkernel consume memory Page sharing to reduce memory footprint of Virtual Machines Ballooning to relieve memory pressure in a graceful way Host swapping to relieve memory pressure when ballooning insufficient ESX allows overcommitment of memory Sum of configured memory sizes of virtual machines can be greater than physical memory if working sets fit 110

Memory - Ballooning Ballooning: the memctl driver grabs pages inside the guest and gives them to ESX (1. balloon, 2. reclaim, 3. redistribute). The guest OS chooses which pages to give to memctl (avoiding hot pages if possible): either free pages or pages to swap. Unused pages are given directly to memctl; pages to be swapped are first written to the swap partition within the guest OS and then given to memctl.

Memory - Swapping Swapping: ESX reclaims pages forcibly (1. force swap, 2. reclaim, 3. redistribute). The guest doesn't pick the pages, so ESX may inadvertently pick hot pages (possible VM performance implications). Pages are written to the VM swap file (the VSWP file, external to the guest).

Memory Ballooning vs. Swapping Bottom line: ballooning may occur even when there is no memory pressure, just to keep memory proportions under control. Ballooning is vastly preferable to swapping: the guest can surrender unused/free pages, while with host swapping ESX cannot tell which pages are unused or free and may accidentally pick hot pages. Even if the balloon driver has to swap to satisfy the balloon request, the guest chooses what to swap and can avoid swapping hot pages within the guest.

Memory Ok, So Why Do I Care About Memory Usage? If running VMs consume too much host memory, some VMs do not get enough host memory; this forces either ballooning or host swapping to satisfy VM demands, and host swapping or excessive ballooning reduces VM performance. If I do not size a VM properly (e.g., create a Windows VM with 128MB RAM), swapping occurs within the VM, resulting in disk traffic, and the VM may slow down. But don't make memory too big either! (High overhead memory.)

Memory - Important Memory Metrics (Per VM)
Swap in rate (ESX 4.0 hosts) - esxtop: SWR/s - SDK: mem.swapinrate.average - rate at which mem is swapped in from disk
Swap out rate (ESX 4.0 hosts) - esxtop: SWW/s - SDK: mem.swapoutrate.average - rate at which mem is swapped out to disk
Swapped - esxtop: SWCUR - SDK: mem.swapped.average (level 2 counter) - ~(swap out - swap in)
Swap in (cumulative) - esxtop: n/a - SDK: mem.swapin.average - mem swapped in from disk
Swap out (cumulative) - esxtop: n/a - SDK: mem.swapout.average - mem swapped out to disk
One rule of thumb: a swap in or swap out rate > 1MB/s may mean memory overcommitment

Memory - Important Memory Metrics (Per Host, sum of VMs)
Swap in rate (ESX 4.0 hosts) - esxtop: SWR/s - SDK: mem.swapinrate.average - rate at which mem is swapped in from disk
Swap out rate (ESX 4.0 hosts) - esxtop: SWW/s - SDK: mem.swapoutrate.average - rate at which mem is swapped out to disk
Swap used - esxtop: SWCUR - SDK: mem.swapused.average (level 2 counter) - ~(swap out - swap in)
Swap in (cumulative) - esxtop: n/a - SDK: mem.swapin.average - mem swapped in from disk
Swap out (cumulative) - esxtop: n/a - SDK: mem.swapout.average - mem swapped out to disk
One rule of thumb: a swap in or swap out rate > 1MB/s may mean memory overcommitment
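To watch the same counters interactively rather than in the vSphere Client, a short sketch with esxtop on ESX 4:
esxtop    # press 'm' for the memory view; SWCUR, SWR/s and SWW/s show current swap and swap-in/out rates, and MCTLSZ shows ballooning
Sustained non-zero SWR/s or SWW/s is the signal to dig further, per the rule of thumb above.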

Memory - vsphere Client: Swapping on a Host Increased swap activity may be a sign of over-commitment No swapping Lots of swapping 117

Memory - A Stacked Chart (per VM) of Swapping (screenshot: periods of no swapping vs. lots of swapping)

Memory - Counters Shown in vsphere Client: Host Overview Page Balloon Active Swap used Granted Shared common 119

Memory - Counters Shown in vsphere Client: VM Overview Page Balloon target (how much should be ballooned), Swapped (~swap out - swap in), Shared, Balloon, Active

Memory - Other Counters Shown in vsphere Client Main page shows host memory usage (consumed + overhead memory + Service Console) Data refreshed at 20s intervals 121

Memory - Counters Shown on VM List Summary Tab Host CPU: Avg. CPU utilization for Virtual machine Host Memory: consumed + overhead memory for Virtual Machine Guest Memory: active memory for guest Note: This page is updated once per minute 122

Memory - Breakdown in a VM Host view: overhead consumed, private (non-shared), shared (content-based page sharing). Guest view: active (used as input to DRS), overhead reserved, unaccessed = unmapped (~never been touched).

Memory - Virtual Machine Memory Metrics, vSphere Client
Memory Active (KB) - physical pages touched recently by a virtual machine
Memory Usage (%) - active memory / configured memory
Memory Consumed (KB) - machine memory mapped to a virtual machine, including its portion of shared pages; does NOT include overhead memory
Memory Granted (KB) - VM physical pages backed by machine memory; may be less than configured memory; includes shared pages; does NOT include overhead memory
Memory Shared (KB) - physical pages shared with other virtual machines
Memory Balloon (KB) - physical memory ballooned from a virtual machine
Memory Swapped (KB) (ESX 4.0: swap rates!) - physical memory in the swap file (approx. swap out - swap in); swap out and swap in are cumulative
Overhead Memory (KB) - machine pages used for virtualization

Memory - Host Memory Metrics, vSphere Client
Memory Active (KB) - physical pages touched recently by the host
Memory Usage (%) - active memory / configured memory
Memory Consumed (KB) - total host physical memory minus free memory on the host; includes overhead and Service Console memory
Memory Granted (KB) - sum of memory granted to all running virtual machines; does NOT include overhead memory
Memory Shared (KB) - sum of memory shared for all running VMs
Shared common (KB) - total machine pages used by shared pages
Memory Balloon (KB) - machine pages ballooned from virtual machines
Memory Swap Used (KB) (ESX 4.0: swap rates!) - physical memory in swap files (approx. swap out - swap in); swap out and swap in are cumulative
Overhead Memory (KB) - machine pages used for virtualization

Memory - Troubleshooting Memory Problems with Esxtop Swapping Memory Hog VMs MCTL: N - Balloon driver not active, tools probably not installed Ballooning active Swapped in the past but not actively swapping now More swapping since balloon driver is not active 126

CPU

CPU - Resource Types CPU resources are the raw processing speed of a given host or VM. However, on a more abstract level, we are also bound by the host's ability to schedule those resources. We also have to account for running a VM in the most optimal fashion, which typically means running it on the same processor that the last cycle completed on.

CPU SMP Performance Some multi-threaded apps in an SMP VM may not perform well; consider using multiple UP VMs on a multi-CPU physical machine instead.

CPU - Performance Overhead & Utilization CPU virtualization adds varying amounts of overhead Little or no overhead for the part of the workload that can run in direct execution Small to significant overhead for virtualising sensitive privileged instructions Performance reduction vs. increase in CPU utilization CPU-bound applications: any CPU virtualization overhead results in reduced throughput non-cpu-bound applications: should expect similar throughput at higher CPU utilization 130

CPU - VM vCPU Processor Support
- ESX supports up to eight virtual processors per VM
- Use UP VMs for single-threaded applications, with a UP HAL or UP kernel
- For SMP VMs, configure only as many vCPUs as needed
- Unused vCPUs in SMP VMs impose unnecessary scheduling constraints on ESX Server and waste system resources (idle looping, process migrations, etc.)
131

CPU - 64-bit Performance
- Full support for 64-bit guests
- 64-bit can offer better performance than 32-bit: more registers, large kernel tables, no HIGHMEM issue in Linux
- ESX Server may experience performance problems due to shared host interrupt lines; this can happen with any controller, but most often with USB
- Disable unused controllers, or physically move controllers; see KB 1290 for more details
132

CPU - Virtual Machine Worlds
- ESX is designed to run virtual machines; the schedulable entity is a world
- Virtual machines are composed of worlds
- The Service Console is a world (and runs agents such as vpxa and hostd); there are also helper worlds
- ESX uses a proportional-share scheduler to help with resource management: limits, shares, reservations
- Balanced interrupt processing
133

CPU - ESX CPU Scheduling
- World states (simplified view): ready = ready to run but no physical CPU free; run = currently active and running; wait = blocked on I/O
- Multi-CPU virtual machines => a variant of gang scheduling called relaxed co-scheduling
  - Co-run (latency to get vCPUs running)
  - Co-stop (time in the stopped state)
134

CPU - So, How Do I Spot CPU Performance Problems?
- One common issue is high CPU ready time
- High ready time => possible contention for CPU resources among VMs
- Many possible reasons: CPU overcommitment (high %RDY + high %USED), workload variability, a limit set on the VM
- No fixed threshold, but > 20% ready time for a vCPU warrants further investigation
135

CPU: Useful Metrics Per-Host
- Usage (%) | esxtop: %USED | SDK: cpu.usage.average | CPU used over the collection interval (%)
- Usage (MHz) | esxtop: n/a | SDK: cpu.usagemhz.average | CPU used over the collection interval (MHz)
136

CPU: Useful Metrics Per-VM
- Usage (%) | esxtop: %USED | SDK: cpu.usage.average | CPU used over the collection interval
- Used (ms) | esxtop: %USED | SDK: cpu.used.summation | CPU used over the collection interval*
- Ready (ms) | esxtop: %RDY | SDK: cpu.ready.summation | CPU time spent in the ready state*
- Swap wait time (ms) [ESX 4.0 hosts] | esxtop: %SWPWT | SDK: cpu.swapwait.summation | CPU time spent waiting for host-level swap-in
* Units differ between esxtop and the vSphere Client
137
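Because the units differ, it helps to convert the Client's ready summation (milliseconds per sample) into an esxtop-style percentage. A minimal Python sketch, assuming the 20-second real-time sample interval and the rough 20%-per-vCPU guideline from the earlier slide (both are assumptions you may need to adjust):

# Minimal sketch: convert the vSphere Client's ready summation (ms) into a
# percentage comparable with esxtop %RDY.

def ready_percent(ready_ms, interval_s=20):
    # Ready time as a percentage of the sample interval, summed over all vCPUs
    # (like esxtop %RDY, this can exceed 100% for SMP VMs).
    return ready_ms / (interval_s * 1000.0) * 100.0

def flag_high_ready(ready_ms, num_vcpus, interval_s=20, per_vcpu_threshold=20.0):
    # Rough rule of thumb from the earlier slide: more than ~20% ready per vCPU
    # deserves a closer look. The threshold is a guideline, not a hard limit.
    pct = ready_percent(ready_ms, interval_s)
    return pct, (pct / num_vcpus) > per_vcpu_threshold

print(flag_high_ready(ready_ms=6000, num_vcpus=1))   # (30.0, True)
print(flag_high_ready(ready_ms=6000, num_vcpus=2))   # (30.0, False)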

CPU - vSphere Client CPU Screenshot
Hint: CPU milliseconds and percent are shown on the same chart but use different units.
138

CPU - Spotting CPU Overcommitment in esxtop
- 2-CPU box, but 3 active VMs (high %USED)
- High %RDY + high %USED can imply CPU overcommitment
139
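A minimal Python sketch of this heuristic; the sample rows and the 90%/10% thresholds are invented for illustration, not fixed rules.

# Minimal sketch: if the VMs' %USED adds up to (nearly) all the physical CPUs
# and %RDY is also high, the host is likely CPU overcommitted.

pcpus = 2
vms = [{"name": "vm1", "used": 85.0, "rdy": 12.0},
       {"name": "vm2", "used": 80.0, "rdy": 15.0},
       {"name": "vm3", "used": 25.0, "rdy": 18.0}]

total_used = sum(v["used"] for v in vms)            # percent of one core, summed
overcommitted = total_used > 0.9 * pcpus * 100 and any(v["rdy"] > 10 for v in vms)
print(f"total %USED = {total_used:.0f} of {pcpus * 100} available -> "
      f"{'likely overcommitted' if overcommitted else 'probably fine'}")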

CPU - Spotting Workload Variability in the vSphere Client
- Used time ~ ready time: may signal contention; however, the host might not be overcommitted because of workload variability
- In this example there are periods of activity and idle periods: the CPU isn't overcommitted all the time
(chart callouts: "Ready time < used time", "Used time", "Ready time ~ used time")
140

CPU - High Ready Time Due to Limits Set on the VM: esxtop
- High ready time
- High %MLMTD: there is a limit on this VM
- High ready time is not always caused by overcommitment
141

CPU - High Ready Time Due to Limits: vSphere Client
(screenshot callouts: high ready time; limit on CPU)
142

CPU - Ready Time: Why There Is No Fixed Threshold
Ready time jumped from 12.5% (idle DB) to 20% (busy DB); the change wasn't noticed until responsiveness suffered.
143

CPU - Summary of Possible Reasons for High Ready Time
- CPU overcommitment. Possible solution: add more CPUs or VMotion the VM to another host
- Workload variability: a bunch of VMs wake up all at once. Note: the system may be mostly idle, so high ready time does not always mean overcommitment
- A limit set on the VM. Example: 4 x 2 GHz host, 2-vCPU VM, limit set to 1 GHz (the VM can consume 1 GHz). Without the limit the maximum is 2 GHz; with the limit it is 1 GHz (50% of 2 GHz). With the CPUs fully busy: %USED = 50%; %MLMTD + %RDY = 150% [the total is 200%, or 2 CPUs]. See the worked example below.
144
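The worked example below reproduces the arithmetic from the limit scenario above (a sketch; the numbers come straight from the slide).

# Worked version of the limit example above.
# Host: 4 x 2 GHz cores. VM: 2 vCPUs (can demand 200% of one core), limit 1 GHz.

core_mhz = 2000
vcpus = 2
limit_mhz = 1000

demand_pct = vcpus * 100                  # 200% - both vCPUs want to run flat out
used_pct = limit_mhz / core_mhz * 100     # 50%  - the limit caps actual execution
not_run_pct = demand_pct - used_pct       # 150% - shows up as %MLMTD plus %RDY

print(f"%USED ~ {used_pct:.0f}%, %MLMTD + %RDY ~ {not_run_pct:.0f}% (out of {demand_pct}%)")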

vCenter

vCenter - Best Practices
- VC database sizing: estimate the space required to store your performance statistics in the DB (see the sizing sketch below)
- Separate critical files onto separate drives: make sure the database and transaction log files are placed on separate physical drives, and place the tempdb database on a separate physical drive if possible. This arrangement distributes the I/O to the DB and dramatically improves its performance. If a third drive is not feasible, place the tempdb files on the transaction log drive
- Enable automatic statistics
- Keep the vCenter logging level low, unless troubleshooting
- Proper scheduling of DB backups, maintenance, and monitoring
- Do not run vCenter on a server that has many other applications running
- vCenter Heartbeat - http://www.vmware.com/products/vcenter-serverheartbeat/
146
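A back-of-envelope sketch for the database sizing bullet. This is not the official vCenter sizing calculator; every input (entities, counters, samples, retention, bytes per row) is an assumption to replace with values from your own environment.

# Rough statistics-table size estimate; all inputs are placeholders.

def stats_db_estimate_gb(entities, counters_per_entity, samples_per_day,
                         retention_days, bytes_per_row):
    rows = entities * counters_per_entity * samples_per_day * retention_days
    return rows * bytes_per_row / (1024 ** 3)

# Hypothetical example: 500 managed objects, 40 rolled-up counters each,
# 288 samples/day (5-minute level), 30-day retention, 100 bytes per row.
print(f"{stats_db_estimate_gb(500, 40, 288, 30, 100):.1f} GB")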

vCenter - Performance
- High CPU utilization and sluggish UI performance when the number of attached clients is high: VC needs to keep clients consistent with inventory changes
- Aggressive alarm settings
- DB administration: periodic maintenance, recovery and log settings
- Use an appropriate VC statistics level
- Use gigabit NICs for the Service Console to clone VMs
- Assign permissions appropriately
- SQL Server Express will only run well up to 5 hosts and/or 50 VMs; past that, VC needs to run on an Enterprise-class DB
147

vCenter - High Availability (HA)
- HA network configuration: check DNS, NTP, lowercase hostnames, HA advanced settings
- Redundancy: server hardware, shared storage, network, management
- Test network isolation at the core-switch level, and host failure, for expected outage behavior
- Critical VMs should NOT be grouped together: categorize VM criticality, then set failover appropriately
- Valid VM network label names are required for proper failover
- Failover capacity/admission control may be too conservative when host and VM sizes vary widely (slot size calculation in VC; a simplified sketch follows below)
148
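A deliberately simplified sketch of why admission control becomes conservative when VM sizes vary: one large reservation inflates the slot size. It ignores memory overhead, per-version default slot values, and advanced settings, so treat it as an illustration rather than the actual HA algorithm.

# Simplified slot-size illustration; numbers and field names are invented.

def slot_size(vms):
    # The slot is sized by the largest CPU and the largest memory reservation.
    cpu = max(vm["cpu_reservation_mhz"] for vm in vms)
    mem = max(vm["mem_reservation_mb"] for vm in vms)
    return cpu, mem

def slots_per_host(host_cpu_mhz, host_mem_mb, slot):
    return min(host_cpu_mhz // slot[0], host_mem_mb // slot[1])

vms = 20 * [{"cpu_reservation_mhz": 500, "mem_reservation_mb": 1024}]
vms.append({"cpu_reservation_mhz": 4000, "mem_reservation_mb": 16384})  # one big VM

slot = slot_size(vms)
print("slot:", slot, "slots/host:", slots_per_host(20000, 65536, slot))
# The single large reservation inflates the slot, so each host "holds" far
# fewer slots than the mostly small VMs would really need.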

vCenter - DRS (Distributed Resource Scheduler)
- A higher number of hosts => more DRS balancing options. Recommended: up to 32 hosts/cluster; this may vary with the VC server configuration and the VM/host ratio
- Network configuration on all hosts (VMotion network): security policies, VMotion NIC enabled, Gigabit
- Reservations, limits, and shares:
  - Shares take effect during resource contention
  - Low limits can lead to wasted resources
  - High VM reservations may limit DRS balancing
  - Account for overhead memory
  - Use resource pools for better manageability; do not nest them too deep
- Virtual CPU and memory size: high memory size and many virtual CPUs => fewer migration opportunities. Configure VMs based on need (network, etc.)
149

vCenter - DRS (Cont.)
- Ensure hosts are CPU compatible:
  - Intel vs. AMD; similar CPU family/features
  - Consistent server BIOS levels and NX-bit exposure
  - Enhanced VMotion Compatibility (EVC); see the "VMware VMotion and CPU Compatibility" whitepaper
  - CPU incompatibility => limited DRS VM migration options
- Larger host CPU and memory size is preferred for VM placement (all else being equal); differences in cache or memory architecture => inconsistency in performance
- Aggressiveness threshold: the moderate threshold (default) works well for most cases; aggressive thresholds are recommended for homogeneous clusters where VM demand is relatively constant and there are few affinity/anti-affinity rules
- Use affinity/anti-affinity rules only when needed. Affinity rules: closely interacting VMs. Anti-affinity rules: I/O-intensive workloads, availability
- Automatic DRS mode is recommended (cluster-wide); use manual/partially automatic mode for location-critical VMs (per VM). The per-VM setting overrides the cluster-wide setting
150

vCenter - Resource Pool "Tug of War" Design
This design is simple and does not limit any VM's access to physical resources. Using the ESX shares mechanism, if two or more VMs compete for the same physical resources, the resulting tug of war is decided by the resource pool memberships of the VMs. The ESX cluster will have three resource pools defined. A High resource pool will have no initial reservation and unlimited/expandable RAM and CPU settings; CPU and memory shares will be set to high, and this pool will be devoted to mission-critical VMs. A second, Normal resource pool will likewise have no initial reservation and unlimited/expandable RAM and CPU settings. A sketch of how shares resolve the tug of war follows below.
151
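A minimal Python sketch of how shares resolve the tug of war under contention: each pool receives resources in proportion to its shares. The pool names, share values, and capacity are illustrative assumptions, and real DRS allocation also honors reservations and limits.

# Minimal sketch: proportional-share allocation under contention.

def divide_by_shares(total, shares):
    total_shares = sum(shares.values())
    return {name: total * s / total_shares for name, s in shares.items()}

# Illustrative share values in a High:Normal:Low = 4:2:1 ratio.
pools = {"High": 8000, "Normal": 4000, "Low": 2000}
print(divide_by_shares(total=28000, shares=pools))
# {'High': 16000.0, 'Normal': 8000.0, 'Low': 4000.0}  (MHz, for example)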

vCenter - Resource Pool "Pizza" Design
This design takes the sum total of all physical resources and slices it up across the resource pools. Although the following design uses only two resource pools, many more slices could be created. The most basic pizza design would reserve all memory and CPU, but the following example also helps illustrate reservations and limits. The ESX cluster will have two resource pools defined. A Critical Services resource pool will have an initial reservation of 32 GB RAM and 8 GHz CPU, and unlimited/expandable RAM and CPU settings; this pool will be devoted to mission-critical VMs. Shares for RAM will be set to high, but shares for CPU will be set to normal. A sketch of the slicing arithmetic follows below.
152
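A minimal sketch of the slicing arithmetic: reservations carve fixed pieces out of cluster capacity, and they cannot exceed the pie. Only the Critical Services numbers come from the slide; the cluster totals and the second pool are hypothetical.

# Minimal sketch: reservations as slices of cluster capacity.

cluster = {"cpu_ghz": 64.0, "mem_gb": 256.0}               # hypothetical cluster
pools = {
    "Critical Services": {"cpu_ghz": 8.0, "mem_gb": 32.0},  # from the slide
    "Everything Else":   {"cpu_ghz": 16.0, "mem_gb": 96.0}, # hypothetical second slice
}

reserved_cpu = sum(p["cpu_ghz"] for p in pools.values())
reserved_mem = sum(p["mem_gb"] for p in pools.values())
assert reserved_cpu <= cluster["cpu_ghz"] and reserved_mem <= cluster["mem_gb"], \
    "Reservations cannot exceed the pie (cluster capacity)"
print("Unreserved:", cluster["cpu_ghz"] - reserved_cpu, "GHz,",
      cluster["mem_gb"] - reserved_mem, "GB")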

vCenter - FT (Fault Tolerance)
- FT provides complete VM redundancy; by definition, FT doubles resource requirements
- Turning on FT disables performance-enhancing features such as hardware MMU virtualization
- Each time FT is enabled, it causes a live migration
- Use a dedicated NIC for FT traffic
- Place primaries on different hosts: asynchronous traffic patterns, host-failure considerations
- Run FT on machines with similar characteristics
153

vCenter - HW Considerations and Settings
- When purchasing new servers, target processors with MMU virtualization (EPT/RVI), or at least CPU virtualization (VT-x/AMD-V), depending on your application workloads
- If your application workload creates/destroys a lot of processes or allocates a lot of memory, hardware MMU virtualization will help performance
- Purchase uniform, high-speed, quality memory, and populate memory banks evenly in powers of 2
- Choosing a system for better I/O performance: MSI-X is needed, which allows multiple queues across multiple processors to process I/O in parallel
- The PCI slot configuration on the motherboard should support PCIe 2.0 if you intend to use 10 Gb cards; otherwise you will not utilize the full bandwidth
154

vCenter - HW Considerations and Settings (cont.)
BIOS settings - make sure that what you paid for is enabled in the BIOS:
- Enable Turbo Mode if your processors support it
- Verify that hyper-threading is enabled; more logical CPUs allow more options for the VMkernel scheduler
- On NUMA systems, verify that NUMA is enabled (node interleaving disabled)
- Disable power management if you want to maximize performance, unless you are using DPM; decide whether performance outweighs power savings
- C1E halt state: this causes parts of the processor to shut down for short periods to save energy and reduce thermal load
- Verify that VT/NPT/EPT are enabled, as older Barcelona systems do not enable these by default
- Disable any unused USB or serial ports
155

Reference Guide Links
- VMware vCenter Server Performance and Best Practices for vSphere 4.1: http://www.vmware.com/resources/techresources/10145
- Performance Best Practices for VMware vSphere 4.0: http://www.vmware.com/pdf/perf_best_practices_vsphere4.0.pdf
- SAN System Design and Deployment Guide: http://www.vmware.com/files/pdf/techpaper/san_design_and_deployment_guide.pdf
- VMware vSphere: The CPU Scheduler in VMware ESX 4.1: http://www.vmware.com/files/pdf/techpaper/vmw_vsphere41_cpu_schedule_esx.pdf
156

Reference Guide Links (Continued)
- Understanding Memory Resource Management in VMware ESX 4.1: http://www.vmware.com/files/pdf/techpaper/vsp_41_perf_memory_mgmt.pdf
- Managing Performance Variance of Applications Using Storage I/O Control: http://www.vmware.com/files/pdf/techpaper/vsp_41_perf_sioc.pdf
- What's New in VMware vSphere 4.1 Networking: http://www.vmware.com/files/pdf/techpaper/vmw-whats-new-vsphere41-networking.pdf
- VMware Network I/O Control: Architecture, Performance and Best Practices, VMware vSphere 4.1: http://www.vmware.com/files/pdf/techpaper/vmw_netioc_bestpractices.pdf
- Designing Resource Pools: http://vmetc.com/2008/03/04/designing-esx-resource-pools/
157

Questions