Capacity Planning for 1000 virtual servers (What happens when the honeymoon is over?) (SHARE SESSION 10334)




Barton Robinson, Velocity Software, Inc. Barton@VelocitySoftware.com

Other products and company names mentioned herein may be trademarks of their respective owners.

Capacity Planning for 1000 virtual servers

Objectives:
- Capacity overview
- Profiling possibilities
- Real examples: what successful installations are doing, and how installations save boatloads of money

Capacity planning for:
- Consolidation
- Workload growth
- LPAR configurations
- Storage rules of thumb (ROTs) for WAS, Oracle, SAP

Capacity Planning Processor Overview

Processor requirements:
- CECs, IFLs, LPARs
- LPAR processor overhead: LPAR vCPU ratio to real IFL (roughly 1/2% physical overhead for each)

Considerations:
- Software is paid per IFL, so 95% IFL utilization is the lowest cost
- One installation replaced 30 Oracle servers with one IFL
- One installation gets hardware and system software for free

Capacity Planning Processor Considerations

Term: processor overcommit, the number of virtual CPUs per IFL
- Software is licensed per CPU

Critical concept: z/VM on the z9, z10, and z196 has a VERY LOW MP (multiprocessor) effect
- Two IFLs have MORE capacity than two CECs with one IFL each
- One IFL runs 40-50% busy, 2 IFLs run 50-80%, 20 IFLs run 95%
- 95% IFL utilization is the lowest total cost of ownership (TCO)
- Two IFLs at 30% cost $100,000 more than ONE IFL at 60%
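The "two IFLs at 30%" claim is license arithmetic: software is licensed per IFL regardless of how busy each IFL is. A minimal sketch (the $100,000 per-IFL price is an assumed illustrative figure, not from the slides):

```python
# Software is licensed per IFL, so the same workload split across more,
# less-busy IFLs costs more. PER_IFL_LICENSE is an assumed illustrative
# price, not a figure from this presentation.
PER_IFL_LICENSE = 100_000

def license_cost(n_ifls: int, per_ifl: int = PER_IFL_LICENSE) -> int:
    """License cost scales with IFL count, not with utilization."""
    return n_ifls * per_ifl

# The same 0.6 IFLs of work on two layouts:
extra = license_cost(2) - license_cost(1)  # two IFLs at 30% vs one at 60%
print(extra)  # -> 100000
```

This is why the slides push toward fewer, highly utilized IFLs as the lowest-TCO configuration.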

Configuration Topics - Processors

Processor CEC (z196, z10, z9), configured as multiple LPARs:
- IFLs (1-96), shared or dedicated
- General purpose processors (1-96), shared or dedicated

Example LPAR layout:
- z/VM + Linux: IFLs, shared
- z/VM + Linux: IFLs, shared
- z/OS (VSE): GPs, shared

Capacity Planning Storage Overview

Storage requirements:
- Target overcommit level
- Storage maximums (256 GB per LPAR as of z/VM 6.2)
- Expanded storage (20% of real)

Storage considerations (to keep IFLs at 95% busy):
- How much storage is required?
- What storage tuning should be performed?

Configuration Topics - Real Storage

Configured as multiple LPARs, storage dedicated (256 GB per LPAR):
- Terabytes of central storage on the machine
- Expanded storage as a paging buffer (20% of real)
- Overcommit ratio within an LPAR? (1:1, 2:1, 4:1?)

Example LPAR layout:
- z/VM + Linux: IFLs, shared
- z/VM + Linux: IFLs, shared
- z/OS: GPs, shared
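The 20%-of-real rule for expanded storage is easy to mechanize. A minimal sketch of that rule of thumb (the function name and the 256 GB example are illustrative):

```python
# Sketch of the slide's rule of thumb: configure expanded storage as a
# paging buffer at roughly 20% of the LPAR's central (real) storage.
def expanded_storage_gb(central_gb: float, fraction: float = 0.20) -> float:
    """Expanded storage sized as a fraction of central storage."""
    return central_gb * fraction

# For an LPAR at the 256 GB z/VM 6.2 maximum:
print(expanded_storage_gb(256))  # -> 51.2
```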

Server Consolidation Planning

Replacements? 1 to 1, 2 to 1, 1 to 2? 1 to 10?

Processor sizing:
- Gigahertz is gigahertz
- Barton's number: 1 MIPS is 4-5 megahertz
- z196: 5.0-5.2 GHz

Server storage sizing:
- Smaller is better: tuning is easier, managing is easier
- Cost of extra servers is low if they are cloned small
- Linux internal overhead
- Linux vCPU ratio to real IFL (20:1?)
- 5-10% overhead reduction going from 2 vCPUs to 1
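The "gigahertz is gigahertz" sizing used here can be sketched as: tally the GHz actually consumed on the source servers (e.g. from sar data) and divide by the IFL clock, targeting the 95% utilization the talk recommends. The example workload numbers below are made up:

```python
import math

# GHz-based consolidation sizing, per the slide's process: inventory,
# tally gigahertz consumed, then spec IFLs from GHz used.
IFL_GHZ = 5.2        # z196 clock speed, per the slide
TARGET_UTIL = 0.95   # run IFLs near 95% busy for lowest cost

def ifls_needed(ghz_consumed: float) -> int:
    """IFLs required for a measured total of consumed gigahertz."""
    return math.ceil(ghz_consumed / (IFL_GHZ * TARGET_UTIL))

# 100 source servers, each averaging 0.4 GHz of CPU actually consumed:
print(ifls_needed(100 * 0.4))  # -> 9
```

The key point of the measurement step is that distributed servers are typically mostly idle, so consumed GHz, not installed GHz, drives the IFL count.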

Performance Management Planning

Common in large, successful installations: "If I can't manage it, it is not going to happen."

Management infrastructure in place (zVPS, the Velocity Software Performance Suite). Infrastructure requirements:
- Performance management
- Capacity planning requirements
- Analysis by server, by application, by user
- Operations, alerts
- Chargeback, accounting

Performance Management Planning

Management resource consumption is a serious planning issue and an obstacle to scalability.

Costs for 1,000 servers:
- A 2% agent requires 20 IFLs just for management
- A 0.03% agent requires 30% of one IFL
- (Cost of 20 IFLs: $2M?)

Ask the right questions!
- Is the data correct? What is the capture ratio?
- What is the cost of the infrastructure?
- References..
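The agent-overhead figures above are simple multiplication across the farm. A sketch of that arithmetic:

```python
# Per-server agent CPU cost, multiplied across the server farm and
# expressed in IFLs (1 IFL = 100% of one processor).
def management_ifls(agent_pct_per_server: float, n_servers: int) -> float:
    """Processor capacity consumed by monitoring agents alone."""
    return agent_pct_per_server * n_servers / 100.0

print(management_ifls(2.0, 1000))   # heavy agent -> 20.0 IFLs
print(management_ifls(0.03, 1000))  # light agent -> 0.3 IFLs
```

At 1,000 servers, the difference between a 2% agent and a 0.03% agent is roughly 20 IFLs of pure management overhead, which is the cost argument the slide is making.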

Performance Management Planning

Report: ESALNXP   LINUX HOST Process Statistics Report
Monitor initialized: 21/01/11 at 07:03:00
---------------------------------------------------------
node/     <-Process Ident->  Nice <-----CPU Percents----->
Name         ID  PPID   GRP  Valu  Tot  sys user syst usrt
--------- ----- ----- -----  ---- ---- ---- ---- ---- ----
snmpd      2706     1  2705   -10 0.07 0.02 0.05    0    0
snmpd     24382     1 24381   -10 0.04 0.02 0.02    0    0
snmpd      2350     1  2349   -10 0.04 0.02 0.02    0    0
snmpd     28384     1 28383   -10 0.14 0.10 0.04    0    0
snmpd     28794     1 28793   -10 0.09 0.09    0    0    0
snmpd     31552     1 31551   -10 0.07 0.03 0.03    0    0
snmpd     11606     1 11605   -10 0.04 0.02 0.02    0    0
snmpd      2996     1  2995   -10 0.08 0.03 0.05    0    0
snmpd     31589     1 31588   -10 0.05 0.03 0.02    0    0
snmpd     15356     1 15355   -10 0.16    0 0.16    0    0
snmpd     15413     1 15412   -10 0.10 0.08 0.02    0    0
snmpd     30795     1 30794   -10 0.05    0 0.05    0    0
snmpd      1339     1  1338   -10 0.05 0.04 0.02    0    0
snmpd     30724     1 30723   -10 0.02 0.02    0    0    0
snmpd     28885     1 28884   -10 0.06 0.02 0.04    0    0
snmpd      2726     1  2725   -10 0.13 0.08 0.05    0    0
snmpd     14632     1 14631   -10 0.02 0.02    0    0    0

SNMP is on every server; it consumes less than 0.1% CPU. Note: NO spawned processes.

Agent Overhead (z10 EC)

Report: ESALNXP   LINUX HOST Process Statistics Report
Monitor initialized: 04/15/11 at 10:00:00 on 2097 serial
---------------------------------------------------------
node/     <-Process Ident->  Nice <-----CPU Percents----->
Name         ID  PPID   GRP  Valu  Tot  sys user syst usrt
--------- ----- ----- -----  ---- ---- ---- ---- ---- ----
agent      8853     1  4390     0 2.24 0.01 0.02 1.38 0.83
agent      9878     1  4657     0 1.98 0.01 0.02 1.15 0.80
agent      6451     1  4392     0 5.68 0.03 5.59 0.03 0.02
agent      9644     1  4392     0 2.14 0.01 0.01 1.34 0.78
agent      7488     1  4379     0 1.42 0.01 0.01 0.84 0.56
agent      9634     1  4362     0 1.92 0.01 0.01 1.14 0.75
agent      5524     1  4414     0 5.22 0.04 5.14 0.03 0.02
agent      7613     1  4525     0 1.44 0.01 0.02 0.88 0.53
agent      7506     1  4388     0 1.41 0.01 0.02 0.83 0.55
agent      6673     1  3725     0 1.41 0.01 0.02 0.83 0.55
agent      6610     1  3680     0 1.44 0.01 0.02 0.89 0.52
agent      6629     1  3680     0 1.51 0.01 0.01 0.90 0.59
agent      6624     1  3677     0 1.39 0.01 0.02 0.82 0.54
snmpd      1042     1  1041   -10 0.03 0.02 0.02    0    0
snmpd       977     1   976    15 0.04 0.02 0.02    0    0

Note: the agent itself (the sys and user columns) uses little CPU, about the same as snmpd, but its spawned processes (the syst and usrt columns) are excessive. You need the full picture.

Capacity Planning for 1000 virtual servers

Company A: consolidation project, 10,000 distributed servers
- 4 CECs, 120 IFLs
- 2Q2011: 1,200 virtual servers (adding 200 per month)
- 1Q2012: 1,800 virtual servers (adding 200 per month)

Company B: consolidation and new workload
- 12 CECs, 60 LPARs, 183 IFLs
- 800 servers

Company C: WebSphere
- 4 CECs (+2), 16 (+4) LPARs, 60 IFLs
- 675 servers (+75)

Company M: Oracle
- 1 CEC, 7 LPARs, 17 IFLs
- 120 (LARGE) servers

Installation A - Server Consolidation

Consolidation source servers:
- IBM HS21 (8 GB) (2x4-core, 2.5 GHz)
- IBM x3550 (4 GB) (2x4-core, 2.5 GHz)
- IBM x3655 (32 GB, VM) (2x4-core, 2.5 GHz)
- Sun M4000 (64 GB) (4x4-core, 2.4 GHz)
- Sun T5140 (32 GB) (2x8-core, 1.2 GHz)
- Many others

Capacity planning process for consolidation:
- Inventory server counts (10,000+)
- Tally gigahertz used (using native sar), by server, by application
- Spec processors based on GHz used
- Spec storage on a conservative basis

Installation A - Highlights

Processors (IFLs):
- 1 z196 (R&D)
- 4 z196 (was z10), 58 IFLs production

Architecture: two data centers, high availability

Server counts (1Q11): 1800 servers (+600)

Installation A - LPAR Sizing

Processors (1Q2011):
- z196 lab: 18 IFLs, 2 LPARs, 4:1 storage overcommit
- z196 (4) production: 2 z/VM LPARs each (production, staging)
- 20-30 IFLs per CEC (some number of GPs as well)
- Disaster recovery available by shutting staging down

LPAR sizes for production:
- 14-24 IFLs each (shared)
- 256 GB central storage per LPAR
- 24-72 GB expanded (-> 128 GB)

Installation A - Initial Project

Linux project started April 2009: 38 servers, 3 IFLs
- Small traditional VM system prior; skills were available
- Hired one more
- Current staff, including the manager: 5

1800 servers now operational (March 2012)

Workloads: WebSphere
- Users get 50 guests at a time, 25 on each data center

Installation A - Growth

Growth:
- Adding 200 servers per month for existing workload
- 3000 servers by 11/2012?

Last year's next application: new Oracle workload, replacing 400 cores (Sun)
- 4 TB database (12 TB per cluster)
- Sized at 32 IFLs (12:1) (gigahertz sizing)
- 1 TB real storage

This year: the next 5 petabytes

Project: ground-up resizing, JVMs per server, heap sizes

Installation B - z Overview

Highlights of the z/VM LPARs:
- 12 z10 / z196 (ramping up; 24 CECs currently)
- 183 IFLs, 288 logical processors (~1.5:1)
- 3800 GB central storage, 250 GB expanded storage
- Five data centers
- 800 servers (WebSphere, Oracle); many servers in the 30-40 GB range
- 200 servers per FTE is the working number

Production LPARs:
- 10-32 IFLs each
- 150-250 GB central storage
- 20-100 servers per LPAR

Installation B - z Overview (Big CPU Picture)

Report: ESALPARS   Logical Partition Summary
Monitor initialized: 11/06/10 at 16:07:10 on 2097 serial 374E
-------------------------------------------------------------------
<--Complex-->          <-------Logical Partition--->
     Phys Dispatch          Virt <%Assigned>
Time CPUs Slice        Name Nbr  CPUs Total Ovhd Weight Type
-------- ---- -------- ---- ---  ---- ----- ---- ------ ----
16:09:00  37  Dynamic  Totals:  0  50  3146 25.0   3000
                       L43   19    6 574.6  0.6    148  IFL  <- 95% busy
                       C41   10    1 100.0  0.0    Ded  ICF
                       C42   11    1  96.1  0.1    850  ICF
                       C43   14    1  99.7  0.0    Ded  ICF
                       C44   15    1   0.8  0.1    150  ICF
                       P41    1    7 422.1  3.2    717  CP
                       P44    9    2  43.4  0.2     70  CP
                       T41    4    5 197.5  0.5    193  CP
                       T44    7    2   9.8    0     20  CP
                       L41   17   22  1557 19.6    777  IFL  <- 71% busy
                       L42   18    2  44.7  0.8     75  IFL

Totals by processor type:
      <---------CPU-------> <-Shared Processor busy->
Type  Count Ded Shared Total Assigned  Ovhd Mgmt
----  ----- --- ------ ----- -------- ----- ----
CP        6   0      6 584.7    573.3   3.6  7.8
IFL      27   0     27  2220   2176.3  21.0 22.9  <- 80% of IFLs busy
ICF       3   0      3 297.8    296.5   0.1  1.1
ZIIP      1   0      1  99.9     99.5   0.3  0.1

Installation B - z Overview

[Chart: CPU consumption by LPAR (l71, l73) over one day]
- CEC 01 for one day, 38 IFLs
- Storage overcommit: none
- Processor overcommit: 5:1

Installation B - z Overview

[Chart: CPU consumption by LPAR (l11, l12, l13) over one day]
- CEC 13 for one day, 38 IFLs
- 30 IFLs consumed is 80% busy
- Storage overcommit: none
- Processor overcommit: 5:1

Installation B - z Overview

[Chart: CPU consumption by LPAR (l11, l12, l13, l71, l73) over one day]
- Both CECs for one day, 76 IFLs
- Room for growth or consolidation
- Balancing workload across CECs?

Installation C - z Overview

Highlights (POC circa 2005):
- 4 z196 (+1), 2 production, in-house DR
- 60 IFLs, 16 LPARs (+4 in 6 months)
- Two data centers, high availability
- 675 servers (WebSphere)
- Serious chargeback requirements

Production LPARs:
- 4 production LPARs, 400 GB central / 90 GB expanded storage
- Overcommit: 560 GB / 490 GB ≈ 1.14

Test/Dev LPARs

Installation C - Overcommit

IFLs: 55 (-5) (went from z10 to z196)
- 675 servers (WebSphere)
- 12 servers per IFL (was 10)
- 1030 virtual CPUs (25:1)

Storage:
- 970 GB (+100) central
- 184 GB expanded
- Virtual storage: 1600 GB (+300)
- Overcommit (overall): 1.3 to 1

Installation M - z Overview

3-year project to date (2011); POC summer 2008
- Two VM/Linux systems programmers

Processors:
- 1 z10 EC, 17 IFLs
- 7 LPARs, 17 virtual CPUs each
- 560 GB real storage / 92 GB expanded
- DR site available

Storage: FCP 30 TB; systems on ECKD

Installation M - z Overview

Linux servers:
- 120 servers (big, Oracle)
- 7 servers per IFL
- 395 vCPUs (23:1 overcommit)
- 4-40 GB each (half the size of the original Sun servers)
- 974 GB server storage (1.5:1 overall overcommit)
- 8 GB per server???
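Installation M's ratios follow directly from the counts on this slide. A sketch checking that arithmetic (the `overcommit` helper is just the generic virtual-over-real definition):

```python
# Overcommit ratio: virtual resource defined, divided by real resource
# backing it. Numbers below are Installation M's, from the slide.
def overcommit(virtual: float, real: float) -> float:
    """Generic overcommit ratio: virtual over real."""
    return virtual / real

vcpu_ratio = overcommit(395, 17)           # 395 vCPUs on 17 IFLs
storage_ratio = overcommit(974, 560 + 92)  # 974 GB virtual on 652 GB real
print(round(vcpu_ratio), round(storage_ratio, 1))  # -> 23 1.5
```

Both results match the 23:1 vCPU and 1.5:1 storage figures quoted above.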

Installation M - LPAR Overview

Zones separated by LPAR:
- Development
- Validation (quality assurance)
- Production (gets the resources when needed)

Workload zones (3-tier, by LPAR):
- Presentation
- Data (Oracle)
- Application (WAS)

All heavy hitters (data, application) have moved or are moving to z

Installation M - Production LPAR Overview

LPAR A - Development Oracle:
- 110 GB central / 22 GB expanded
- 30 servers, 100 vCPUs
- 30 page packs (3390-6)

LPAR 1 - Application (WAS):
- 180 GB central / 40 GB expanded
- 20 servers, 80 vCPUs
- 60 page packs (3390-9)

LPAR 4 - Data (Oracle):
- 130 GB central / 24 GB expanded

Installation M - LPAR Sizing

[Chart: CPU consumption over one day]
- 17 IFLs, 7 LPARs, 17 vCPUs each, 7:1 overcommit
- Overhead is significant from real processor overcommit
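The 7:1 figure on this chart is LPAR-level processor overcommit: the logical CPUs defined across all LPARs, divided by the real IFLs backing them. A quick sketch:

```python
# LPAR-level processor overcommit behind this chart: 7 LPARs, each
# defined with 17 virtual CPUs, all sharing the same 17 real IFLs.
lpars = 7
vcpus_per_lpar = 17
real_ifls = 17

logical_cpus = lpars * vcpus_per_lpar  # 119 logical CPUs defined
ratio = logical_cpus / real_ifls       # the 7:1 on the slide
print(f"{logical_cpus} logical CPUs on {real_ifls} IFLs = {ratio:.0f}:1")
```

Defining every LPAR with all 17 logical processors is what drives the overhead the slide calls out: PR/SM must time-slice 119 logical CPUs onto 17 real ones.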

Installation M - z Growth

Processors, over 3 years:
- z9 with 11 IFLs, moved to z10 with 17 IFLs
- Moving to z196 with 25 IFLs (doubling capacity)
- Developers see pretty good performance: "Can we move too?"
- There are always issues on the other side

Workload growth:
- Adding 110 Oracle databases
- Replacing 32 Solaris servers (120 cores)
- The "server from Hell" had 30 databases on it

Installation M - z Growth, 2011 status:

"We have added a total of 154 z/Linux guests. We have turned a lot of these into Enterprise guests, meaning in some cases we have multiple JVMs on a guest as well as multiple Oracle databases on a single guest. The majority of the guests are Oracle database guests, ranging from 500 MB to 15 TB in size for a single database. We have also brought over multiple WAS servers. Other than using a lot of memory and DASD storage, things seem to be running well."

Velocity Software Performance Management

Instrumentation requirements:
- Performance analysis
- Operational alerts
- Capacity planning
- Accounting/chargeback
- Correct data, capture ratios

Instrumentation can NOT be the performance problem

A Scalable z/VM Performance Monitor

Traditional model (1989):
- ZMON: real-time analysis; uses the standard CP monitor for real-time displays
- ZMAP: performance reporting; post-processing that creates the long-term PDB, taking the PDB or monwrite data as input
- PDB (Performance DataBase): complete data, by minute, hour, and day, with monthly/yearly archives

Data flow: VM CP Monitor -> ZMON (real-time displays) -> PDB -> ZMAP (reports)

Linux and Network Data Acquisition

ESATCP: the network monitor
- SNMP data collection; data is added to the PDB
- Availability checking

Collects data from:
- Linux (net-snmp, SNMP/host MIBs), including blades
- Windows NT, Sun/HP/AIX (native SNMP, MIB II; zbx)
- Printers, routers
- VM CP monitor, via ZTCP (VSE!!!) (VMware)

Data flow: SNMP sources -> ZTCP -> PDB -> ZMON (real-time displays) and ZMAP (reports)

Add Enterprise Support for Capacity Planning Tools

The PDB feeds enterprise capacity planning tools:
- ZMAP output to CA MICS, MXG, BMC MainView, TDS, TUAM
- ZMON SNMP alerts to OpenView, Wily

Data sources remain the same: TCP/IP (SNMP/MIB II) and Linux SNMP/host MIBs via ZTCP, plus the VM CP monitor

Copyright 2010 Velocity Software, Inc. All Rights Reserved.

What we're doing for capacity planning:
- CPU by LPAR, by processor type
- CPU by user class

See what we're doing for capacity planning: VelocitySoftware.com. See the demo.

Processor Ratios: Capacity Planning Metrics

Processor ratios:
- LPAR logical processors per real processor (LPAR overhead)
- Linux virtual processors per real processor (Linux overhead)

Storage ratios:
- Storage per processor
- Expanded storage per real storage
- Overcommit ratios

Servers per processor:
- How many distributed servers were replaced per IFL?

Capacity Planning Summary

1000 servers has been done; management is required. Common issues:
- "Driving too fast to stop for gas"
- "Saving too much to figure out where we're at"
- "Did a capacity plan, but don't have time to review its accuracy" (2 years later)

Processors:
- Gigahertz are gigahertz
- Processors that are highly utilized and shared save money

Storage:
- No good guidelines
- Oracle and SAP servers are usually larger than WAS
- Expanded storage should follow the Velocity best practices