Linux on z/VM Memory Management




Linux on z/VM Memory Management
Rob van der Heij, rvdheij @ velocitysoftware.com
IBM System z Technical Conference, Brussels, 2009, Session LX45
Velocity Software, Inc, http://www.velocitysoftware.com/
Copyright 2009 Velocity Software, Inc. All Rights Reserved. Other products and company names mentioned herein may be trademarks of their respective owners.

Introduction
Virtualization is said to be easy and automatic. And yet: "Our Linux application people got out-of-memory errors and claim they do not have enough memory. But z/VM was not even paging a little bit. Can you explain that again?"

Introduction
Linux on z/VM experts know there are challenges:
- It is not always easy or automatic
- Not intuitive: some knobs seem to work in reverse
- Expert guidance must be understood within its context
- This is confusing for new installations; a consistent overview of where we are is needed
Context of this presentation:
- Built on real customer data and experience, running Linux on z/VM for almost 10 years
- Applies to Linux on z/VM; less interested in LPAR
- Little interest in benchmarks or artificial lab workloads

Agenda
- CMS versus Linux
- z/VM Paging versus Linux Swap
- Two Layers of Memory Management
- Linux Page Cache
- Virtual Server Sizing
- Memory Tuning Options
- Enterprise Applications
Performance data shown in this presentation was collected and processed with ESALPS.

Why Can't I Use My Linux Tools?
Linux data is incomplete and incorrect:
- Virtualization changes the rules of the game
- CPU usage perceived by Linux can be very wrong
- Assumptions about "used" and "available" memory no longer hold
- z/VM performance impacts Linux behavior
- Need to combine Linux and z/VM performance data
z/VM does not clone system administrators:
- You may not have time to look when it happens
- Complex interactions make problems hard to reproduce
- A multi-tier application involves multiple virtual servers
- Centralized data collection is easier to manage

CMS versus Linux
Traditional z/VM installations are surprised by Linux.
CMS:
- Applications take the resources they need
- CMS was designed to run in a shared resource environment
- No extra resource usage just because it is there
- Virtual machine size does not increase the memory requirement
Linux on z/VM is different:
- Designed to run in a dedicated resource environment
- Uses all resources it can get; they would be wasted otherwise
- The same workload in a larger virtual machine uses more memory
- More expensive to run, and not always better or faster
- In many cases performance will be worse, which is not intuitive

z/VM Paging
A technique to implement virtual memory:
- Memory over-commit: a limited amount of real memory holds the working set
- The sum of virtual machine memory is more than real memory
- Disk space makes up for the difference
- Not everyone will want all their resources at the same time
- Over-commit is good: it is the only way to enforce sharing
Hardware support makes paging transparent:
- The virtual machine does not notice a functional difference
- Paging does cause latency and slows things down
z/VM paging challenges:
- Which pages can be paged out to create available memory
- Which additional pages should also be paged in
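As a back-of-the-envelope sketch (the memory sizes and guest count below are made up for illustration, not from the presentation), the over-commit ratio is simply the sum of the virtual machine sizes divided by the real memory available for guests:

```shell
# Hypothetical configuration: ten 1 GB Linux guests in an 8 GB z/VM system
real_mb=8192
guest_mb=1024
guests=10
total_mb=$((guests * guest_mb))
awk -v t="$total_mb" -v r="$real_mb" \
    'BEGIN { printf "over-commit ratio: %.2f\n", t / r }'
# prints: over-commit ratio: 1.25
```

A ratio above 1 means z/VM must page at some point; how far above 1 is sustainable depends on how much of each guest's memory is actually referenced at the same time.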

General z/VM Paging Strategy
Take memory resources away from idle users:
- User resources are reviewed during demand scans
- A new transaction may have different requirements
- Users keep resources until the transaction completes
- After no demand for CPU resources for some time, the virtual machine drops from queue
z/VM paging is less effective with Linux virtual machines:
- Many Linux servers are never idle and remain in queue
- Easy demand scans don't produce enough pages; most free pages are produced in emergency scan
- Resources are also taken from active virtual machines
- Expanded storage helps to reduce the damage (see session LX44, Wed 15:15)

z/VM Paging and Linux Swap
The Linux term "swap" is misleading:
- No modern operating system uses swapping anymore
- Both Linux and z/VM use demand paging techniques
The z/VM design accommodates some paging:
- Many systems run fine under a moderate paging load
- Some systems handle fairly high paging rates
- Expanded storage introduces a paging hierarchy
- Virtual machine sizing prevents excessive usage
Linux on dedicated hardware is not designed to swap:
- Disk I/O for swap in Linux impacts performance
- Linux systems are typically not sized to swap to disk
- Swap serves as a rescue for a process out of control

Two Layers of Memory Management
Linux on z/VM has two layers of memory management. Both need to make the same kind of decisions:
- Which pages to retain and which to select for page-out
- Which additional pages to page in
They do so mostly independently of each other, and two managers are not better than one:
- Linux "real" memory is in fact virtual memory on z/VM
- Both use a Least Recently Used (LRU) strategy
- Local optimizations are not productive at the global level
- Two layers of LRU interfere: the worst-case scenario

Two Layers of Memory Management
Two layers of management cause inefficiency.
z/VM retains data that Linux does not need now:
- Data from processes that are idle
- Memory that is unused after a program terminates
- Addressed by Cooperative Memory Management (CMM1)
Linux uses memory unaware of the real cost:
- Excess memory is used to cache data and avoid I/O
- This competes with busy servers that need the memory
- Size the virtual machine to avoid excessive cache

Two Layers of Memory Management
z/VM view of virtual machine resources:
- Grows from 133 MB to 378 MB
- Some virtual machine memory was paged out

Screen: ESAUSPG   Velocity Software-Test   VSIVM4   ESAMON 3.770  04/16
User Storage Analysis                      CLASS * USER ROBLX1
                  <---Storage occupancy in pages--->        Pages
UserID            <---Main Storage--->  <--Paging-->        Moved
Time     /Class    Total    >2GB    <2GB   Xstor   DASD  <2GB  VirtDisk
-------- -------- ------  ------  ------  ------  -----  ----  --------
03:17:00 ROBLX1    96891   34766   62125     217      0     0         0
03:16:00 ROBLX1    96891   34766   62125     217      0     0         0
03:15:00 ROBLX1    35204   11474   23730       0      1     0        32
03:14:00 ROBLX1    35204   11474   23730       0      1     0        32
03:13:00 ROBLX1    35204   11474   23730       0      1     0        32
03:12:00 ROBLX1    34625   11474   23151       0      1     0        32
03:11:00 ROBLX1    34181   11474   22707       0      1     0        32
03:10:00 ROBLX1    34181   11474   22707       0      1     0        32

Two Layers of Memory Management
Linux view of memory resources:
- Grows from 75 MB to 80 MB, quite unlike the z/VM metrics
- The overall figure is off by a factor of 2
- Nothing reveals the 250 MB increase

Screen: ESAUCD2   Velocity Software-Test   VSIVM4   ESAMON 3.770  04/16
LINUX UCD Memory Analysis Report           NODE BROBLX1
                  <-------- Storage sizes in megabytes -------->
                  <--Real Storage-->  Over-  <---SWAP Storage--->
Time     Node      Total Avail  Used  head   Total Avail  Used   MIN
-------- -------- ------ ----- ----- -----  ----- ----- -----  -----
03:17:00 broblx1   994.8 915.3  79.5  36.7  125.0 125.0     0   15.6
03:16:00 broblx1   994.8 915.3  79.5  36.7  125.0 125.0     0   15.6
03:15:00 broblx1   994.8 915.5  79.3  36.4  125.0 125.0     0   15.6
03:14:00 broblx1   994.8 915.5  79.3  36.4  125.0 125.0     0   15.6
03:13:00 broblx1   994.8 915.5  79.3  37.0  125.0 125.0     0   15.6
03:12:00 broblx1   994.8 917.7  77.1  34.7  125.0 125.0     0   15.6
03:11:00 broblx1   994.8 919.9  74.8  33.0  125.0 125.0     0   15.6
03:10:00 broblx1   994.8 919.9  74.8  33.0  125.0 125.0     0   15.6

Two Layers of Memory Management
The two memory managers do not exchange information, which results in less than optimal resource usage:
- The bash shell ran a program
- It had already ended in the next sample
- Linux freed the memory; z/VM retains it

[Chart: Linux memory usage, Used (MB) versus Resident (MB), 03:11-03:17]

Screen: ESALNXP   Velocity Software-Test   VSIVM4   ESAMON 3.770  04/16 03:10
LINUX VSI Process Statistics Report        NODE BROBLX1   LIMIT 5 209
                            <-Process Ident->  <-----CPU Percents----->
Time     Node     Name         ID  PPID   GRP  Total  sys user syst usrt
-------- -------- --------- ----- ----- -----  ----- ---- ---- ---- ----
03:19:00 broblx1  vsi-agen   1147     1  1099    0.2  0.2  0.0    0    0
                  *Totals*      0     0     0    0.3  0.2  0.1    0    0
03:18:00 broblx1  vsi-agen   1147     1  1099    0.2  0.2  0.0    0    0
                  *Totals*      0     0     0    0.3  0.2  0.1    0    0
03:17:00 broblx1  vsi-agen   1147     1  1099    0.2  0.2  0.0    0    0
                  *Totals*      0     0     0    0.3  0.2  0.1    0    0
03:16:00 broblx1  bash       1430  1429  1430    3.6    0    0  0.4  3.2
                  vsi-agen   1147     1  1099    0.2  0.2  0.0    0    0
                  *Totals*      0     0     0    4.0  0.2  0.1  0.4  3.3
03:15:00 broblx1  vsi-agen   1147     1  1099    0.2  0.2  0.0    0    0
                  *Totals*      0     0     0    0.3  0.2  0.1    0    0
03:14:00 broblx1  vsi-agen   1147     1  1099    0.2  0.2  0.0    0    0
                  *Totals*      0     0     0    0.3  0.2  0.1    0    0
03:13:00 broblx1  bash       1430  1429  1430    0.3    0  0.2  0.0  0.1
                  sshd       1427  1184  1427    0.1    0  0.0    0  0.1
                  vsi-agen   1147     1  1099    0.2  0.2  0.0    0    0
                  *Totals*      0     0     0    0.7  0.2  0.3  0.0  0.1

Linux Page Cache
A basic concept of Linux memory management:
- Not some allocated area you could remove or resize
- Memory pages with a corresponding disk location
- Program code loaded from disk (demand paging)
- Shared libraries loaded from disk (demand paging)
- Pages swapped out but not yet re-used (swap cache)
Excess memory will be used as page cache:
- But not everything in the page cache is a wasted resource
- Some amount of page cache must be kept available

Linux Page Cache
The Linux 2.4 approach was very simplistic:
- A swap-out would require an I/O operation, while keeping data in cache might save an I/O operation
- Strategy: drop data from cache to avoid swap-out
- Effect: unreferenced old data occupied memory
Linux 2.6 introduced the swappiness parameter:
- Provides a way to swap out unreferenced old data
- Keeps some page cache even if that causes swap-out
- Specified as a percentage of total Linux memory
- Must be adjusted when the virtual machine size is changed

Virtual Server Sizing
Dedicated server footprint:
- Large enough to handle peak requirements
- Standard server size
- Swap defined for Linux over-commit, but no swapping
- Excess memory used as page cache
Virtual server footprint*:
- Sized just large enough for the peak
- Sizing based on application requirements
- Swap defined for Linux over-commit, but no need for swapping
- Little penalty for less page cache
- An easy win, and may be good enough
- The goal is to avoid z/VM paging

[Diagram: dedicated server versus virtual server; in both, excess memory sits on top of the Linux memory requirements, but the virtual server footprint is trimmed]

* The footprint is the total requirement, not just the virtual machine size.

Virtual Server Sizing
Squeeze the virtual machine:
- Smaller than the peak requirement, so it will be swapping during peaks
- Acceptable when using VDISK
- Add as much VDISK as you reduced the virtual machine size: a replacement rather than an extra
- Expect both to remain resident
Swapping reduces requirements:
- Linux will cache less to avoid swap; be aware of swappiness
- The total peak requirement goes down
- Not easy to predict: you must measure
Excess VDISK can be taken out:
- Needed if you want to reduce the footprint of the server
- Excess swap disk behaves LRU

[Diagram: virtual machine plus VDISK swap; the VDISK replaces part of the excess memory above the Linux memory requirements, shrinking the total footprint]
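As a sketch of how such a swap VDISK might be set up (the presentation does not show this; the device address, size, and device names below are illustrative assumptions):

```shell
# z/VM user directory entry: a 256 MB FBA virtual disk at address 0203
#   MDISK 0203 FB-512 V-DISK 524288 MR
#
# In the Linux guest (as root, with s390-tools installed):
#   chccwdev -e 0.0.0203       # bring the virtual DASD online
#   mkswap /dev/dasdb          # initialize it as swap space
#   swapon -p 10 /dev/dasdb    # higher priority than any real-disk swap
#
# Note: VDISK contents do not survive LOGOFF, so the mkswap/swapon
# steps belong in a boot script rather than being done once by hand.
```

Giving the VDISK a higher swap priority than a real-disk swap device means Linux fills the cheap in-memory device first and only falls back to real disk under extreme pressure.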

Virtual Server Sizing
Removing excessive VDISK:
- Reduces the virtual server footprint
- Allows for more servers without z/VM paging
- You still need unused swap space for Linux memory over-commit
- When monitoring and alerting: use VDISK, since it is almost free
- Otherwise use real disk (it slows down the server, so users will call you)
- ESALPS: review Swap Used over time in the ESAUCD2 report

[Diagram: shrinking the VDISK swap reduces the virtual server footprint; the savings come out of excess memory]

Virtual Server Sizing
It is important to remove excessive VDISK:
- Linux prefers to use fresh blocks, which causes fragmentation
- Over time a large portion of the VDISK will have been previously used
- Linux does not care about the contents; there are no further references
- z/VM still preserves the data and pages it out to disk
- This does not slow you down yet, but it uses paging capacity
- Eventually Linux will re-use the old VDISK blocks
- That forces z/VM to page the VDISK back in, which slows down Linux VDISK swapping and reduces available z/VM paging capacity
Experiments suggest the latest kernels may not need this anymore. Watch this space.
ESALPS: compare Swap Used in ESAUCD2 with resident and paged in ESAVDSK.

Virtual Server Sizing
Measuring VDISK fragmentation:
- Linux uses 11.3 MB (2900 pages)
- z/VM retains 125 MB (32,000 pages)

ESAUCD2
                  <-------- Storage sizes in megabytes -------->
                  <--Real Storage-->  Over-  <---SWAP Storage--->
Time     Node      Total Avail  Used  head   Total Avail  Used   MIN
-------- -------- ------ ----- ----- -----  ----- ----- -----  -----
08:16:00 broblx1  1498.8  1010 489.0  71.6  125.0 113.6  11.3   15.6

ESAVDSK
                                             <--Size--->  <--Pages-->  DASD   X-Store
Time     Owner    Address Space Name          Pages  Blks  Resi- Lock-  Page   Blks
                                                           dent  ed     Slots
-------- -------- ------------------------   -----  ----  ----- -----  -----  -----
08:16:00 ROBLX1   VDISK$ROBLX1$$$0203$0009   32000  256K  29585     0      2   2415

Virtual Server Sizing
Adding more servers eventually requires z/VM paging. That is a good thing: it saves memory resources.
- The footprint varies over time
- Virtual machines take turns using real memory: sharing
- z/VM pages out when a server goes idle; CP must be able to see which server is idle
- z/VM pages in at transaction start, which causes some latency for transactions

[Diagram: page-out and page-in move the virtual machine and its VDISK swap between real memory and paging space as the footprint changes]

Virtual Server Sizing
An active server requires its peak footprint to be resident:
- The amount of memory needed to run the workload without paging
- The virtual machine as well as the swap VDISK
- Only visible when z/VM has some memory pressure
An idle server requires its idle footprint to be resident:
- The memory required to run background tasks
- Virtual machine memory still referenced by Linux
- No swapping, so the VDISK can be paged out
- Multiple VDISKs can introduce hierarchy and locality of reference; not every workload shows usage patterns that require this, and recent kernels use a different strategy to allocate blocks
The transition between peak and idle involves z/VM paging.
A balanced system:
- The peak footprint requires taking resources away from idle servers
- The total of current peak and idle footprints must fit at all times

Virtual Server Sizing
Transaction latency depends on z/VM paging capacity:
- The number of paging devices
- The efficiency of block paging
- Minus the capacity used for other paging
Virtual machine sizing determines the amount of pages to be paged in:
- The difference between idle and peak size
- Large amounts may hurt even more: replenishing the available list needs page-out and increases demand scans
The total of the active virtual machines must fit.

[Charts: pages resident over time with slow versus fast paging; faster paging shortens the transition from idle to peak footprint]

Virtual Server Sizing: Conclusion
The total of peak and idle footprints must fit in real memory:
- This will change with new workload: measure and adjust
- The server peak requirement is determined by the application: virtual machine memory plus active swap in VDISK
The server VDISK ratio is the main control: active VDISK divided by the total virtual machine.

             VDISK ratio  Idle footprint  Latency  Peak overhead
Production   Low          Large           Small    Small
Development  High         Small           Large    Large

[Chart: production versus development footprint profiles over time]

Virtual Server Sizing
The suggested approach only works with VDISK swap:
- It makes Linux swap during peak utilization
- Real I/O to disk would be too slow to do this
The performance penalty of VDISK swap is minimal:
- It causes some extra CPU usage during peak utilization
- This should not be a problem for low-utilized servers
It confuses people with a discrete server background:
- It is not really swapping, it is just memory
- This is a Linux on z/VM thing only
- Do not confuse it with the classical Linux swap

Virtual Server Sizing
The classical Linux swap disk:
- Normally not used, not even during peak utilization
- Primarily there to encourage Linux to over-commit memory
- May be used during extreme workload or when things are out of control
Use VDISK for this too, when you have alerts set up:
- When not in use it is almost free
- When it does get used, review your workload sizing
- Set up an alert to detect when it gets used
- Refurbish the VDISK when the problem has been fixed
Using real disk will seriously slow down Linux when it is being used; users and managers will alert you when performance is bad.

Cooperative Memory Management
The first attempt to couple the memory managers: CMM1.
Builds on a technique used by CMS:
- A Diagnose instruction to release a page
- The mapping to a real page frame is removed: the page is not there
- The backing page slot in expanded storage or on DASD is dropped
Supported as a ballooning technique in Linux:
- The balloon acquires real pages from Linux to give back to z/VM
- Effective as a temporary reduction of the Linux memory size
- Blocked for further use in Linux until the balloon is deflated again
- Requires proper steering to set the size of the balloon

Cooperative Memory Management
After modprobe cmm, the file /proc/sys/vm/cmm_pages holds the size of the balloon:

lxrob1:/proc/sys/vm # free -m
             total   used   free  shared  buffers  cached
Mem:          1498   1295    202       0       43    1125
-/+ buffers/cache:    126   1372
Swap:          124     10    114

lxrob1:/proc/sys/vm # echo 25000 > cmm_pages     <- takes 100 MB from free
lxrob1:/proc/sys/vm # free -m
             total   used   free  shared  buffers  cached
Mem:          1498   1393    105       0       43    1125
-/+ buffers/cache:    223   1274
Swap:          124     10    114

lxrob1:/proc/sys/vm # echo 100000 > cmm_pages    <- takes 200 MB from cached
lxrob1:/proc/sys/vm # free -m                       and 200 MB from free
             total   used   free  shared  buffers  cached
Mem:          1498   1492      6       0       41     941
-/+ buffers/cache:    508    989
Swap:          124     10    114

lxrob1:/proc/sys/vm # cat cmm_pages
100000

Cooperative Memory Management
The benefits are only as good as the quality of the steering:
- IBM implemented controls in VMRM
- Published results do not show much benefit
Flexible sizing of the balloon is a realistic option today:
- Preparation for virtual machine resizing (be reasonable)
- Scheduled resource reduction during the day shift
Many factors affect dynamic tuning of CMM1, and several of those involve Crystal Ball Technology:
- The cost of inflating the balloon
- Do not ask Linux to do the impossible
- Do not make Linux give up cache without a reason
- Both Linux and z/VM performance data are required

Setting of swappiness
A Linux memory management parameter:
- /proc/sys/vm/swappiness
- The value ranges from 0 to 100, with the default at 60
- Allows swap-out in order to retain some page cache
- Popular in Linux desktop usage
There are strong opinions about both extremes:
- Low swappiness: "Don't swap out my application just to cache data"
- High swappiness: "Get rid of rarely referenced anonymous data"
Tuning the parameter is not obvious:
- The objective is to retain some amount of page cache
- The swappiness control is rather vague
- The default of 60 is probably high for large servers
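A minimal sketch of inspecting and lowering the value; the target of 10 is only an example, not a recommendation from the presentation, and the write requires root, so this sketch guards for that:

```shell
# Inspect the current setting (the default is 60 on most distributions)
if [ -r /proc/sys/vm/swappiness ]; then
    cat /proc/sys/vm/swappiness
fi

# Lower it for a right-sized guest; guarded so the sketch is safe
# to run without root (the write itself requires root)
if [ -w /proc/sys/vm/swappiness ]; then
    echo 10 > /proc/sys/vm/swappiness
fi

# To persist across reboots, add a line to /etc/sysctl.conf:
#   vm.swappiness = 10
```

Because the parameter is a percentage of total Linux memory, a value that works for one virtual machine size has to be revisited when the machine is resized, as the slide above notes.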

Setting of swappiness
Kernel compile: a cache-friendly scenario.
With low swappiness:
- Double the amount of swapping
- Increased non-swap disk I/O (lazy write of temporary files)

[Charts: swap rate (pg/s), swap-out (so) and swap-in (si), for a kernel build in a 100 MB virtual machine with swappiness 60 and with swappiness 10]

Real Memory  swappiness  Average Resident  Runtime
100MB        60          100M              14
100MB        10          100MB             13
1GB          0           360MB             12

Sizing the virtual machine is the best way to save resources.

Use of drop_caches
A kernel command introduced with Linux 2.6:
- A kernel action rather than a tuning parameter
- Instructs the kernel to drop all cached data and inodes
- Some suggest using sync first to clean dirty pages
- The kernel will load back all popular data afterwards
It works only for Linux memory management:
- z/VM is not aware that pages are now unused
- They are likely paged out, since Linux does not reference them further
- CMM1 could inform z/VM memory management, but then why not just use CMM1 to free pages and return them
It is useful to establish some base-level requirement, and it is supposed to get you out of a swap-thrashing situation. It is not performance tuning, just a debugging aid.

Using cpuplugd for memory
cpuplugd is part of the s390-tools package:
- "plug" is Linux speak for enabling something ("hot plug")
- cpuplugd is a background process for enabling virtual CPUs
- It can be used to control CMM1 as well
- memunplug: unplug memory, i.e. inflate the balloon
- memplug: plug in some memory, i.e. deflate the balloon
Memory control is disarmed in the sample configuration:

# Per default this function is disabled, because this rule has to be
# adjusted by the admin for each production system, depending on the
# environment

Objectives and target values are static settings:

UPDATE="60"
CMM_MIN="0"
CMM_MAX="8192"
CMM_INC="256"

Using cpuplugd for memory
Steering is done with Linux metrics only:

# Memplug and memunplug can contain the following keywords:
# - apcr: the amount of page cache reads
# - freemem: the amount of free memory (in megabyte)
# - swaprate: the number of swapin and swapout operations

Steering is specified as rules:

MEMPLUG = "swaprate > freemem+10 & freemem+10 < apcr"
MEMUNPLUG = "swaprate > freemem + 10000"

Since apcr includes swap I/O, the first rule can be simplified to:

MEMPLUG = "swaprate > freemem+10"

A balanced system has 2-10 MB free, so this adds memory when swapping at 5 pg/s or more.
The suggested rule for unplug is incorrect: removing memory at a swap rate of more than 10 MB/s is a bad idea! A more logical approach is to remove memory when not swapping at all:

MEMUNPLUG = "swaprate = 0"
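Putting the corrections from this slide together, a hypothetical memory section of a cpuplugd configuration might look like the sketch below. All values are illustrative and, as the sample configuration itself warns, must be tuned per production system:

```shell
# Sketch of a cpuplugd memory section using the simplified/corrected rules
# (values are assumptions for illustration, not recommended settings)
UPDATE="10"        # evaluate the rules every 10 seconds
CMM_MIN="0"        # the balloon may deflate completely
CMM_MAX="8192"     # cap the balloon so the guest is not squeezed to death
CMM_INC="256"      # grow/shrink in 1 MB (256 page) steps

# Simplified plug rule: apcr includes swap I/O, so the extra
# "freemem+10 < apcr" clause from the sample is redundant
MEMPLUG="swaprate > freemem + 10"

# Corrected unplug rule: only take memory when the guest is not swapping
MEMUNPLUG="swaprate = 0"
```

This is a config fragment, not a script; it belongs in the cpuplugd configuration file read by the daemon at startup.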

Using cpuplugd for memory
Linux memory tuning with cpuplugd:
- The controls are very limited and simplistic
- Either too aggressive or too soft, with high latency
- This has never been tried for real
- Static target values must represent z/VM resources
- The lack of feedback causes waste of resources
- When the developer can't get the sample right, should I try?
After making the rules work:
- Pivot around swapping at 10 KB/s
- CMM_MAX set to just avoid killing the server
- Increment of 1 MB, update every 10 seconds
Results depend a lot on the workload; this is probably not the best possible setting.

Real Memory  swappiness  Average Resident  Runtime
100MB        60          100M              14
100MB        10          100MB             13
1GB          0           360MB             12
25MB-        60          160MB             15

Collaborative Memory Management
An improvement of the ballooning technique in CMM1:
- A new ESSA instruction in z9 and later hardware
- Collaborative Memory Management Assist (CMMA)
- SLES10: support and fixes in SP2
- Kernel option cmma=on (your defaults may vary)
- Support introduced in z/VM 5.3: MEMASSIST
The good news:
- Addresses all concerns with CMM1
- Very little overhead because of the hardware support
- Fully automatic tuning and self-optimizing
- No external controls needed (no VMRM interaction)

Collaborative Memory Management
Hardware support for virtual memory page state, with the ESSA instruction to communicate state changes.
Linux state (simplified):
- Unused: Linux does not care about the contents anymore
- Volatile: the contents are also on disk, so the page may be taken away
- Stable: the page is in use and must be retained by z/VM
z/VM state:
- Resident: the contents are present in main memory
- Preserved: the contents are paged out
- Zero: a page full of binary zeros
A special program check is raised when a page is gone.

Collaborative Memory Management
No Blinkenlights implemented: you can't tell whether it is working at all.
- The response of QUERY MEMASSIST is not accurate
- There is a last-minute option in Linux to disable CMMA
- CMMA is active only when both Linux and z/VM enable it
- The Linux fix should go into the service stream eventually

lxrob1:~ # modprobe vmcp
lxrob1:~ # cat /proc/kallsyms | grep cmma_flag
00000000006dc000 B cmma_flag
lxrob1:~ # vmcp d r6dc000
R006DC000 00000001          <- 1 means CMMA is active

The default setting is probably surprising on older hardware:
"2. At z/VM IPL, the initial setting for MEMASSIST FOR ALL is ON, even if the assist is not installed on the machine. This allows a virtual machine to SET MEMASSIST ON, causing the assist to be simulated by z/VM for guest testing purposes."

Collaborative Memory Management
No instrumentation is available:
- Linux does not keep statistics about state changes
- The z/VM monitor does not reveal any counters about it
- HPMA might make existing metrics incorrect
Documentation is limited and fragmented:
- The ESSA instruction is not in the Principles of Operation
- The z/VM implementation is Object Code Only
- The Linux source code is just part of the story
The future of the CMMA code in Linux is uncertain:
- It affects algorithms in the common kernel source
- Probably too risky for the Open Source community
- SLES11 ships only part of CMMA anymore
Published performance benefits are not convincing:
- A lab workload can be constructed to demonstrate the function
- Without instrumentation you can't tell what it does for you

Collaborative Memory Management
z/VM 5.3 does exploit Linux "unused" pages. Without instrumentation, this is like searching for a black hole:
- A large program in Linux terminated: lots of pages available
- A workload in another server required more resident pages
- 342K pages were removed with almost nothing paged out. It works!

Screen: ESAUSPG   Velocity Software-Test   VSIVM4
User Storage Analysis
                  <---Storage occupancy in pages--->
UserID            <---Main Storage--->   <--Paging-->
Time     /Class    Total    >2GB    <2GB   Xstor   DASD
-------- -------- ------- ------- ------- ------- ------
04:07:00 ROBLX1      4883    3067    1816    3930  16607
04:06:00 ROBLX1      4619    2958    1661    4059  16607
04:05:00 ROBLX1      4409    2896    1513    4121  16607
04:04:00 ROBLX1      4029    2821    1208    4196  16607
04:03:00 ROBLX1      1985    1145     840    5873  16607
04:02:00 ROBLX1    343282   94830  248452   19677      2
04:01:00 ROBLX1    345280   95991  249289   17775      2
04:00:00 ROBLX1    343071   94726  248345   19893      2

Collaborative Memory Management
z/VM 5.3 does not exploit Linux "volatile" memory:
- A Linux server with 1 GB of non-dirty page cache
- Memory pressure in z/VM causes page-out
- 120K pages were removed, all paged out

[Chart: z/VM paging of Linux volatile memory, 07:51-08:03]

Screen: ESAUSPG   Velocity Software-Test   VSIVM4
User Storage Analysis
                  <---Storage occupancy in pages--->
UserID            <---Main Storage--->   <--Paging-->
Time     /Class    Total    >2GB    <2GB   Xstor   DASD
-------- -------- ------- ------- ------- ------- ------
08:01:00 ROBLX1    240030   44309  195721   44484 112282
08:00:00 ROBLX1    246269   49249  197020   62671  86328
07:59:00 ROBLX1    250585   52880  197705   60846  83055
07:58:00 ROBLX1    249993   52780  197213   61501  82433
07:57:00 ROBLX1    251415   53409  198006   61039  81113
07:56:00 ROBLX1    275905   66645  209260   77987  39328
07:55:00 ROBLX1    360209  108446  251763   27900   5080
07:54:00 ROBLX1    382596  121782  260814    7833   2760
07:53:00 ROBLX1    386547  123077  263470    6642      0

Collaborative Memory Management
There is no full exploitation by z/VM memory management; publications phrase it as "z/VM could use":
- The existing demand scan strategy has not changed
- Pages are selected for removal just like before
- Virtual machine queue drop remains very important
CP does skip the actual page-out when it is not necessary:
- This may result in reduced paging requirements
- It could avoid poor response due to paging in very large virtual servers
- Probably a good thing for the transition
- There is no fair treatment when only some servers enable CMMA
The remaining "CMMA-lite" function could provide value, but that is hard to tell without instrumentation.

Enterprise Applications
You think two memory managers is bad? Enterprise applications add a third layer of management. They allocate a large (user-configured) chunk of virtual memory and implement some kind of management strategy:
- DB2 UDB uses buffers to cache data and index
- Oracle allocates the SGA and PGA
- SAP allocates various memory pools for cache
- The JVM allocates a heap and uses garbage collection
- Some Java applications use one of the storage pool classes
By implementing another layer of memory management:
- The memory appears in-use to Linux memory management
- CMM will not be able to free up the resources

Enterprise Applications
Make one or more of the managers do nothing:
- Configure buffers small enough that the server does not swap
- Virtual machine sizing must match the application configuration
- Accept z/VM paging for low-utilized servers; unfortunately most of these do not drop from queue
Know your application to get this right, and verify the configuration with performance measurements:
- Shared memory (the Oracle SGA) resides in page cache
- A bug in Oracle makes it ignore the PGA target size
- The JVM garbage collector actually touches the pages: very unpleasant when they are paged out by z/VM
Do not install applications that you do not need. (See session LX44, Wed 15:15)

Summary
Linux on z/VM memory management is hard work.
Memory can be shared just like CPU can be shared:
- Over-commit of memory is what drives sharing
- Necessary to host low-utilized servers in a cost-effective way
Virtual server sizing is the main tuning knob:
- Virtual machine size and VDISK for Linux swap
- The ratio helps to prioritize the workload
- Work with the application owners to get it right
- CMM1 can provide some flexibility in adjustment
Measure and monitor both z/VM and Linux metrics:
- You can't make sense of one without the other
- Validate the virtual server sizing
- Understand when the workload grows beyond plan

Visualization Techniques

Linux on z/VM Memory Management
A big Thank You to our customers who let me work on their performance problems. If you have performance problems, just drop me a note or catch me somewhere.
Rob van der Heij, rvdheij @ velocitysoftware.com
IBM System z Technical Conference, Brussels, 2009, Session LX45
Velocity Software, Inc, http://www.velocitysoftware.com/
Copyright 2009 Velocity Software, Inc. All Rights Reserved. Other products and company names mentioned herein may be trademarks of their respective owners.