How to Identify and Troubleshoot Memory Leaks on IPSO



Similar documents
How To Backup a SmartCenter

CA Nimsoft Monitor. Probe Guide for Active Directory Server. ad_server v1.4 series

Remote Access Clients for Windows

Endpoint Security VPN for Mac

CA Workload Automation Agent for Microsoft SQL Server

Endpoint Security VPN for Mac

PMOD Installation on Linux Systems

CA Nimsoft Monitor. Probe Guide for CPU, Disk and Memory. cdm v4.7 series

HP OpenView Patch Manager Using Radia

SSL Network Extender R71. Release Notes

TotalShredder USB. User s Guide

Security Gateway Virtual Appliance R75.40

Objectives. Chapter 2: Operating-System Structures. Operating System Services (Cont.) Operating System Services. Operating System Services (Cont.

Operating System Components

CS197U: A Hands on Introduction to Unix

System Resources. To keep your system in optimum shape, you need to be CHAPTER 16. System-Monitoring Tools IN THIS CHAPTER. Console-Based Monitoring

Monitor NetSight Server Health

PERFORMANCE TUNING IN MICROSOFT SQL SERVER DBMS

This presentation explains how to monitor memory consumption of DataStage processes during run time.

CA Unified Infrastructure Management

IBM Tivoli Monitoring Version 6.3 Fix Pack 2. Infrastructure Management Dashboards for Servers Reference

Veeam ONE What s New in v9?

New Features Guide. Adaptive Server Enterprise 15.7 SP50

Cincom Smalltalk. Installation Guide P SIMPLIFICATION THROUGH INNOVATION

CA Nimsoft Monitor. Probe Guide for Sharepoint. sharepoint v1.6 series

BMC Impact Solutions Infrastructure Management Guide

Operating System Structures

CA ARCserve Backup for Windows

Outline: Operating Systems

CA ARCserve Backup for Windows

ATTO ExpressSAS Troubleshooting Guide for Windows

SAP Sybase Adaptive Server Enterprise Shrinking a Database for Storage Optimization 2013

Monitoring IBM HMC Server. eg Enterprise v6

CA arcserve Unified Data Protection Agent for Linux

Security Gateway for OpenStack

How do Users and Processes interact with the Operating System? Services for Processes. OS Structure with Services. Services for the OS Itself

Archiving File Data with Snap Enterprise Data Replicator (Snap EDR): Technical Overview

Red Hat Linux Internals

Linux Process Scheduling Policy

Monitoring DoubleTake Availability

Matisse Server Administration Guide

CA Unified Infrastructure Management

Web Application s Performance Testing

CS3600 SYSTEMS AND NETWORKS

Extreme Networks Security Upgrade Guide

HP Service Manager Shared Memory Guide

Agenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.

DevTest Solutions. Local License Server. Version 2.1.2

CA Performance Center

Directions for VMware Ready Testing for Application Software

R75. Installation and Upgrade Guide

CA Nimsoft Monitor. Probe Guide for Apache HTTP Server Monitoring. apache v1.5 series

ORACLE USER PRODUCTIVITY KIT USAGE TRACKING ADMINISTRATION & REPORTING RELEASE 3.6 PART NO. E

Chapter 12 File Management. Roadmap

Chapter 12 File Management

Server Manager Performance Monitor. Server Manager Diagnostics Page. . Information. . Audit Success. . Audit Failure

Performance of VMware vcenter (VC) Operations in a ROBO Environment TECHNICAL WHITE PAPER

Introduction to Windows Server 2016 Nested Virtualization

Troubleshooting PHP Issues with Zend Server Code Tracing

File System Management

McAfee Web Gateway 7.4.1

Security Gateway R75. for Amazon VPC. Getting Started Guide

CSC 2405: Computer Systems II

Password Changer for DOS User Guide

CA Nimsoft Monitor. Probe Guide for Internet Control Message Protocol Ping. icmp v1.1 series

Operating System and Process Monitoring Tools

Endpoint Security VPN for Windows 32-bit/64-bit

CA Workload Automation Agent for Databases

PHD Virtual Backup for Hyper-V

VEEAM ONE 8 RELEASE NOTES

Symantec Endpoint Protection Shared Insight Cache User Guide

MySQL and Virtualization Guide

Oracle Cloud. What s New for Oracle Compute Cloud Service (IaaS) Topics. July What's New for Oracle Compute Cloud Service (IaaS) Release 16.

This presentation will discuss how to troubleshoot different types of project creation issues with Information Server DataStage version 8.

Chapter 8 Monitoring and Logging

Leak Check Version 2.1 for Linux TM

Using Secure4Audit in an IRIX 6.5 Environment

Increased operational efficiency by providing customers the ability to: Use staff resources more efficiently by reducing troubleshooting time.

Performance Monitoring User s Manual

AKIPS Network Monitor Installation, Configuration & Upgrade Guide Version 15. AKIPS Pty Ltd

Protecting Virtual Servers with Acronis True Image

CA Nimsoft Monitor. Probe Guide for IIS Server Monitoring. iis v1.5 series

Chapter 3: Operating-System Structures. System Components Operating System Services System Calls System Programs System Structure Virtual Machines

Cisco Setting Up PIX Syslog

Performance Optimization Guide

Chapter 3 Operating-System Structures

Splunk for VMware Virtualization. Marco Bizzantino Vmug - 05/10/2011

JP1/IT Desktop Management 2 - Agent (For UNIX Systems)

Trend Micro Incorporated reserves the right to make changes to this document and to the products described herein without notice.

Introduction Connecting Via FTP Where do I upload my website? What to call your home page? Troubleshooting FTP...

Managing Microsoft Hyper-V Server 2008 R2 with HP Insight Management

Symantec NetBackup Vault Operator's Guide

License Administration Guide. FlexNet Publisher 2014 R1 ( )

Endpoint Security Client

Transcription:

How to Identify and Troubleshoot Memory Leaks on IPSO 5 July 2010

2010 Check Point Software Technologies Ltd. All rights reserved. This product and related documentation are protected by copyright and distributed under licensing restricting their use, copying, distribution, and decompilation. No part of this product or related documentation may be reproduced in any form or by any means without prior written authorization of Check Point. While every precaution has been taken in the preparation of this book, Check Point assumes no responsibility for errors or omissions. This publication and features described herein are subject to change without notice. RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 and FAR 52.227-19. TRADEMARKS: See the Copyright page (http://www.checkpoint.com/copyright.html) for a list of our trademarks. See the Third Party copyright notices (http://www.checkpoint.com/3rd_party_copyright.html) for a list of relevant copyrights.

Important Information Latest Version The latest version of this document is at: http://supportcontent.checkpoint.com/documentation_download?id=10906 For additional technical information visit Check Point Support Center (http://supportcenter.checkpoint.com). Revision History Date Description 5 July 2010 Initial version Feedback Check Point is engaged in a continuous effort to improve its documentation. Please help us by sending your comments (mailto:cp_techpub_feedback@checkpoint.com?subject=feedback on How to Identify and Troubleshoot Memory Leaks on IPSO ).

Contents Important Information... 3 Objective... 5 Supported Versions... 5 Supported OS... 5 Supported Appliances... 5 Before You Start... 6 Related Documentation and Assumed Knowledge... 6 Impact on the Environment and Warnings... 6 Background on Unix Memory management... 7 How to Use This Document... 9 The Memory Leak Detection Script... 9 Running the Memory Leak Detection Script... 10 Analysis of the Memory Leak Detection script... 14 Additional Tools... 15

Related Documentation and Assumed Knowledge Objective This document explains how to identify and troubleshoot memory leaks on IPSO systems. Supported Versions This document was developed for Check Point R70. However, it applies to all versions of the Check Point firewall. Supported OS IPSO 6.2 MR1 (GA029a02) IPSO 6.2 MR2 (GA039) The same concepts directly translate to any POSIX-compliant system Supported Appliances All IP Series appliances running IPSO 6.2 and Check Point firewall. Objective Page 5

Related Documentation and Assumed Knowledge Before You Start Related Documentation and Assumed Knowledge A fundamental knowledge of Unix memory management is recommended. Check Point Technical Assistance Center (TAC) assistance is not required to interpret the output of the ps, top, or vmstat commands. These are POSIX compliant commands present on all Unix systems, therefore the customer administrator is expected to have sufficient knowledge to run these tools and understand the output. This base knowledge is not strictly required to perform the steps outlined in this document, however the results will be confusing to people without the base level of knowledge. Therefore the actual analysis of the generated data should be performed by Check Point support. Note - The information in this document will not help resolve the issue, only help identify or prove that an issue exists. Impact on the Environment and Warnings Important - This document explains how to trace memory leaks, but in ANY procedure to trace memory leaks, there will be a significant impact on CPU resources. None of the steps outlined should be performed during core business hours on mission-critical high performance systems. Before You Start Page 6

Impact on the Environment and Warnings Background on Unix Memory management Zones The IPSO kernel manages several pools of memory that can be allocated to either kernel functions or processes. The memory pools can be viewed with vmstat z, which displays all of the Universal Memory Allocator zones. Each zone is constructed with a distinctive name, and statistics about the usage: Zone references by slab children, Memory usage of the zone, Number of total requests in the zone. slabs There are children memory pools which are directly connected the zones, either by the kernel or running processes. They are called slabs. Every slab is a child of a UMA memory zone; the zone sets certain size restrictions on the type of memory pages that may be requested. The list of memory slabs may be viewed with vmstat m. The reason to use memory slabs, is simply for efficiency. It is a much faster operation to create a slab of memory as a child of a pre-initialized UMA zone, than to create a memory zone from scratch. The UMA memory zones are children of a "primitive" structure called a Keg (outside the scope of this document, see references). This hierarchy of memory allocation is relevant because it can help you track down when a kernel virtual memory zone s used pages count is increasing and never going down. This is the definition of a memory leak within the kernel. Memory Pages Memory within the children slabs or UMA zones is allocated on a page basis by the virtual memory manager. The size of these memory pages is fixed within the system, usually at 4kb. The IPSO memory allocator works by maintaining a set of lists that are ordered by increasing powers of two. Each list contains a set of memory blocks of its corresponding size. To fulfill a memory request, the size of the request is rounded up to the next power of two. A piece of memory is then removed from the list corresponding to the specified power of two and returned to the requester. Thus, a request for a block of memory of size 53 returns a block from the 64-sized list. This sizing less than 4kb occurs within the 4kb slices that are allocated to a page; for example, consider that a memory operation is happening within a particular slab, and the requesting function needs 3 slots of memory, two for 512 bytes, and one for 2048 bytes. Since all of this can fit within a single virtual memory (VM) page, a single page could be used to store the data. The single VM page is allocated to a single requesting memory slab. Allocation Chain It is a critical requirement that memory allocation occur quickly. In IPSO, there are 3 structures that are defined which manage individual memory pages. Chaining together the structures allows a program to use predefined memory types, sizes and meta-information. This speeds up the allocation substantially and increases performance. 1. Keg Provides structural information about the memory allocation 2. UMA Zone Inherits information from a certain Keg structure, and further defines sizing and key attributes 3. Memory slab Inherits information from one or more UMA zones, and is linked to directly by the requesting task Virtual Memory Any modern operating system uses a Virtual Memory system to allocate memory space to the kernel and applications. This virtual system maps memory requests to contiguous address space, which maps to discontiguous physical page locations. The virtual memory handler is used to prevent resource exhaustion of physical memory. By tracking memory accesses and allocation/free operations, it is able to dynamically free up physical memory as needed by remapping a physical memory page to the disk media this is called Background on Unix Memory management Page 7

Impact on the Environment and Warnings swapping. Swapping of memory is normal and happens as part of memory management. However during standard operating conditions, a Check Point firewall should NOT swap as this would lead to severely degraded performance. Wired Memory A portion of the physical memory can be reserved as WIRED memory. Wired memory refers to memory pages that may not be swapped out of physical memory to disk. Often the main usage of Wired memory is for core kernel memory structures. Programs running on the system may request a certain percentage of their memory space to be reserved as Wired memory. The total memory allocations and usage may be viewed using the "top" command. Background on Unix Memory management Page 8

The Memory Leak Detection Script How to Use This Document Most often this document will be used because there is a suspected memory leak on a system. The reason for assuming there is a memory leak, is typically that the system crashed. A system could crash because there was insufficient Free memory, and no other type of memory could be Free d (possibly too much was Wired, or the Virtual Memory system was exhausted), and a new memory allocation for a core system function failed. In this instance it is normal for the system to crash or lock up. After the system has been recovered with a hard power cycle, there are very few clues about what originally caused the problem. The logfiles most often could not be written to, because there was no memory available to initialize the function that would write the log. All crucial IPSO kernel counters available via ipsctl would be cleared on a system reboot. Finally, the "top" "vmstat" and "ps" output would also be cleared. In the case where there is no available core file to analyze, an analysis must be done on the newly-booted system to determine if the problem is persistent and likely to occur again. Memory leaks are notoriously difficult to track down. The Memory Leak Detection Script The script accompanying this document is intended to help trace memory leaks and determine if there is indeed a memory leak present, so that the Check Point development team may be engaged to try and help narrow down the exact cause. The script by itself will only serve as instrumentation. If the script is aborted or the system crashes before the script is done, the raw stats will still be present but the script must be hacked (by removing the data collection portion and all sections above it) and rerun to generate the final output. The script is included in a package that also includes a sample output showing the script running on IPSO 6.2 MR2 (GA039). How to Use This Document Page 9

The Memory Leak Detection Script Running the Memory Leak Detection Script 1. Download the memory leak detection script mem-html.sh (http://supportcontent.checkpoint.com/file_download?id=10918), and copy the script to the target system The script is included in a package that also includes a sample output showing the script running on IPSO 6.2 MR2 (GA039). 2. You may wish to read the script before executing it. Important - The script is VERY CPU INTENSIVE and may conceivably cause traffic loss on production systems 3. For typical operations it is recommended to run the script with the following syntax: sh mem-html.sh 3600 This will execute the script, and run for approximately one hour 3600 seconds. If you are concerned about the CPU utilization of the script, you may run it with the following syntax: nice +20 sh mem-html.sh 3600 The script will be executed with the lowest possible CPU priority on the system; however the results may be less accurate. The script works by writing an HTML page and several subdirectories containing raw statistics. The raw stats are compiled and written in an abbreviated format to the HTML page as graphs and tables. Script output, collected before the intensive data collection The script will collect several important data and include them at the top of the document. Running the Memory Leak Detection Script Page 10

The Memory Leak Detection Script The basic information which is collected includes the duration of the script run, the kernel version, uptime, and date when the script was executed. The "top" output is also collected for later reference. The vmstat outputs listed are collected before, and after, the script data collection. Intensive data collection Until the end timer is reached, the script will iterate through data collection using ps and top. This output is collected in the appropriate subdirectory. Wrapping up the data Once the script s main loop is complete, the data is graphed within the HTML page using CSS. This is to assist the analyst by providing a historical plot of the memory utilization in megabytes. Running the Memory Leak Detection Script Page 11

The Memory Leak Detection Script Tables are also created containing the start and end values of both kernel memory stats, and process memory stats. This allows the tracking of memory leaks both within the kernel space, as well as userland. A leak in the kernel should be easily detectable by seeing the Free memory consistently shrink, and Wired memory growing consistently. There should be a corresponding growing Virtual Memory slab corresponding to the system function with the leak. Finally, the RSS for every process would drop as the virtual memory system tries to reclaim the least frequently used memory pages from the running processes. A leak in a userland process would show similar behavior, except there would be a large increase in RSS and VSZ on one specific process. The system *should* normally panic in these circumstances either when the process reserved Wired memory increases beyond the available Free space, or the Virtual Memory address space is exhausted. It has been observed that a system may simply hang rather than panic and generate a core file. As a convenience to the analyst, the file descriptor allocations are also tracked via the script, since file descriptor leaks could also be interpreted as a memory leak. All of the raw data that is used by the script is stored in the subdirectories so that the analyst may drop the results into Excel and plot them. Script output The resulting index.html and subdirectories are put into a script-output.tgz tarball, and the original files and directories are then deleted. The script-output.tgz must be collected and provided to Check Point TAC for analysis. Check Point will extract the contents of the tarball. Running the Memory Leak Detection Script Page 12

The Memory Leak Detection Script Running the Memory Leak Detection Script Page 13

The Memory Leak Detection Script Analysis of the Memory Leak Detection script The analyst of the script output must consider several items: How busy is the system? The system may only leak memory when there is a lot of traffic Running the script during a maintenance window may not yield the desired output How long it has been up? A system could leak memory over very long timeframes potentially weeks or months Has the memory gone up or down normally, or abnormally? A system will during a normal day request and release large amounts of memory "Abnormal" growth could indicate the problem, but it must be considered with everything else that is happening Where is the memory going? Depending on the uptime and how busy the system is, memory can be allocated to Wired, Inactive and process memory space in a way that looks wrong This is dependent on too many factors to worry about any one indicator Has the condition that triggered the memory leak occurred? A memory leak is usually triggered by a specific function that requests memory and never releases it. If this function doesn t execute, memory will not leak. It s important to run the script on the actual system that is leaking memory, or a proven replication When running the script, ensure that an appropriate duration is selected. Unless the memory leak is extreme, running the script for hours, days or weeks may be required. Analysis of the Memory Leak Detection script Page 14

The Memory Leak Detection Script Additional Tools Commands The following commands may prove useful when trying to track down memory leaks: top ps auxwwlshm vmstat z vmstat m vmstat 1 5 vmstat -s top mio S H Kernel Flags IPSO is compiled with an additional debug kernel. Since IPSO is FreeBSD-based, you can look up additional information about what the kernel flags are for. The additional flags that the kernel is compiled with, are Witness Witness_KDB Witness_Skipspin Invariants Invariants_Support Diagnostic KDB_Unattended Choosing to run this kernel.debug_g is extremely unwise unless directed to do so by Check Point engineering. One purpose of this debug kernel is to crash/panic when an error is detected, much more frequently than on a non-debug kernel. Therefore the kernel is very rarely provided to a customer and would rather be implemented in a lab where the error occurred. References The following resources are good starting places to learn about memory management: MIT 6.823 Computer System Architecture Lectures Fall 2009 (http://csg.csail.mit.edu/6.823/lecnotes.html) A Study of Initialization in Linux and OpenBSD", Operating System Review Vol 39, No 2, April 2005 (http://cisr.nps.edu/downloads/05paper_linux.pdf) The Slab Allocator: An Object-Caching Kernel Memory Allocator (http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=30f80dbda2dbc44d24f2bff25233ad40?doi=10. 1.1.29.4759&rep=rep1&type=pdf) Introduction to Debugging the FreeBSD Kernel (http://www.bsdcan.org/2008/schedule/attachments/45_article.pdf) Week 3 of Berkeley CS61C Spring 2010 Lectures (http://inst.eecs.berkeley.edu/~cs61c/sp10/) cse522: Advanced Operating Systems Lecture Notes, Spring 2008, Washington State University in St Louis (http://www.arl.wustl.edu/~fredk/courses/cs523/notes.html) Computer Science 152: Computer Architecture and Engineering Lectures, Berkeley University (http://inst.eecs.berkeley.edu/~cs152/sp10/) Network Buffer Allocation in the FreeBSD Operating System, Bosko Milekic (http://bmilekic.unixdaemons.com/netbuf_bmilekic.pdf) Understanding the Linux Virtual Memory Manager, Mel Gorman (http://ptgmedia.pearsoncmg.com/images/0131453483/downloads/gorman_book.pdf) Additional Tools Page 15