STUDY OF PERFORMANCE COUNTERS AND PROFILING TOOLS TO MONITOR PERFORMANCE OF APPLICATION

1 DIPAK PATIL, 2 PRASHANT KHARAT, 3 ANIL KUMAR GUPTA
1,2 Department of Information Technology, Walchand College of Engineering, Sangli (MH), India.
3 Joint Director, CDAC, Pune (MH), India.

Abstract- Monitoring the performance of applications is an important step in making them effective and efficient. For this reason, modern processors incorporate a Performance Monitoring Unit (PMU), which monitors application performance by maintaining performance counters. Various performance profiling tools have been developed to monitor events and extract their counts from the performance counters. This paper studies several such tools, namely PerfCtr, Perf, PAPI and Intel EC SDK, together with their mechanisms and usage, and then identifies the appropriate tools for profiling the PMU counters.

Keywords- PMU, PMC, Performance Event, MSR.

I. INTRODUCTION
Nowadays, developing applications with high performance and low power consumption is a hot topic, and achieving it requires tuning the application on the basis of a low-level assessment. This is now possible by monitoring the performance events that occur during the execution of the application [1][2]. Modern processors generally come with a Performance Monitoring Unit (PMU) [3], which Intel architectures have supported since the Pentium processor. The PMU consists of performance counters that count events occurring in the processor and the system. Several tools have been developed to work with the PMU provided in the processor, such as perfctr, perf, Performance API (PAPI) and Intel EC SDK. Perfctr is an open source Linux package [4] consisting of a kernel patch and a driver for evaluating the performance parameters of an application. Perf [5] is another profiling tool, available for Linux 2.6 and later versions.
It is a simple command-line tool that gives access to the performance parameters of an application and is based on the perf_event interface [6] exported by recent versions of the Linux kernel. PAPI [7] is a tool that uses a higher-level API to set up and access performance counters and measure performance events. Intel Energy Checker SDK (Intel EC SDK) is a tool developed by Intel with the intention of supporting the development of energy-efficient applications [8]. Section II of this paper studies the PMU architecture of the Intel Core micro-architecture, Section III discusses the internal working mechanisms of the tools, and Section IV lists the Intel x86 architectures supported by these tools.

II. PMU ARCHITECTURE
The PMU is a part of the processor. It includes Performance Monitoring Counters (PMCs) and some Model Specific Registers (MSRs) used to configure the PMCs. A PMC is a counter that holds the number of occurrences of an event. Here we consider performance monitoring in the Intel Core micro-architecture, which provides two general-purpose counters and three fixed-function counters [3].

Table I Performance Monitoring Counters
General-Purpose PMC    Fixed-Function PMC
IA32_PMC0              IA32_FIXED_CTR0
IA32_PMC1              IA32_FIXED_CTR1
                       IA32_FIXED_CTR2

The PMU has the following MSRs to program and configure the PMCs, to obtain their status and to handle their overflow.
IA32_PERFEVTSELx: Each general-purpose PMC is configured by writing to the bit fields of its respective MSR.
IA32_FIXED_CTR_CTRL: The fixed-function PMCs are configured by writing to the bit fields of this MSR.
The most frequent operations in programming performance events are enabling or disabling event counting and checking the status of counter overflows, which are done globally through the following MSRs.
IA32_PERF_GLOBAL_CTRL: This MSR enables or disables event counting for all, or any combination of, fixed-function and general-purpose PMCs.
IA32_PERF_GLOBAL_STATUS: This MSR allows software to query counter overflow conditions on any combination of fixed-function PMCs or general-purpose PMCs.
IA32_PERF_GLOBAL_OVF_CTRL: This MSR allows software to clear counter overflow conditions on any combination of fixed-function PMCs or general-purpose PMCs.
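As an illustration of how these MSRs are programmed, the following C sketch computes the values that would be written to IA32_PERFEVTSELx and IA32_PERF_GLOBAL_CTRL. The bit positions and the event code 0x3C (unhalted core cycles) follow the architectural performance monitoring layout described in the Intel manual [3]. The helper names evtsel_encode and global_ctrl_mask are hypothetical, and the sketch only builds the values, since actually writing an MSR requires the privileged WRMSR instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Bit fields of IA32_PERFEVTSELx (architectural performance monitoring). */
#define EVTSEL_USR (UINT64_C(1) << 16) /* count while in user mode (CPL > 0) */
#define EVTSEL_OS  (UINT64_C(1) << 17) /* count while in kernel mode (CPL 0) */
#define EVTSEL_EN  (UINT64_C(1) << 22) /* enable the counter */

/* Hypothetical helper: build an IA32_PERFEVTSELx value.
 * Bits 0-7 hold the event select code, bits 8-15 the unit mask. */
static uint64_t evtsel_encode(uint8_t event, uint8_t umask, int usr, int os)
{
    uint64_t v = (uint64_t)event | ((uint64_t)umask << 8) | EVTSEL_EN;
    if (usr)
        v |= EVTSEL_USR;
    if (os)
        v |= EVTSEL_OS;
    return v;
}

/* Hypothetical helper: build an IA32_PERF_GLOBAL_CTRL value for the Core
 * micro-architecture (2 general-purpose and 3 fixed-function counters).
 * Bit i enables IA32_PMCi; bit 32 + j enables IA32_FIXED_CTRj. */
static uint64_t global_ctrl_mask(unsigned gp_mask, unsigned fixed_mask)
{
    assert((gp_mask & ~0x3u) == 0 && (fixed_mask & ~0x7u) == 0);
    return (uint64_t)gp_mask | ((uint64_t)fixed_mask << 32);
}
```

For example, evtsel_encode(0x3C, 0x00, 1, 1) yields 0x43003C, which selects unhalted core cycles in both user and kernel mode, and global_ctrl_mask(0x3, 0x7) yields 0x700000003, which enables all five counters of Table I.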
III. WORKING OF TOOLS
A. Perfctr
Perfctr is an open source tool used for profiling an application by accessing performance counters. The Linux package Perfctr 2.x [4] consists of a driver and a kernel patch. The patch modifies process handling to support per-process counters, which are used to profile the hardware counters. The driver makes it possible to program and read values from the performance monitoring unit found in modern processors.

The mechanism used by perfctr is as follows. Every Linux process maintains its own set of virtual PMCs, which are mapped onto the hardware PMCs of the processor. These virtual PMCs are private to each process. Each process also has a virtual Time-Stamp Counter (TSC). The virtual PMCs have 64-bit precision, whereas processors incorporate 40- or 48-bit PMCs. A process accesses its virtual PMCs by opening /proc/self/perfctr and issuing system calls on the resulting file descriptor. A user-space library supplied with the package provides a higher-level interface.

The driver also supports global-mode, or system-wide, PMCs. In this mode, each PMC on each processor can be controlled and read. The PMCs and TSCs on active processors are sampled periodically, and the accumulated sums have 64-bit precision. Global-mode PMCs are accessed via the /dev/perfctr device file; the user-space library again provides a higher-level interface.

The perfctr package has two parts: a patch to the kernel and a driver to access the PMCs.
i. Patch to kernel: PMCs are counter registers in the PMU that hold the count of events while processes are executing. One way to use these registers to count events on a per-process basis is to modify the process structure. The patch modifies the per-process data structure and the routines used for context switching so that each process carries its own PMC state and retains its counter values.
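The way a 64-bit virtual PMC can be maintained on top of a narrower hardware counter across context switches can be sketched as follows. This is a minimal illustration and not perfctr's actual code: the structure vpmc, the functions vpmc_switch_in and vpmc_switch_out, and the 40-bit counter width are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed hardware counter width; the text mentions 40- or 48-bit PMCs. */
#define HW_PMC_BITS 40u
#define HW_PMC_MASK ((UINT64_C(1) << HW_PMC_BITS) - 1)

/* Hypothetical per-process state for one virtual PMC. */
struct vpmc {
    uint64_t sum;   /* accumulated 64-bit virtual count */
    uint64_t start; /* hardware counter value at the last switch-in */
};

/* Called when the process is scheduled in: remember where the
 * hardware counter stands at that moment. */
static void vpmc_switch_in(struct vpmc *v, uint64_t hw_now)
{
    assert(v != NULL);
    v->start = hw_now & HW_PMC_MASK;
}

/* Called when the process is scheduled out: add the events counted
 * since switch-in. Masking the difference handles one wrap-around
 * of the narrower hardware counter. */
static void vpmc_switch_out(struct vpmc *v, uint64_t hw_now)
{
    assert(v != NULL);
    v->sum += (hw_now - v->start) & HW_PMC_MASK;
}
```

With this scheme, a process switched in at hardware value 2^40 - 5 and switched out at hardware value 5 gains 10 events, because the difference is taken modulo 2^40. The sketch assumes the hardware counter wraps at most once between switch-in and switch-out.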
Changes are made in the process-specific files to support per-process virtual performance counters and to handle their context switches. At every process switch, the hardware context of the process being replaced must be saved somewhere. The process descriptor is therefore modified to save the hardware counter values in the process context when the process is switched out. The process-handling routine is also modified to call the routines of the virtual per-process counter driver, which save and restore the PMC values.
ii. Driver to provide PMC access: The perfctr driver provides a /proc interface. By opening the file /proc/self/perfctr, a process can obtain a perfctr structure containing its PMC values. The PMC driver also emulates a device, /dev/perfctr, to which users can issue ioctls to obtain the values of the various PMCs. It defines the mapping from ioctls to functions in its file operations structure. The ioctls that can be sent to this device, and the actions of the corresponding functions, are as follows.
PERFCTR_INFO returns to the user, via the copy_to_user function, a structure with information on the various counters.
GPERFCTR_CONTROL makes the driver allocate the various perfctr structures and start a timer that is used to sample the values at periodic intervals.
GPERFCTR_READ returns a perfctr structure with the updated values of the PMCs.
GPERFCTR_STOP releases the timer and resets the various PMCs to their previous values.

B. Perf
Perf is an open source command-line tool provided with Linux and used for performance monitoring of applications. It is available from Linux kernel version 2.6.31 and allows the performance parameters of an application to be measured with the PMU. Measuring them requires programming the PMU and retrieving the counter values and related information; the Linux interface named perf_event is used for this purpose. Linux Perf has the files core.c and Perf_event.c. These files provide an interface between the Linux kernel and user-space performance monitoring tools, as shown in Fig. 1.
Figure 1.
Architecture of Perf
i. Perf_event
Perf_event is a Linux subsystem. Its modules are statically linked into the kernel, so they are loaded when the Linux kernel is loaded. It assigns a file descriptor to each event of a thread or process. The file descriptor is obtained by calling perf_event_open() [5], in which we specify the event to which it should be assigned. The call configures the hardware PMCs for the events to be monitored and returns a file descriptor, which is used to access the performance counter value. Perf_event has various features [9], such as generalized events available on most modern processors,
event scheduling, and multiplexing to measure more events than the number of counters supported by the hardware. It also provides software events.

int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu, int group_fd, unsigned long flags)
attr gives the detailed description of the event to be monitored.
pid specifies the process whose events are monitored.
cpu specifies the CPU on which the events are monitored.
group_fd is used to group a number of events together.

Each file descriptor corresponds to one event that is measured; these can be grouped together to measure multiple events simultaneously. Events can be enabled and disabled either via ioctl or via prctl.

ssize_t read(int fd, void *buf, size_t count)
The counter value is obtained by passing the file descriptor to the read system call. Internally, perf_event calls various functions to interact with the PMU and read values from the hardware. When perf_event is loaded, it registers a non-maskable interrupt (NMI) handler. An interrupt is generated when a counter overflows. The interrupt handler saves the values of the registers, and the counters are reset to predefined values. To measure per-thread or per-process performance, the perf_event subsystem invokes some functions of the Linux scheduler. At every context switch, the context of the current events is saved in the task_struct structure. Once the context switch is over, the events attached to the newly scheduled process are accessed via the current macro in Linux, which points to the currently running process.

ii. Libpfm4
The perf command provides a subset of common performance counter events to measure, such as processor clock cycles, instruction counts and cache event metrics. However, most processors provide many other implementation-specific hardware events, such as floating-point operations and micro-architectural events (for example, stalls due to hardware resource limits). To access those implementation-specific events one needs to use raw events in perf_event, which can be tedious. Libpfm4 [10] provides a mapping mechanism to refer to those implementation-specific hardware events by name. This library is used in conjunction with the perf_events Linux API. Events are encoded for perf_event by

int pfm_get_perf_event_encoding(const char *str, int dfl_plm, struct perf_event_attr *attr, char **fstr, int *idx)
str is the event string to encode.
dfl_plm is the default privilege level mask.
attr is the perf_event-specific event data structure filled in by this function.

C. Intel EC SDK
Intel EC SDK [8] is a tool for estimating the energy and power consumed by the platform while an application executes on it. Intel built this tool with the goal of supporting the development of energy-efficient applications through analysis of their energy and power consumption. The tool is useful for evaluating the impact on an application's energy consumption of changes to hardware, hardware settings, software algorithms and libraries. It can also be used to count events by inserting API calls into the application source code. It uses a sequence of logical counters, called a productivity link (PL). A counter stores the number of times an event or process has occurred. The tool allows counters to be imported into and exported from an application through the PL, using the following steps.
i. Create the PL.
ii. Specify the counters to be created and maintained in the PL.
iii. Use the PL to import and export counters to and from the application.
iv. Close the PL.
Fig. 2. Using Intel EC to import and export counters
The components of Intel EC SDK are as follows:
i. Intel Core API
ii. Interpretability tool
iii. Energy and Temperature Monitoring tool
iv. Scripting tool
v. SDK companion application

D. PAPI
PAPI is a well-known and commonly used tool for measuring the performance of applications. It provides a consistent interface and methodology for using the performance counters found in modern microprocessors.
The goal of PAPI [7] is to provide an easy-to-use, common set of interfaces for accessing these performance counters on all major processor platforms, thereby giving application developers the information they need for performance analysis, modeling and tuning. PAPI provides two interfaces to the hardware counters. The high-level interface provides simple access to the counters and makes starting, stopping and reading them simple for a specified number of performance events. The low-level interface manages hardware events in user-defined groups called EventSets. PAPI includes a predefined set of events meant to represent a lowest common denominator of a good counter implementation, the intent being that the same tool would count similar and possibly comparable events when run on different platforms. If the programmer chooses to use this set of standardized events, the source code need not be changed and only a recompile is necessary. In addition to providing access to the counters, the tool also handles counter overflow. It provides user callbacks on counter overflow and hardware-based SVR4-compatible profiling, regardless of whether or not the operating system supports it.
Fig. 3. Architecture of PAPI

IV. SUPPORTED ARCHITECTURES
The tools studied above are compatible with specific micro-architectures. Table II shows the x86 micro-architectures supported by these tools.

Table II Supported x86 architectures
Name                               PAPI   Perfctr   Perf_event   Libpfm4
Pentium                            No     Yes       No           No
Pentium Pro/II/III/M/4/D           Yes    Yes       Yes          Yes
Core Duo                           Yes    Yes       Yes          Yes
Core 2                             Yes    Yes       Yes          Yes
Atom                               Yes    Yes       Yes          Yes
Atom Cedarview                     Yes    No        Yes          Yes
Atom Silvermont                    Yes    No        Yes          Yes
Nehalem / Nehalem EX / Westmere    Yes    Yes       Yes          Yes
Westmere EX                        Yes    No        Yes          Yes
Sandy Bridge                       Yes    No        Yes          Yes
Sandy Bridge EP                    Yes    No        Yes          Yes
Ivy Bridge                         Yes    No        Yes          Yes
Ivy Bridge EP                      Yes    No        Yes          Yes
Haswell / Haswell EP               Yes    No        Yes          Yes
Broadwell                          No     No        Yes          Yes
Knights Corner                     Yes    No        Yes          Yes

CONCLUSION
Among these tools, perfctr, perf and PAPI are open source, and PAPI and perf are the more user-friendly, i.e. easy to set up and easy to use for profiling performance counters. Perfctr is limited to older kernel versions that are seldom used in today's systems. The Intel EC SDK is less well suited to accessing performance counters than the others.
However, it can be used to measure the energy consumption of a system with external power meters. Perf is a command-line tool for accessing performance counters on the Linux platform. It builds on the perf_event Linux interface, upon which different performance monitoring tools have been developed. PAPI is another good alternative for measuring performance by directly accessing hardware counters. PAPI and perf work with a large number of micro-architectures. We conclude that perf and PAPI are the most appropriate tools for working with performance counters and performance profiling.

REFERENCES
[1] W. Lloyd Bircher and Lizy K. John, Complete System Power Estimation Using Processor Performance Events, IEEE TRANSACTIONS ON COMPUTERS, VOL. 61, NO. 4, APRIL 2012.
[2] Rance Rodrigues, Arunachalam Annamalai, Israel Koren, and Sandip Kundu, A Study on the Use of Performance Counters to Estimate Power in Microprocessors, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: EXPRESS BRIEFS, VOL. 60, NO. 12, DECEMBER 2013.
[3] Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide, Part 2, http://www.developer.intel.com.
[4] Source code of perfctr patch by Mikael Pettersson, http://www.csd.uu.se/~mikpe/linux/perfctr.
[5] Linux manual page for perf_event_open, http://web.eece.maine.edu/~vweaver/projects/perf_events/perf_event_open.html
[6] Aman Singh and Anup Buchke, A Study of Performance Monitoring Unit, perf and perf_events subsystem.
[7] S. Browne, J. Dongarra, N. Garner, K. London, P. Mucci, A Portable Programming Interface for Performance Evaluation on Modern Processors.
[8] Intel Energy Checker SDK Code, Resource and Documentation, https://software.intel.com/en-us/articles/intel-energychecker-sdk.
[9] Vincent M. Weaver, Linux perf_event features and overhead.
[10] Libpfm4 manual page, http://perfmon2.sourceforge.net/docs_v4.html