Performance monitoring with Intel Architecture




CSCE 351: Operating System Kernels, Lecture 5.2

Why performance monitoring?
- Fine-tune software
- Book-keeping
- Locating bottlenecks
- Explore potential problems or hidden problems

Why do you need to fine-tune your software?
- Performance, performance, performance
- Cache misses, page faults

A JVM problem
- Imagine three major components:
  - Execution
  - Memory management
  - Synchronization
- How would you measure the execution time of each component?

Or what about a multithreaded application?
- Measure synchronization time
- Measure amount of time in contention
- Measure amount of time setting locks
- Measure bus contention
- How would you do it?

Time-stamp Counter
- The Pentium defines a time-stamp counter (TSC) mechanism used to monitor and identify the relative time of occurrence of processor events
- The architecture includes:
  - the RDTSC instruction, used to read the counter (a 64-bit register)
  - a feature bit (TSC flag) that can be read with the CPUID instruction
  - a time-stamp counter disable bit (TSD flag) in control register CR4
  - a model-specific time-stamp counter

Time-stamp Counter (cont.)
- Following execution of the CPUID instruction, the TSC flag in register EDX (bit 4) indicates (when set) that the time-stamp counter is present in a particular Intel Architecture processor implementation
- The time-stamp counter is a 64-bit counter that is set to 0 following the hardware reset of the processor
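
The feature check described above can be sketched in C. This is a minimal sketch, assuming a GCC- or Clang-compatible compiler whose `<cpuid.h>` provides `__get_cpuid`, and assuming the feature flags are read from CPUID leaf 1 (EAX = 1). The helper names `edx_reports_tsc` and `cpu_has_tsc` are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Bit 4 of EDX (CPUID feature flags) is the TSC flag. */
#define CPUID_EDX_TSC (1u << 4)

/* Pure helper: does this EDX feature word report a time-stamp counter? */
static bool edx_reports_tsc(uint32_t edx)
{
    return (edx & CPUID_EDX_TSC) != 0;
}

/* Query the running processor. Returns false on non-x86 builds or when
 * CPUID leaf 1 is not available. */
static bool cpu_has_tsc(void)
{
#if HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_reports_tsc(edx);
#else
    return false;
#endif
}
```

A monitoring tool would typically call `cpu_has_tsc()` once at start-up and refuse to use RDTSC when it returns false.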

Time-stamp Counter (cont.)
- At a 200 MHz clock rate, the time-stamp counter increments ~6.3 x 10^15 times per year
- It would take over 2000 years for the counter to wrap around
- The RDTSC instruction loads the current count of the time-stamp counter into the EDX:EAX registers

Time-stamp Counter (cont.)
- By default, the RDTSC instruction can be executed by programs and procedures running at any privilege level
- The TSD flag in control register CR4 (bit 2) allows use of this instruction to be restricted to programs and procedures running at privilege level 0
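
The wrap-around estimate above can be checked with a few lines of arithmetic. The sketch below assumes a 365-day year and a counter that increments once per clock. The function names are illustrative only. It also shows how the two 32-bit halves returned in EDX:EAX combine into one 64-bit count.

```c
#include <stdint.h>

/* Seconds in a 365-day year (leap days ignored for this estimate). */
#define SECONDS_PER_YEAR (365.0 * 24.0 * 60.0 * 60.0)

/* Ticks accumulated in one year by a counter incrementing every clock. */
static double ticks_per_year(double clock_hz)
{
    return clock_hz * SECONDS_PER_YEAR;
}

/* Years until a 64-bit counter that started at 0 wraps around. */
static double years_to_wrap(double clock_hz)
{
    const double two_pow_64 = 18446744073709551616.0; /* 2^64 */
    return two_pow_64 / ticks_per_year(clock_hz);
}

/* EDX holds the high 32 bits of the count, EAX the low 32 bits. */
static uint64_t tsc_from_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```

For a 200 MHz clock, `ticks_per_year(200e6)` is about 6.3 x 10^15 and `years_to_wrap(200e6)` is a little over 2,900 years, consistent with the "over 2000 years" figure on the slide.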

Time-stamp Counter (cont.)
- Instructions in the Pentium Pro and later processors can finish out of order
- The RDTSC instruction is not serializing or ordered with other instructions:
  - it does not necessarily wait until all previous instructions have been executed before reading the counter
  - subsequent instructions may begin execution before the RDTSC operation is performed

Performance Counters
- The P6 family also provides two 40-bit performance counters
- They count the number of events or measure duration
- When counting events, a counter is incremented each time a specified event takes place
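
Returning to the RDTSC ordering caveat above: a common workaround is to execute a serializing instruction such as CPUID immediately before RDTSC, so that earlier instructions have completed before the counter is read. The sketch below assumes GCC-style inline assembly on x86. `serialized_rdtsc`, `cycles_for`, and `sample_work` are hypothetical names, and on other architectures the reader simply returns 0.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_TSC 1
#else
#define HAVE_X86_TSC 0
#endif

/* Read the time-stamp counter after a serializing CPUID. */
static uint64_t serialized_rdtsc(void)
{
#if HAVE_X86_TSC
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"   /* select CPUID leaf 0 */
        "cpuid\n\t"               /* serializing instruction */
        "rdtsc"                   /* EDX:EAX = time-stamp counter */
        : "=a"(lo), "=d"(hi)
        :
        : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
#else
    return 0; /* no TSC on this build target */
#endif
}

/* Placeholder workload, present only so the sketch has something to time. */
static void sample_work(void)
{
    volatile unsigned int sink = 0;
    for (unsigned int i = 0; i < 1000u; i++)
        sink += i;
}

/* Clocks elapsed while work() runs, bracketed by two serialized reads. */
static uint64_t cycles_for(void (*work)(void))
{
    uint64_t start = serialized_rdtsc();
    work();
    uint64_t end = serialized_rdtsc();
    return end - start;
}
```

The measured value includes the cost of CPUID and RDTSC themselves, so for short code sequences it helps to time an empty workload first and subtract that overhead.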

Performance Counters (cont.)
- When measuring duration, a counter counts the number of processor clocks that occur while a specified condition is true
- The counters can count events or measure durations that occur at any privilege level

Performance Counters (cont.)
- Four MSRs (model-specific registers) support the performance counters:
  - PerfEvtSel0 and PerfEvtSel1 (32-bit) select the events
  - PerfCtr0 and PerfCtr1 (40-bit) count the number of occurrences
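
To make the event-select registers more concrete, the sketch below packs an event code into a 32-bit PerfEvtSel value. The bit positions (event code in bits 0-7, unit mask in bits 8-15, USR in bit 16, OS in bit 17) are taken from Intel's P6-family documentation rather than from these slides, so treat them as an assumption to verify against the manual for your processor. `make_evtsel` is an illustrative name.

```c
#include <stdint.h>

/* PerfEvtSel fields (assumed P6 layout; not spelled out on the slides). */
#define EVTSEL_EVENT_MASK  0xFFu       /* bits 0-7: event code */
#define EVTSEL_UMASK_SHIFT 8           /* bits 8-15: unit mask */
#define EVTSEL_USR         (1u << 16)  /* count at privilege levels 1-3 */
#define EVTSEL_OS          (1u << 17)  /* count at privilege level 0 */

/* Build an event-select value; the enable flag is deliberately left clear. */
static uint32_t make_evtsel(uint8_t event, uint8_t umask,
                            int count_usr, int count_os)
{
    uint32_t v = (uint32_t)event & EVTSEL_EVENT_MASK;
    v |= (uint32_t)umask << EVTSEL_UMASK_SHIFT;
    if (count_usr)
        v |= EVTSEL_USR;
    if (count_os)
        v |= EVTSEL_OS;
    return v;
}
```

For example, `make_evtsel(0x79, 0, 1, 1)` selects the clocks event (79H) at every privilege level, and `make_evtsel(0xC5, 0, 1, 0)` counts branch mispredictions (C5H) in user-level code only.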

Performance Counters (cont.)
- The RDMSR and WRMSR instructions read and write these MSRs at privilege level 0
  - RDMSR: MSR -> EDX:EAX registers
  - WRMSR: EDX:EAX -> MSR
- The PerfCtr0 and PerfCtr1 MSRs can be read from any privilege level using the RDPMC (read performance-monitoring counters) instruction
  - this requires the PCE flag in control register CR4 to be set; otherwise RDPMC is restricted to privilege level 0
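
Because the counters are 40 bits wide but arrive split across EDX:EAX, software has to recombine the halves and allow for wrap-around when subtracting two readings. The following is a minimal sketch under those assumptions. The helper names are hypothetical, and it tolerates at most one wrap between the two readings.

```c
#include <stdint.h>

#define PERFCTR_BITS 40
#define PERFCTR_MASK ((1ULL << PERFCTR_BITS) - 1)

/* Combine EDX:EAX and keep only the 40 counter bits. */
static uint64_t perfctr_value(uint32_t edx, uint32_t eax)
{
    return (((uint64_t)edx << 32) | eax) & PERFCTR_MASK;
}

/* Events between two readings, modulo 2^40 so one wrap is handled. */
static uint64_t perfctr_delta(uint64_t start, uint64_t end)
{
    return (end - start) & PERFCTR_MASK;
}
```

If the first reading is 0xFFFFFFFFFE and the second is 1, `perfctr_delta` returns 3, which is the number of increments that actually occurred across the wrap.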

Performance Counters (cont.)
- Starting and stopping the performance counters:
  - Start by writing valid setup information to PerfEvtSel0 and/or PerfEvtSel1 and setting the enable-counters flag in PerfEvtSel0
  - Counting begins after the WRMSR instruction executes
  - The counters can be stopped by clearing the enable-counters flag or by clearing all bits in PerfEvtSel0 and PerfEvtSel1

Performance Counters (cont.)
- Operation of RDPMC (pseudocode):

    if (ECX == 0) {
        EDX:EAX = perfctr0;
    } else if (ECX == 1) {
        EDX:EAX = perfctr1;
    }
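
The start, read, and stop steps can be exercised in user space against a simulated register file. This is only a sketch: `struct fake_pmu`, `fake_wrmsr`, and `fake_rdpmc` are hypothetical stand-ins for the privileged instructions. The MSR addresses (0xC1, 0xC2, 0x186, 0x187) and the enable flag position (bit 22) are assumptions taken from Intel's P6 documentation, not from the slides.

```c
#include <stdint.h>

/* Assumed P6 MSR addresses and enable-flag position. */
enum {
    MSR_PERFCTR0 = 0xC1, MSR_PERFCTR1 = 0xC2,
    MSR_PERFEVTSEL0 = 0x186, MSR_PERFEVTSEL1 = 0x187
};
#define EVTSEL_EN (1u << 22)

/* Simulated registers so the sequence can run without a driver. */
struct fake_pmu {
    uint32_t evtsel[2];
    uint64_t ctr[2];
};

/* Stand-in for WRMSR; a real driver would execute the instruction. */
static void fake_wrmsr(struct fake_pmu *p, uint32_t msr, uint64_t value)
{
    switch (msr) {
    case MSR_PERFCTR0:    p->ctr[0] = value; break;
    case MSR_PERFCTR1:    p->ctr[1] = value; break;
    case MSR_PERFEVTSEL0: p->evtsel[0] = (uint32_t)value; break;
    case MSR_PERFEVTSEL1: p->evtsel[1] = (uint32_t)value; break;
    default: break; /* unknown MSR: ignored in this simulation */
    }
}

/* Clear both counters, program the events, then set the enable flag last. */
static void start_counters(struct fake_pmu *p, uint32_t sel0, uint32_t sel1)
{
    fake_wrmsr(p, MSR_PERFCTR0, 0);
    fake_wrmsr(p, MSR_PERFCTR1, 0);
    fake_wrmsr(p, MSR_PERFEVTSEL1, sel1);
    fake_wrmsr(p, MSR_PERFEVTSEL0, sel0 | EVTSEL_EN);
}

/* Stop by clearing the enable flag in PerfEvtSel0. */
static void stop_counters(struct fake_pmu *p)
{
    fake_wrmsr(p, MSR_PERFEVTSEL0, p->evtsel[0] & ~EVTSEL_EN);
}

/* Model of the RDPMC selection: ECX picks the counter, else report -1. */
static int fake_rdpmc(const struct fake_pmu *p, uint32_t ecx, uint64_t *out)
{
    if (ecx > 1)
        return -1;
    *out = p->ctr[ecx];
    return 0;
}
```

Writing PerfEvtSel0 last mirrors the point above that counting begins once the WRMSR that sets the enable flag has executed.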

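Performance Counters (cont.)
- Reading counter 0 before and after the code being measured:

    XOR  ECX, ECX       ; ECX = 0 selects PerfCtr0
    RDPMC               ; EDX:EAX = PerfCtr0
    MOV  [MEM1], EAX    ; save the low 32 bits of the first reading
    MOV  [MEM2], EDX    ; save the high bits of the first reading
    ADD  EDX, EAX       ; presumably the instruction being measured
    RDPMC               ; read PerfCtr0 again

Monitoring Software
- To use the counters, the OS needs to provide an event-monitoring device driver that:
  - performs feature checking (checks the CPUID return value)
  - initializes and starts the counters
  - stops the counters
  - reads the counters or the time-stamp counter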

Sample Events
- Clocks (79H)
- Branch misprediction (C5H)
- Data cache reference (43H)
- Instruction fetch (80H)