Block I/O Layer Tracing: blktrace



Block I/O Layer Tracing: blktrace
Gelato, Cupertino, CA, April 2006
Alan D. Brunelle
Hewlett Packard Company, Open Source and Linux Organization, Scalability & Performance Group
Alan.Brunelle@hp.com

Introduction
- blktrace: an overview of a new Linux capability
- The ability to see what's going on inside the block I/O layer ("you can't count what you can't measure")
- Kernel implementation
- Description of the user applications
- Sample output & analysis

Problem Statement
- Need to know the specific operations performed upon each I/O submitted into the block I/O layer
- Who needs this?
  - Kernel developers in the I/O path: block I/O layer, I/O scheduler, software RAID, file system, ...
  - Performance analysis engineers: HP OSLO S&P, ...

Block I/O Layer (simplified)
[Diagram: applications sit atop the file systems and page cache; below them is the block I/O layer with its request queues, then optional pseudo devices (MD/DM), then the physical devices.]

iostat
- The iostat utility does provide information pertaining to the request queues associated with specific devices: average I/O time on queue, number of merges, number of blocks read/written, ...
- However, it does not provide detailed information on a per-I/O basis

Blktrace to the rescue!
- Developed and maintained by Jens Axboe (block I/O layer maintainer)
- My additions included threads & utility splitting, DM remap events, the blkrawverify utility, the binary dump feature, testing, kernel/utility patches, and documentation
- A low-overhead, configurable kernel component which emits events for specific operations performed on each I/O entering the block I/O layer
- A set of tools which extract and format these events
- However, blktrace is not an analysis tool!

Feature List
- Provides detailed block-layer information concerning individual I/Os
- Low-overhead kernel tracing mechanism: seeing less than a 2% hit to application performance in relatively stressful I/O situations
- Configurable:
  - Specify one or more physical or logical devices (including MD and DM (LVM2))
  - User-selectable events: a filter can be specified at event acquisition and/or when formatting output
- Supports both live and playback tracing
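As a sketch of the configurability described above (the device names are illustrative, and tracing requires root):

```shell
# Trace two devices at once, capturing only write-related events
# (-a adds "write" to the event filter mask), and format them live:
blktrace -d /dev/sda -d /dev/sdb -a write -o - | blkparse -i -
```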

Events Captured
- Request queue entry allocated
- Sleep during request queue allocation
- Request issued to underlying block device
- Request queue plug/unplug operation
- Request queue insertion
- I/O split/bounce operation
- Front/back merge of I/O on request queue
- Requeue of a request
- I/O remap (MD or DM)
- Request completed

blktrace: General Architecture
[Diagram: I/Os enter the block I/O layer (request queue), which emits events per device into per-CPU relay channels in kernel space; in user space, the blktrace utility drains the relay channels and hands the events to blkparse.]

blktrace Utilities
- blktrace: device configuration and event extraction utility
  - Store events in (long-term) storage, or pipe to the blkparse utility for live tracing
  - Also: a networking feature to send events to another machine for parsing
- blkparse: event formatting utility
  - Supports textual or binary dump output
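The capture-then-parse workflow above might look like this in practice (run as root; the device name and duration are illustrative):

```shell
# Capture 30 seconds of events from /dev/sda; per-CPU files named
# sda.blktrace.0, sda.blktrace.1, ... are written to the current directory:
blktrace -d /dev/sda -w 30

# Later, format the stored events as text; -d additionally saves a merged
# binary event stream for post-processing tools:
blkparse -i sda -d sda.bin
```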

blktrace: Event Output

% blktrace -d /dev/sda -o - | blkparse -i -
8,0 3 1 0.000000000 697 G W 223490 + 8 [kjournald]
8,0 3 2 0.000001829 697 P R [kjournald]
8,0 3 3 0.000002197 697 Q W 223490 + 8 [kjournald]
8,0 3 4 0.000005533 697 M W 223498 + 8 [kjournald]
8,0 3 5 0.000008607 697 M W 223506 + 8 [kjournald]
...
8,0 3 10 0.000024062 697 D W 223490 + 56 [kjournald]
8,0 1 11 0.009507758 0 C W 223490 + 56 [0]

Columns: device <major, minor>, CPU, sequence number, timestamp, PID, event, read/write flag, start block + number of blocks, process.
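The text output lends itself to quick ad-hoc post-processing. As a minimal sketch (the file name and awk script are mine, not part of blktrace), the two sample lines below are the queue (Q) and completion (C) events from the trace above, and awk computes the elapsed time between them:

```shell
# Fields in blkparse's default text output: device, CPU, sequence number,
# timestamp, PID, event, read/write flag, start block, "+", count, process.
cat > sample-trace.txt <<'EOF'
8,0 3 3 0.000002197 697 Q W 223490 + 8 [kjournald]
8,0 1 11 0.009507758 0 C W 223490 + 56 [0]
EOF

# Elapsed time from queueing to completion for the request at block 223490:
awk '$6 == "Q" && $8 == 223490 { q = $4 }
     $6 == "C" && $8 == 223490 { printf "%.9f\n", $4 - q }' sample-trace.txt
```

Here the request spent roughly 9.5 ms between being queued and being completed.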

blktrace: Summary Output

Per-CPU details (note that the writes were submitted on CPU0 but completed on CPU3):

CPU0 (sdao):
 Reads Queued:        0,      0KiB   Writes Queued:    77,382,  5,865MiB
 Read Dispatches:     0,      0KiB   Write Dispatches:  7,329,  3,020MiB
 Reads Requeued:      0              Writes Requeued:       6
 Reads Completed:     0,      0KiB   Writes Completed:      0,      0KiB
 Read Merges:         0              Write Merges:     68,844
 Read depth:          2              Write depth:          65
 IO unplugs:        414              Timer unplugs:       414
...
CPU3 (sdao):
 Reads Queued:      105,     18KiB   Writes Queued:    14,541,  2,578MiB
 Read Dispatches:    22,     60KiB   Write Dispatches:  6,207,  1,964MiB
 Reads Requeued:      0              Writes Requeued:   1,408
 Reads Completed:    22,     60KiB   Writes Completed: 12,300,  5,059MiB
 Read Merges:        83              Write Merges:     10,968
 Read depth:          2              Write depth:          65
 IO unplugs:        287              Timer unplugs:       287

Per-device details:

Total (sdao):
 Reads Queued:      105,     18KiB   Writes Queued:    92,546,  8,579MiB
 Read Dispatches:    22,     60KiB   Write Dispatches: 13,714,  5,059MiB
 Reads Requeued:      0              Writes Requeued:   1,414
 Reads Completed:    22,     60KiB   Writes Completed: 12,300,  5,059MiB
 Read Merges:        83              Write Merges:     80,246
 IO unplugs:        718              Timer unplugs:       718

Average throughput (R/W): 0KiB/s / 39,806KiB/s
Events (sdao): 324,011 entries
Skips: 0 forward (0 - 0.0%)

blktrace: Event Storage Choices
- Physical-disk-backed file system
  - Pros: large/permanent amount of storage available; supported by all kernels
  - Cons: potentially higher system impact; may negatively affect the devices being watched (if storing on the same bus as other watched devices)
- RAM-disk-backed file system
  - Pros: predictable system impact (RAM allocated at boot); less impact on the I/O subsystem
  - Cons: limited/temporary storage size; removes RAM from the system (even when not tracing); may require a reboot/kernel build
- tmpfs
  - Pros: less impact on the I/O subsystem; included in most kernels; only uses system RAM while events are stored
  - Cons: limited/temporary storage; impacts system predictability, since RAM is consumed as events are stored, which could affect the application being watched
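As one possible setup for the tmpfs option above (the mount point and size are arbitrary; run as root):

```shell
# Keep trace files in RAM so that writing events does not disturb
# the disk under observation:
mkdir -p /mnt/blktraces
mount -t tmpfs -o size=256m tmpfs /mnt/blktraces

# -D directs blktrace's per-CPU output files into the tmpfs:
blktrace -d /dev/sda -D /mnt/blktraces -w 30
```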

blktrace: Analysis Aid
- As noted previously, blktrace does not analyze the data; it is responsible for storing and formatting events
- Post-processing analysis tools need to be developed; they can work on the formatted output or on the binary data stored by blktrace itself
- Example: btt (block trace timeline)
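A plausible btt workflow, building on the binary-dump feature of blkparse (file names are illustrative):

```shell
# Merge the per-CPU trace files into one binary event stream,
# suppressing the (potentially huge) text output with -O:
blkparse -i sda -d events.bin -O

# Summarize per-I/O timings (Q2I, I2D, D2C, Q2C) from the binary stream:
btt -i events.bin
```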

Practical blktrace
- Here at HP OSLO S&P we are investigating I/O scalability at various levels, including the efficiency of various hardware configurations and the effects on I/O performance caused by software RAID (MD and DM)
- blktrace enables us to determine scalability issues within the block I/O layer and the overhead costs incurred when utilizing software RAID

Life of an I/O (simplified)
- When an I/O enters the block layer it can be:
  - Remapped onto another device (MD, DM)
  - Split into two separate I/Os (alignment, size, ...)
  - Added to the request queue
  - Merged with a previous entry on the queue
- All I/Os end up on a request queue at some point
- At some later time, the I/O is issued to a device driver and submitted to a device
- Later, the I/O is completed by the device and its driver

btt: Life of an I/O
- Q2I: the time it takes to process an I/O prior to its being inserted or merged onto a request queue (includes split and remap time)
- I2D: the time the I/O is idle on the request queue
- D2C: the time the I/O is active in the driver and on the device
- Q2I + I2D + D2C = Q2C, the total processing time of the I/O
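To make the identity concrete, here is a tiny awk computation over invented stage times for a single I/O (the numbers are purely illustrative, in seconds):

```shell
awk 'BEGIN {
    q2i = 0.000003;   # before insertion onto the request queue
    i2d = 0.000020;   # idle on the request queue
    d2c = 0.009500;   # active in the driver and on the device
    printf "Q2C = %.6f\n", q2i + i2d + d2c
}'
# prints: Q2C = 0.009523
```

For a device-bound workload, D2C dominates the sum; a large I2D share instead points at the scheduler, as the example later in the talk shows.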

btt: Partial Output

Request-to-dispatch counts (merge ratio = #Q / #D; devices share one SCSI bus, targets listed from low to high priority):

DEV      #Q     #D     Ratio  BLKmin  BLKavg  BLKmax  Total
[ 8, 0]  92827  12401  7.5    1       109     1024    10120441
[ 8, 1]  93390  13676  6.8    1       108     1024    10150343
[ 8, 2]  92366  13052  7.1    1       109     1024    10119302
[ 8, 3]  92278  13616  6.8    1       109     1024    10119043
[ 8, 4]  92651  13736  6.7    1       109     1024    10119903

Average I/O times (seconds):

DEV      Q2I          I2D          D2C          Q2C
[ 8, 0]  0.049697430  0.302734498  0.074038617  0.400079555
[ 8, 1]  0.031665593  0.050032148  0.058669682  0.125934697
[ 8, 2]  0.035651772  0.031035436  0.047311659  0.096735504
[ 8, 3]  0.021047776  0.011161007  0.038519804  0.060975408
[ 8, 4]  0.028985217  0.008397228  0.034344640  0.058160497

As a share of Q2C (Q2I is software time, D2C is driver/device time):

DEV      Q2I     I2D     D2C
[ 8, 0]  11.7%   71.0%   17.4%
[ 8, 1]  22.6%   35.6%   41.8%
[ 8, 2]  31.3%   27.2%   41.5%
[ 8, 3]  29.8%   15.8%   54.5%
[ 8, 4]  40.4%   11.7%   47.9%

Note the excessive idle time on the request queue (I2D) for [ 8, 0].

btt: Q&C Activity
- btt also generates activity data indicating ranges where processes and devices are actively handling various events (block I/O entered, I/O inserted/merged, I/O issued, I/O completed, ...)
- This data can be plotted (e.g., with xmgrace) to see patterns and to extract information concerning anomalous behavior

btt: I/O Scheduler Example
[Plot of Q and C activity: the C-activity band shows I/O delayed by mkfs activity; the Q-activity band shows mkfs and pdflush fighting for the device.]

btt: I/O Scheduler Explained
- While characterizing the I/O stack, we noticed very long I2D times for certain processes
- The graph shows a continuous stream of I/Os at the device level for the mkfs.ext3 process
- The graph shows a significant lag for the pdflush daemon: its last I/O enters the block I/O layer at around 19 seconds, but the last batch of I/Os isn't completed until 14 seconds later!
- Cause? The anticipatory scheduler allows mkfs.ext3 to proceed, holding off pdflush's I/Os

Resources
- Kernel sources:
  - Patch for Linux 2.6.14-rc3 (or later, up to 2.6.17)
  - Linux 2.6.17 (or later): built in
- Utilities & documentation (& kernel patches): rsync://rsync.kernel.org/pub/scm/linux/kernel/git/axboe/blktrace.git (see the documentation in the doc directory)
- Mailing list: linux-btrace@vger.kernel.org