<Insert Picture Here> Tracing on Linux: the Old, the New, and the Ugly



Similar documents
<Insert Picture Here> Tracing on Linux

<Insert Picture Here> Tracing on Linux Updates

Perf Tool: Performance Analysis Tool for Linux

System/Networking performance analytics with perf. Hannes Frederic Sowa

Performance Counters on Linux

Efficient and Large-Scale Infrastructure Monitoring with Tracing

Realtime Linux Kernel Features

Large-scale performance monitoring framework for cloud monitoring. Live Trace Reading and Processing

Business Life Path - Red Hat, CFS roadmap

Virtual machine CPU monitoring with Kernel Tracing

OS Observability Tools

Energy Performance and Accounting in Mobile Operating Systems

Red Hat Linux Internals

Perfmon2: A leap forward in Performance Monitoring

Frysk The Systems Monitoring and Debugging Tool. Andrew Cagney

Software Tracing of Embedded Linux Systems using LTTng and Tracealyzer. Dr. Johan Kraft, Percepio AB

Red Hat Summit 2009 Bryan Che

KVM & Memory Management Updates

KVM: Kernel-based Virtualization Driver

Real-time KVM from the ground up

Linux Tools for Monitoring and Performance. Khalid Baheyeldin November 2009 KWLUG

Real-time Debugging using GDB Tracepoints and other Eclipse features

Tracing your virtual machines

VMware Server 2.0 Essentials. Virtualization Deployment and Management

White Paper. Real-time Capabilities for Linux SGI REACT Real-Time for Linux

LinuxCon Europe Cloud Monitoring and Distribution Bug Reporting with Live Streaming and Snapshots.

Oracle Linux Overview. Presented by: Anuj Verma Title: Senior Pre-Sales Consultant

Performance Monitoring of the Software Frameworks for LHC Experiments

2 Purpose. 3 Hardware enablement 4 System tools 5 General features.

Status and Direction of Kernel Development

Managing your Red Hat Enterprise Linux guests with RHN Satellite

2972 Linux Options and Best Practices for Scaleup Virtualization

GPU Profiling with AMD CodeXL

Process Scheduling in Linux

Valgrind BoF Ideas, new features and directions

Developing tests for the KVM autotest framework

Virtual Private Systems for FreeBSD

Using SmartOS as a Hypervisor

Red Hat Developer Toolset 1.1

Lecture 5. User-Mode Linux. Jeff Dike. November 7, Operating Systems Practical. OSP Lecture 5, UML 1/33

Performance Counter. Non-Uniform Memory Access Seminar Karsten Tausche

Power Management in the Linux Kernel

Operating System Structures

Real-Time KVM for the Masses Unrestricted Siemens AG All rights reserved

Module I-7410 Advanced Linux FS-11 Part1: Virtualization with KVM

IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version Fix Pack 2.

DS-5 ARM. Using the Debugger. Version Copyright ARM. All rights reserved. ARM DUI 0446M (ID120712)

A Study of Performance Monitoring Unit, perf and perf_events subsystem

RTOS Debugger for ecos

ovirt: Open Your Virtual Data Center

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

The Red Hat Enterprise Linux advantages over Oracle Linux

An Implementation Of Multiprocessor Linux

Stacey D. Son Consultant/SRI International. BSDCan Developer s Summit 15-May-2013

CS3813 Performance Monitoring Project

Monitoring, Tracing, Debugging (Under Construction)

Getting Started with CodeXL

Red Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment

Red Hat Satellite Management and automation of your Red Hat Enterprise Linux environment

Visualizing gem5 via ARM DS-5 Streamline. Dam Sunwoo ARM R&D December 2012

Virtualization Strategy with Oracle VM and Oracle Linux. Bjorn Naessens

RED HAT DEVELOPER TOOLSET Build, Run, & Analyze Applications On Multiple Versions of Red Hat Enterprise Linux

Android Development: a System Perspective. Javier Orensanz

Introduction. What is an Operating System?

KVM Architecture Overview

Architecture of the Kernel-based Virtual Machine (KVM)

JBOSS OPERATIONS NETWORK (JBOSS ON) MONITORING

Introducing the IBM Software Development Kit for PowerLinux

Private Public Partnership Project (PPP) Large-scale Integrated Project (IP)

A Lightweight Dynamic Performance Monitoring Framework for Embedded Systems

Thomas Fahrig Senior Developer Hypervisor Team. Hypervisor Architecture Terminology Goals Basics Details

ELEC 377. Operating Systems. Week 1 Class 3

Practical Applications of Virtualization. Mike Phillips IAP 2008 SIPB IAP Series

Virtualization Management the ovirt way

KVM in Embedded Requirements, Experiences, Open Challenges

STUDY OF PERFORMANCE COUNTERS AND PROFILING TOOLS TO MONITOR PERFORMANCE OF APPLICATION

Linux Scheduler Analysis and Tuning for Parallel Processing on the Raspberry PI Platform. Ed Spetka Mike Kohler

EXPLORING LINUX KERNEL: THE EASY WAY!

Oracle Linux Strategy and Roadmap

KVM: A Hypervisor for All Seasons. Avi Kivity avi@qumranet.com

Integrating TAU With Eclipse: A Performance Analysis System in an Integrated Development Environment

How do Users and Processes interact with the Operating System? Services for Processes. OS Structure with Services. Services for the OS Itself

Intel Application Software Development Tool Suite 2.2 for Intel Atom processor. In-Depth

DS-5 ARM. Using the Debugger. Version 5.7. Copyright 2010, 2011 ARM. All rights reserved. ARM DUI 0446G (ID092311)

Use-Case Power Management Optimization: Identifying & Tracking Key Power Indicators ELC- E Edimburgh, Patrick Ti:ano

Beyond the Hypervisor

Basic Installation of the Cisco Collection Manager

Performance Application Programming Interface

High Performance Computing in Aachen

PERFORMANCE TOOLS DEVELOPMENTS

TOP(1) Linux User s Manual TOP(1)

Table of Contents. The RCS MINI HOWTO

National CR16C Family On-Chip Emulation. Contents. Technical Notes V

Linux Profiling at Netflix

RISC-V Software Ecosystem. Andrew Waterman UC Berkeley

Embedded Systems. 6. Real-Time Operating Systems

Virtualization Technologies

Overview of the Current Approaches to Enhance the Linux Scheduler. Preeti U. Murthy IBM Linux Technology Center

Automating Linux Malware Analysis Using Limon Sandbox Monnappa K A monnappa22@gmail.com

Long-term monitoring of apparent latency in PREEMPT RT Linux real-time systems

Transcription:

<Insert Picture Here> Tracing on Linux: the Old, the New, and the Ugly Elena Zannoni (elena.zannoni@oracle.com) Linux Engineering, Oracle America October 27 2011

Categories of Tracing Tools Kernel Tracing Userspace Applications Tracing Static Tracing Dynamic Tracing Sampling Continuous Integrated with kernel Standalone 2

A Brief History LTT (Linux Trace Toolkit) (1998) replaced in 2006 Kprobes (2004): http://www.ibm.com/developerworks/library/l-kprobes/index.html Dtrace for Solaris (2003) released on Solaris 10 in November 2005 Systemtap for Linux (2005) at OLS LTTng (LTT Next Generation) (2006): https://lkml.org/lkml/2006/1/9/223 Ftrace: (2008): https://lkml.org/lkml/2008/1/3/26 Perf: (2008): https://lkml.org/lkml/2008/12/4/401 3

Currently Active Projects Ftrace Perf Systemtap LTTng 4

Ftrace In kernel with portions outside Main Maintainer: Steven Rostedt (2008) CLI: /sys/kernel/debug/tracing as interface (control and output) GUI for data visualization (KernelShark) http://people.redhat.com/srostedt/kernelshark/html/ Trace-cmd: user space tool with subcommands http://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git Fundamental mechanism: gprof hooks. Adds mcount() call at entry of each kernel function. Requires kernel to be compiled with -pg option for some functionality User: Root privileges No user space tracing Documentation: in kernel tree Documentation/ftrace.txt and Documentation/ftrace_design.txt See: http://www.lwn.net/articles/365835/ & http://www.lwn.net/articles/366796/ 5

Information collected: ftrace plugins Function: trace entry of all kernel functions Function-graph: traces on both entry and exit of the functions. It then provides the ability to draw a graph of function calls Wakeup: max latency that it takes for the highest priority task to get scheduled after it has been woken up Irqsoff: areas that disable interrupts and saves the trace with the longest max latency Preemptoff: amount of time for which preemption is disabled Preemptirqsoff: largest time for which irqs and/or preemption is disabled Nop: trace nothing Uses /sys/kernel/debug/tracing/ 6

Ftrace: trace-cmd User space tool Requires root Some commands are: Record: start collection of events into file trace.dat Can use ftrace plugins (-p) or static kernel events (-e) Report: display content of trace.dat Start: start tracing but keep data in kernel ring buffer Stop: stop collecting data into the kernel ring buffer Extract: extract data from ring buffer into trace.dat List: list available plugins (-p) or events (-e) Many options, very flexible http://lwn.net/articles/410200/ & http://lwn.net/articles/341902/ 7

Ftrace: Original Interface # echo sched_switch > current_tracer # echo 1 > tracing_on # sleep 1 # echo 0 > tracing_on # cat trace # tracer: sched_switch # TASK PID CPU# TIMESTAMP FUNCTION # bash 3997 [01] 240.132281: 3997:120:R + 4055:120:R bash 3997 [01] 240.132284: 3997:120:R ==> 4055:120:R sleep 4055 [01] 240.132371: 4055:120:S ==> 3997:120:R bash 3997 [01] 240.132454: 3997:120:R + 4055:120:S bash 3997 [01] 240.132457: 3997:120:R ==> 4055:120:R sleep 4055 [01] 240.132460: 4055:120:D ==> 3997:120:R 8

Ftrace: Trace-cmd # trace cmd record e sched_switch sleep 5 # trace cmd report version = 6 cpus=2 sleep 29568 [000] 2930217.458882:sched_switch:trace cmd:29568 [120] R ==> evince:28664 [120] evince 28664 [000] 2930217.458977:sched_switch:evince:28664 [120] S ==> trace cmd:29568 [120] sleep 29568 [000] 2930217.459349:sched_switch:sleep:29568 [120] R ==> pulseaudio:1909 [120] pulseaudio 1909 [000] 2930217.459516:sched_switch:pulseaudio:1909 [120] S ==> sleep:29568 [120] sleep 29568 [000] 2930217.460463:sched_switch: sleep:29568 [120] S ==> trace cmd:29566 [120] trace cmd 29566 [000] 2930217.460510:sched_switch:trace cmd:29566 [120] S ==> trace cmd:29567 [120] trace cmd 29567 [000] 2930217.460543:sched_switch:trace cmd:29567 [120] S ==> swapper:0 [120] <idle> 0 [000] 2930217.460710:sched_switch:swapper:0 [120] R ==> Xorg:1675 [120] 9

10 Ftrace GUI: kernelshark

Ftrace: Dynamic Tracing Added by Masami Hiramatsu (end of 2009) Real dynamic tracing Use /sys/kernel/debug/tracing/kprobe_events to control from command line Flexible to an extent, but hard to use because of syntax Read more: Documentation/trace/kprobetrace.txt LWN article: http://lwn.net/articles/343766/ echo 'r:myretprobe do_sys_open $retval' > /sys/kernel/debug/tracing/kprobe_events echo > /sys/kernel/debug/tracing/kprobe_events 11

Perf In kernel (tools/perf directory) userspace tool (started in 2008) Main contributors : Thomas Gleixner, Ingo Molnar, Frederic Weisebecker, Arnaldo Charvalo De Melo, Peter Zijlstra and others Started as Hardware performance counters interface, initially called perf counters. Originally on top of the hardware-based performance monitoring interface for Linux by Stephane Eranian. Now available as libpfm version 4 http://perfmon2.sourceforge.net/docs_v4.html Has grown into all encompassing tracing system Very active at the moment No user space tracing yet Documentation: tools/perf/design.txt and tools/perf/documenation Perf wiki: https://perf.wiki.kernel.org/index.php/main_page 12

Perf subcommands Perf stat: collects and display events data during a command execution Perf record: run a command, store its profiling (sampling mode) in output file (perf.data) Perf report: display data previously recorded in output file (perf.data) Perf diff: diff between two perf.data files Perf top: performance counter profile in real time Perf probe: define dynamic tracepoints (more on this later) Perf kmem: show statistics on kernel mem (from perf.data), or record kernel mem usage for a command Perf trace-perl (trace-python): process previously recorded trace data using a perl script (python script) 13

Perf subcommands (continued) Perf list: list available symbolic events types (in exadecimal encoding) Perf annotate: display perf.data with assembly or source code of the command executed Perf lock: lock statistics Perf sched: scheduler behaviour Perf kvm: perform same as above but on a kvm guest environment with subcommands: Perf kvm top Perf kvm record Perf kvm report Perf kvm diff 14

Perf: Types of events Perf list shows the list of predefined events that are available Use one or more of the events in perf stat commands: perf stat -e instructions sleep 5 Hardware: cpu cycles, instructions, cache references, cache misses, branch instructions, branch misses, bus cycles. Uses libpfm. CPU model specific. Hardware cache: for instance L1-dcache-loads, L1-icache-prefetches Hardware breakpoint Raw: hexadecimal encoding provided by harware vendor Software: in kernel events. Such as: cpu clock, task clock, page faults, context switches, cpu migrations, page faults, alignment faults, emulation faults. Static tracepoints: needs the ftrace event tracer enabled. Events are listed in /sys/kernel/debug/tracing/events/* (locations defined with TRACE_EVENT() macros) Dynamic tracepoints (if any are defined) See include/linux/perf_event.h 15

Other Control Parameters Process specific: default mode, perf follows fork() and pthread_create() calls Thread specific: use - - no-inherit (or -i) option System wide: use -a option CPU specific: use -C with list of CPU numbers For running process: -p <pid> To find what binaries were running, from a perf.data file, use Elf unique identifier inserted by linker: perf buildid-list User, Kernel, Hypervisor modes: count events when the CPU is in user, kernel and/or hypervisor mode only. Use with perf stat command. 16

Dynamic Tracing in Perf Contributed by Masami Hiramatsu Probes can be inserted on the fly using perf probe Defined on the command line only Syntax can be simple or complex Can use dwarf debuginfo to show source code (--line option) before setting probe (uses elfutils libdw like systemtap), or to use function name, line number, variable name Abstraction on ftrace kprobes, alleviates usage problems, supports also same syntax to specify probes Some options: --add, --del, --line, --list, --dry-run, --verbose, --force Read more: tools/perf/documentation/perf-probe.txt 17

Example Add probe, run a command, and display the data: #./perf probe -P 'p:myprobe @fs/read_write.c:285 file buf count' Adding new event: p:myprobe vfs_read+57 file=%bx buf=%si count=%ax #./perf list... rnnn [raw hardware event descriptor] kprobes:myprobe [Tracepoint event] #./perf record -f -e kprobes:myprobe:record -F 1 -a ls #./perf trace [...] perf-11445 [000] 95862.048894383: myprobe: (c04bbed5) file=dae15e80 buf=b78b2000 count=400 perf-11445 [000] 95862.049066533: myprobe: (c04bbed5) file=dae15d80 buf=b78b2000 count=400 [...] 18

LTTng External to kernel tree Maintainer Mathieu Desnoyers http://lttng.org/ Userspace tracing via markers and tracepoints (static instrumentation). Need to link with special library. kprobes/tracepoints support for kernel tracing GUI to view events (LTTV) Eclipse integration Included in some embedded Linux distributions Just released Version 2.0 (July 2011) 19

Systemtap External to kernel tree (started in 2005) Multiple maintainers: Red Hat, IBM, others http://sourceware.org/systemtap/wiki Kernel tracing using kprobes Userspace tracing requires in kernel infrastructure that is not in mainline (only Red Hat based kernels) (utrace/uprobes) Dynamic probes and tracepoints Has well defined rich scripting language GUI available Safety issues around modifying memory locations (guru mode) Can use debuginfo (seen as drawback) Uses gcc (seen as drawback) Remote operation allowed 20

Kprobes: Dynamic Tracing Added to the Linux kernel in 2005 Similar to breakpoints: place breakpoint instruction at desired code location and execute actions when hit. Optimizations to kprobes using Jumps instead of exceptions. Used by Systemtap first, now also by ftrace and perf 21

Events Markers Static probe points in kernel code Independent of users (ftrace, perf, Lttng, systemap) Many uses in the kernel (a few hundreds and growing) TRACE_EVENT() macro with 6 arguments Definitions in include/linux/tracepoint.h Define characteristics of tracing actions (probe) using TRACE_EVENT() in include/trace/events/*.h Mark tracing locations with function call trace_my_event_name() e.g. trace_sched_switch() in sched.c and TRACE_EVENT(sched_switch,...) defined in sched.h Read the 3-article series: http://lwn.net/articles/379903/ http://lwn.net/articles/381064/ http://lwn.net/articles/383362/ 22

Utrace Systemtap provides userspace tracing based on utrace. Code in RHEL/Fedora. Utrace: goal was to be replacement for ptrace. Status: development abandoned. Fedora still ships it, used by a few other tools. See mailing list: https://www.redhat.com/mailman/listinfo/utrace-devel See code: http://redhat.com/~roland/utrace/ Maintained by Oleg Nesterov Patch kept up to date with recent kernels, most recent is for kernel 3.1 (August 2011) 23

Uprobes Uprobes: handle userspace breakpoints in kernel Analogous to kprobes Uses breakpoint instruction Posted many revisions No longer based on utrace No signals Multiple tracers allowed Ptrace replacement eventually? Still not approved Latest version of the patchset https://lkml.org/lkml/2011/9/20/123 Integration of previous version of the patch with latest kernel F15 and F16 (from Josh Stone) http://permalink.gmane.org/gmane.linux.systemtap/18118 24

Tracing Infrastructure Unification? Goal: unify the kernel infrastructure for tracing There are separate ring buffers used by perf and ftrace Need more complex interfaces Need to be fast Simpler ring buffers in the kernel (kfifo) for other users also exist Unification effort restarted many times: http://lkml.org/lkml/2010/7/9/368 http://lkml.org/lkml/2010/8/17/485 Some old benchmarks: http://git.kernel.org/pub/scm/linux/kernel/git/compudj/linux-2.6-ringbuffer.git Status quo with multiple buffers is likely to persist. Too many differing needs. Possible unification of ftrace and perf? Proposal of trace but stalled: https://lkml.org/lkml/2010/11/16/466 http://www.linuxplumbersconf.org/2011/ocw/proposals/609 25

Dtrace A Solaris tool, available since 2005 Ported to Linux by Oracle, initial release October 2011, version 0.1, still WIP Code posted here http://oss.oracle.com/git/ linux-2.6-dtrace-modules-beta.git linux-2.6-dtrace-unbreakable-beta.git Dtrace provider, syscall provider Next: static probes, then tackle userspace tracing Integrated with Oracle Unbreakable Enterprise Kernel, available on UEK 2.6.32 right now (0.1 version), moving to UEK2 version 2.6.39 from now on x86_64 only Kernel changes are GPL Kernel Module is CDDL Unrelated to other Linux porting work by Paul Fox 26

In Summary Systemtap: Moving forward, but on its own. Provides userspace tracing. Resistance by kernel community around architectural design decisions. Integration as is with upstream kernel unlikely? LTTng: Developed actively, but standalone. Provides userspace tracing. Code in its own repo. Not in mainline kernel. Ftrace: continuing, thanks to trace-cmd and kernelshark. Kernel focused Perf: most active project at this time. Kernel focused Unification of Buffers in kernel: stalled Scalability: in progress. See google's comments: http://lwn.net/articles/464268/ Utrace: development stalled/abandoned, maintenance only Uprobes: hopeful In kernel interpreter: maybe? tracepoints/static events: KABI status? Integration of ftrace and perf? trace? Common Trace Format: proposed by Mathieu Desnoyers, but not accepted by Linux community. Evolving on its own with LTTng-2.0 Overall area: kind of stalled. What happened at KS 2011? Nothing much. 27

Thank You Ananth Mavinakayanahalli (IBM) Srikar Dronamraju (IBM) Masami Hiramatsu (Hitachi) Steven Rostedt (Red Hat) Josh Stone and Frank Eigler (Red Hat) 28

29 The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle s products remains at the sole discretion of Oracle.