Linux Profiling at Netflix
|
|
|
- Lambert Farmer
- 9 years ago
- Views:
Transcription
1 Feb 2015 Linux Profiling at Netflix using perf_events (aka "perf") Brendan Gregg Senior Performance Architect Performance Engineering
2 This Talk This talk is about Linux profiling using perf_events How to get CPU profiling to work, and overcome gotchas A tour of perf_events and its features This is based on our use of perf_events at Netflix
3 Massive Amazon EC2 Linux cloud Tens of thousands of instances Autoscale by ~3k each day CentOS and Ubuntu, Java and Node.js FreeBSD for content delivery Approx 33% of the US Internet traffic at night Performance is critical Customer satisfaction: >50M subscribers $$$ price/performance Develop tools for cloud-wide analysis, and make them open source: NetflixOSS Use server tools as needed
4 Agenda 1. Why We Need Linux Profiling 2. Crash Course 3. CPU Profiling 4. Gotchas Stacks (gcc, Java) Symbols (Node.js, Java) Guest PMCs PEBS Overheads 5. Tracing
5 1. Why We Need Linux Profiling
6 Why We Need Linux Profiling Our primary motivation is simple: Understand CPU usage quickly and completely
7 Netflix Vector Quickly: 1. Observe high CPU usage 2. Generate a perf_events-based flame graph
8 Flame Graph Completely: Java JVM (C++) Kernel (C)
9 Value for Netflix Uses for CPU profiling: Help with incident response Non-regression testing Software evaluations Identify performance tuning targets Part of CPU workload characterization Built into Netflix Vector A near real-time instance analysis tool (will be NetflixOSS)
10 Workload Characterization For CPUs: 1. Who 2. Why 3. What 4. How
11 Workload Characterization For CPUs: 1. Who: which PIDs, programs, users 2. Why: code paths, context 3. What: CPU instructions, cycles 4. How: changing over time Can you currently answer them? How?
12 CPU Tools Who Why How What
13 CPU Tools Who Why top, htop perf record -g flame graphs How What monitoring perf stat -a -d
14 Most companies & monitoring products today Who Why top, htop perf record -g flame Graphs How What monitoring perf stat -a -d
15 Re-setting Expectations That was pretty good 20 years ago. Today you should easily understand why CPUs are used: A profile of all CPU consumers and code paths Visualized effectively This should be easy to do Best done using: A perf_events CPU profile of stack traces Visualized as a flame graph This will usually mean some sysadmin/devops work, to get perf_events working, and to automate profiling
16 1. Poor performance, and 1 CPU at 100% 2. perf_events flame graph shows JVM stuck compiling Recent Example
17 2. Crash Course
18 perf_events The main Linux profiler, used via the "perf" command Add from linux-tools-common, etc. Source code & docs in Linux: tools/perf Supports many profiling/tracing features: CPU Performance Monitoring Counters (PMCs) Statically defined tracepoints User and kernel dynamic tracing Kernel line and local variable tracing Efficient in-kernel counts and filters Stack tracing, libunwind Code annotation Some bugs in the past; has been stable for us perf_events ponycorn
19 A Multitool of Subcommands # perf usage: perf [--version] [--help] COMMAND [ARGS] The most commonly used perf commands are: annotate archive Read perf.data (created by perf record) and display annotated code Create archive with object files with build-ids found in perf.data bench General framework for benchmark suites buildid-cache Manage build-id cache. buildid-list diff List the buildids in a perf.data file Read two perf.data files and display the differential profile evlist List the event names in a perf.data file inject Filter to augment the events stream with additional information kmem Tool to trace/measure kernel memory(slab) properties kvm list Tool to trace/measure kvm guest os List all symbolic event types lock Analyze lock events probe Define new dynamic tracepoints record Run a command and record its profile into perf.data report sched Read perf.data (created by perf record) and display the profile Tool to trace/measure scheduler properties (latencies) script Read perf.data (created by perf record) and display trace output stat test Run a command and gather performance counter statistics Runs sanity tests. timechart Tool to visualize total system behavior during a workload top System profiling tool. See 'perf help COMMAND' for more information on a specific command.
20 perf Command Format perf instruments using stat or record. This has three main parts: action, event, scope. e.g., profiling on-cpu stack traces: Ac+on: record stack traces perf record -F 99 -a -g -- sleep 10 Event: 99 Hertz Scope: all CPUs Note: sleep 10 is a dummy command to set the duraoon
21 perf Actions Count events (perf stat ) Uses an efficient in-kernel counter, and prints the results Sample events (perf record ) Records details of every event to a dump file (perf.data) Timestamp, CPU, PID, instruction pointer, This incurs higher overhead, relative to the rate of events Include the call graph (stack trace) using -g Other actions include: List events (perf list) Report from a perf.data file (perf report) Dump a perf.data file as text (perf script) top style profiling (perf top)
22 perf Actions: Workflow list events count events capture stacks perf list perf stat perf record Typical Workflow perf.data text UI dump profile perf report perf script flame graph visualizaoon stackcollapse-perf.pl flamegraph.pl
23 perf Events Custom timers e.g., 99 Hertz (samples per second) Hardware events CPU Performance Monitoring Counters (PMCs) Tracepoints Statically defined in software Dynamic tracing Created using uprobes (user) or kprobes (kernel) Can do kernel line tracing with local variables (needs kernel debuginfo)
24 perf Events: Map
25 perf Events: List # perf list List of pre-defined events (to be used in -e): cpu-cycles OR cycles instructions cache-references cache-misses branch-instructions OR branches branch-misses bus-cycles stalled-cycles-frontend OR idle-cycles-frontend stalled-cycles-backend OR idle-cycles-backend [ ] cpu-clock task-clock page-faults OR faults context-switches OR cs cpu-migrations OR migrations [ ] L1-dcache-loads L1-dcache-load-misses L1-dcache-stores [ ] skb:kfree_skb skb:consume_skb skb:skb_copy_datagram_iovec net:net_dev_xmit net:net_dev_queue net:netif_receive_skb net:netif_rx [ ] [Hardware event] [Hardware event] [Hardware event] [Hardware event] [Hardware event] [Hardware event] [Hardware event] [Hardware event] [Hardware event] [Software event] [Software event] [Software event] [Software event] [Software event] [Hardware cache event] [Hardware cache event] [Hardware cache event] [Tracepoint event] [Tracepoint event] [Tracepoint event] [Tracepoint event] [Tracepoint event] [Tracepoint event] [Tracepoint event]
26 perf Scope System-wide: all CPUs (-a) Target PID (-p PID) Target command ( ) Specific CPUs (-c ) User-level only (<event>:u) Kernel-level only (<event>:k) A custom filter to match variables (--filter ) The following one-liner tour includes some complex action, event, and scope combinations.
27 One-Liners: Listing Events # Listing all currently known events: perf list # Searching for "sched" tracepoints: perf list grep sched # Listing sched tracepoints: perf list 'sched:*'
28 One-Liners: Counting Events # CPU counter statistics for the specified command: perf stat command # Detailed CPU counter statistics (includes extras) for the specified command: perf stat -d command # CPU counter statistics for the specified PID, until Ctrl-C: perf stat -p PID # CPU counter statistics for the entire system, for 5 seconds: perf stat -a sleep 5 # Various CPU last level cache statistics for the specified command: perf stat -e LLC-loads,LLC-load-misses,LLC-stores,LLC-prefetches command # Count system calls for the specified PID, until Ctrl-C: perf stat -e 'syscalls:sys_enter_*' -p PID # Count scheduler events for the specified PID, for 10 seconds: perf stat -e 'sched:*' -p PID sleep 10 # Count block device I/O events for the entire system, for 10 seconds: perf stat -e 'block:*' -a sleep 10 # Show system calls by process, refreshing every 2 seconds: perf top -e raw_syscalls:sys_enter -ns comm
29 One-Liners: Profiling Events # Sample on-cpu functions for the specified command, at 99 Hertz: perf record -F 99 command # Sample on-cpu functions for the specified PID, at 99 Hertz, until Ctrl-C: perf record -F 99 -p PID # Sample CPU stack traces for the specified PID, at 99 Hertz, for 10 seconds: perf record -F 99 -p PID -g -- sleep 10 # Sample CPU stack traces for the entire system, at 99 Hertz, for 10 seconds: perf record -F 99 -ag -- sleep 10 # Sample CPU stack traces, once every 10,000 Level 1 data cache misses, for 5 s: perf record -e L1-dcache-load-misses -c ag -- sleep 5 # Sample CPU stack traces, once every 100 last level cache misses, for 5 seconds: perf record -e LLC-load-misses -c 100 -ag -- sleep 5 # Sample on-cpu kernel instructions, for 5 seconds: perf record -e cycles:k -a -- sleep 5 # Sample on-cpu user instructions, for 5 seconds: perf record -e cycles:u -a -- sleep 5
30 One-Liners: Reporting # Show perf.data in an ncurses browser (TUI) if possible: perf report # Show perf.data with a column for sample count: perf report -n # Show perf.data as a text report, with data coalesced and percentages: perf report --stdio # List all raw events from perf.data: perf script # List all raw events from perf.data, with customized fields: perf script -f comm,tid,pid,time,cpu,event,ip,sym,dso # Dump raw contents from perf.data as hex (for debugging): perf script -D # Disassemble and annotate instructions with percentages (needs some debuginfo): perf annotate --stdio
31 And More perf can also probe and record dynamic tracepoints, and custom CPU PMCs This can get a little advanced I'll start with CPU profiling, then gotchas
32 3. CPU Profiling
33 CPU Profiling Record stacks at a timed interval: simple and effective Pros: Low (deterministic) overhead Cons: Coarse accuracy, but usually sufficient B B A A A stack samples: A A A B syscall on- CPU off- CPU Ome block interrupt
34 perf Screenshot Sampling full stack traces at 99 Hertz: # perf record -F 99 -ag -- sleep 30 [ perf record: Woken up 9 times to write data ] [ perf record: Captured and wrote MB perf.data (~ samples) ] # perf report -n --stdio 1.40% 162 java [kernel.kallsyms] [k] _raw_spin_lock --- _raw_spin_lock %-- try_to_wake_up %-- default_wake_function %-- wake_up_common wake_up_locked ep_poll_callback wake_up_common wake_up_sync_key %-- sock_def_readable [ 78,000 lines truncated ]
35 perf Reporting perf report summarizes by combining common paths Previous output truncated 78,000 lines of summary The following is what a mere 8,000 lines looks like
36 perf report
37 as a Flame Graph
38 Flame Graphs git clone --depth 1 cd FlameGraph perf record -F 99 -a g -- sleep 30 perf script./stackcollapse-perf.pl./flamegraph.pl > perf.svg Flame Graphs: x-axis: alphabetical stack sort, to maximize merging y-axis: stack depth color: random, or hue can be a dimension e.g., software type, or difference between two profiles for non-regression testing ("differential flame graphs") interpretation: top edge is on-cpu, beneath it is ancestry Just a Perl program to convert perf stacks into SVG Includes JavaScript: open in a browser for interactivity Easy to get working
39 4. Gotchas
40 When you try to use perf Stacks don't work (missing) Symbols don't work (hex numbers) Can't profile Java Can't profile Node.js/io.js PMCs aren't available Dynamic tracing function arguments don't work perf locks up
41 How to really get started 1. Get "perf" to work 2. Get stack walking to work 3. Fix symbol translation 4. Get IPC to work 5. Test perf under load This is my actual checklist.
42 How to really get started 1. Get "perf" to work 2. Get stack walking to work 3. Fix symbol translation 4. Get IPC to work 5. Test perf under load Install perf- tools- common and perf- tools- `uname - r` packages; Or compile in the Linux source: tools/perf The "gotchas" This is my actual checklist.
43 Gotcha #1 Broken Stacks perf record -F 99 -a g -- sleep 30 perf report -n --stdio Start by testing stacks: 1. Take a CPU profile 2. Run perf report 3. If stacks are often < 3 frames, or don't reach "thread start" or "main", they are probably broken. Fix them.
44 Identifying Broken Stacks 28.10% 146 sed libc-2.19.so [.] re_search_internal --- re_search_internal %-- 0x3 broken 0x % 62 sed sed [.] re_search_internal --- re_search_internal %-- re_search_stub rpl_re_search match_regex do_subst execute_program process_files main libc_start_main not broken %-- rpl_re_search match_regex do_subst execute_program process_files main libc_start_main
45 Identifying Broken Stacks 78.50% 409 sed libc-2.19.so [.] 0x dd7d %-- 0x7f2516d5d10d %-- 0x7f2516d0332f broken %-- 0x7f2516cffbd %-- 0x7f2516d5d5ad 71.79% 334 sed sed [.] 0x afc %-- 0x40a447 0x40659a 0x408dd8 0x408ed1 0x x7fa1cd08aec %-- 0x40a4a %-- 0x40659a 0x408dd8 0x408ed1 0x x7fa1cd08aec5 probably not broken missing symbols, but that's another problem
46 Broken Stacks Flame Graph Broken Java stacks (missing frame pointer) Java == green system == red C++ == yellow
47 Fixing Broken Stacks Either: A. Fix frame pointer-based stack walking (the default) Pros: simple, supports any system stack walker Cons: might cost a little extra CPU to make available B. Use libunwind and DWARF: perf record -g dwarf Pros: more debug info Cons: not on older kernels, and inflates instance size C. Use a custom walker (probably needs kernel support) Pros: full stack walking (e.g., unwind inlined frames) & args Cons: custom kernel code, can cost more CPU when in use Our current preference is (A) So how do we fix the frame pointer
48 gcc -fno-omit-frame-pointer Once upon a time, x86 had fewer registers, and the frame pointer register was reused for general purpose to improve performance. This breaks system stack walking. gcc provides -fno-omit-frame-pointer to fix this Please make this the default in gcc
49 JDK Java's compilers also reuse the frame pointer, and unfortunately there's no -XX:no-omit-frame-pointer (yet) I hacked hotspot to fix the frame pointer, and published the patch as a prototype suggestion (JDK ) --- openjdk8clean/hotspot/src/cpu/x86/vm/x86_64.ad openjdk8/hotspot/src/cpu/x86/vm/x86_64.ad -166,10 // 3) reg_class stack_slots( /* one chunk of stack-based "registers" */ ) // -// Class for all pointer registers (including RSP) +// Class for all pointer registers (including RSP, excluding RBP) reg_class any_reg(rax, RAX_H, RDX, RDX_H, - RBP, RBP_H, RDI, RDI_H, RSI, RSI_H, RCX, RCX_H, [...] Remove RBP from register pools
50 JDK openjdk8clean/hotspot/src/cpu/x86/vm/macroassembler_x86.cpp openjdk8/hotspot/src/cpu/x86/vm/macroassembler_x86.cpp , ,7 // We always push rbp, so that on return to interpreter rbp, will be // restored correctly and we can correct the stack. push(rbp); + mov(rbp, rsp); // Remove word for ebp framesize -= wordsize; Fix x86-64 funcoon prologues --- openjdk8clean/hotspot/src/cpu/x86/vm/c1_macroassembler_x86.cpp +++ openjdk8/hotspot/src/cpu/x86/vm/c1_macroassembler_x86.cpp [...] We've been using our patched OpenJDK for profiling To do: make this an option (-XX:MoreFramePointer), and (at some point) fix for invokedynamic See "A hotspot patch for stack profiling (frame pointer)" on the hotspot compiler dev mailing list, being discussed now
51 Broken Java Stacks # perf script [ ] java 4579 cpu-clock: ffffffff8172adff tracesys ([kernel.kallsyms]) 7f4183bad7ce pthread_cond_timedwait@@glibc_2 java 4579 cpu-clock: 7f417908c10b [unknown] (/tmp/perf-4458.map) java 4579 cpu-clock: 7f c97 [unknown] (/tmp/perf-4458.map) java 4579 cpu-clock: 7f41792fc65f [unknown] (/tmp/perf-4458.map) a2d53351ff7da603 [unknown] ([unknown]) java 4579 cpu-clock: 7f aec [unknown] (/tmp/perf-4458.map) java 4579 cpu-clock: 7f d0f [unknown] (/tmp/perf-4458.map) java 4579 cpu-clock: 7f417908c194 [unknown] (/tmp/perf-4458.map) [ ] Check with "perf script" to see stack samples These are 1 or 2 levels deep (junk values)
52 Fixed Java Stacks # perf script [ ] java 8131 cpu-clock: 7fff76f2dce1 [unknown] ([vdso]) 7fd3173f7a93 os::javatimemillis() (/usr/lib/jvm 7fd301861e46 [unknown] (/tmp/perf-8131.map) 7fd30184def8 [unknown] (/tmp/perf-8131.map) 7fd30174f544 [unknown] (/tmp/perf-8131.map) 7fd30175d3a8 [unknown] (/tmp/perf-8131.map) 7fd30166d51c [unknown] (/tmp/perf-8131.map) 7fd301750f34 [unknown] (/tmp/perf-8131.map) 7fd3016c2280 [unknown] (/tmp/perf-8131.map) 7fd301b02ec0 [unknown] (/tmp/perf-8131.map) 7fd3016f9888 [unknown] (/tmp/perf-8131.map) 7fd3016ece04 [unknown] (/tmp/perf-8131.map) 7fd c [unknown] (/tmp/perf-8131.map) 7fd301600aa8 [unknown] (/tmp/perf-8131.map) 7fd301a4484c [unknown] (/tmp/perf-8131.map) 7fd e0 [unknown] (/tmp/perf-8131.map) 7fd [unknown] (/tmp/perf-8131.map) 7fd [unknown] (/tmp/perf-8131.map) 7fd e7 [unknown] (/tmp/perf-8131.map) 7fd3171df76a JavaCalls::call_helper(JavaValue*, 7fd3171dce44 JavaCalls::call_virtual(JavaValue* 7fd3171dd43a JavaCalls::call_virtual(JavaValue* 7fd31721b6ce thread_entry(javathread*, Thread*) 7fd e0 JavaThread::thread_main_inner() (/ 7fd317538cb2 JavaThread::run() (/usr/lib/jvm/nf 7fd3173f6f52 java_start(thread*) (/usr/lib/jvm/ 7fd317a7e182 start_thread (/lib/x86_64-linux-gn With JDK stacks are full, and go all the way to start_thread() This is what the CPUs are really running: inlined frames are not present
53 Fixed Stacks Flame Graph Java (no symbols)
54 Gotcha #2 Missing Symbols Missing symbols should be obvious in perf report/script: 71.79% 334 sed sed [.] 0x afc %-- 0x40a447 0x40659a 0x408dd8 0x408ed1 broken 0x x7fa1cd08aec % 62 sed sed [.] re_search_internal --- re_search_internal %-- re_search_stub rpl_re_search match_regex do_subst not broken execute_program process_files main libc_start_main
55 Fixing Symbols For installed packages: A. Add a -dbgsym package, if available B. Recompile from source For JIT (Java, Node.js, ): A. Create a /tmp/perf-pid.map file. perf already looks for this B. Or use one of the new symbol loggers (see lkml) # perf script Failed to open /tmp/perf-8131.map, continuing without symbols [ ] java 8131 cpu-clock: 7fff76f2dce1 [unknown] ([vdso]) 7fd3173f7a93 os::javatimemillis() (/usr/lib/jvm 7fd301861e46 [unknown] (/tmp/perf-8131.map) [ ]
56 perf JIT symbols: Java, Node.js Using the /tmp map file for symbol translation: Pros: simple, can be low overhead (snapshot on demand) Cons: stale symbols Map format is "START SIZE symbolname" Another gotcha: If perf is run as root, chown root <mapfile> Java Agent attaches and writes the map file on demand (previous versions attached on Java start, and wrote continually) Node.js node --perf_basic_prof writes the map file continually Available from Currently has a file growth issue; see my patch in
57 Java: Stacks & Symbols flamegraph.pl --color=java Kernel Java JVM
58 Java: Inlining Disabling inlining: -XX:-Inline Many more Java frames 80% slower (in this case) Not really necessary Inlined flame graphs often make enough sense Or tune -XX:MaxInlineSize and -XX:InlineSmallCode a little to reveal more frames, without costing much perf Can even go faster No inlining
59 Node.js: Stacks & Symbols Covered previously on the Netflix Tech Blog
60 Gotcha #3 Guest PMCs Using PMCs from a Xen guest (currently): # perf stat -a -d sleep 5 Performance counter stats for 'system wide': task-clock (msec) # CPUs utilized [100.00%] 323 context-switches # K/sec [100.00%] 17 cpu-migrations # K/sec [100.00%] 233 page-faults # K/sec <not supported> cycles <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend <not supported> instructions <not supported> branches <not supported> branch-misses <not supported> L1-dcache-loads <not supported> L1-dcache-load-misses <not supported> LLC-loads <not supported> LLC-load-misses seconds time elapsed
61 Guest PMCs Without PMCs, %CPU is ambiguous. We can't measure: Instructions Per Cycle (IPC) CPU cache hits/misses MMU TLB hits/misses Branch misprediction Stall cycles Should be fixable: hypervisors can expose PMCs At the very least, enough PMCs for IPC to work: INST_RETIRED.ANY_P & CPU_CLK_UNHALTED.THREAD_P In the meantime I'm using a physical server for PMC analysis Also some MSRs on the cloud
62 MSRs Model Specific Registers (MSRs) may be exposed when PMCs are not Better than nothing. Can solve some issues. #./showboost CPU MHz : 2500 Turbo MHz : 2900 (10 active) Turbo Ratio : 116% (10 active) CPU 0 summary every 5 seconds... TIME C0_MCYC C0_ACYC UTIL RATIO MHz 17:28: % 116% :28: % 116% :28: % 116% :28: % 115% :28: % 115% 2899 [...] showboost is from my msr-cloud-tools collection (on github)
63 Gotcha #4 Instruction Profiling # perf annotate -i perf.data.noplooper --stdio Percent Source code & Disassembly of noplooper : Disassembly of section.text: : : ed <main>: 0.00 : 4004ed: push %rbp 0.00 : 4004ee: mov %rsp,%rbp : 4004f1: nop 0.00 : 4004f2: nop 16 NOPs in a loop 0.00 : 4004f3: nop 0.00 : 4004f4: nop : 4004f5: nop 0.00 : 4004f6: nop 0.00 : 4004f7: nop 0.00 : 4004f8: nop : 4004f9: nop 0.00 : 4004fa: nop 0.00 : 4004fb: nop 0.00 : 4004fc: nop : 4004fd: nop 0.00 : 4004fe: nop 0.00 : 4004ff: nop 0.00 : : nop : : jmp 4004f1 <main+0x4> Let's profile instrucoons to see which are hot (have I lost my mind?)
64 Instruction Profiling Even distribution (A)? Or something else? (A) (B) (C) (D)
65 Instruction Profiling # perf annotate -i perf.data.noplooper --stdio Percent Source code & Disassembly of noplooper : Disassembly of section.text: : : ed <main>: 0.00 : 4004ed: push %rbp 0.00 : 4004ee: mov %rsp,%rbp : 4004f1: nop 0.00 : 4004f2: nop 0.00 : 4004f3: nop 0.00 : 4004f4: nop : 4004f5: nop 0.00 : 4004f6: nop 0.00 : 4004f7: nop 0.00 : 4004f8: nop : 4004f9: nop 0.00 : 4004fa: nop 0.00 : 4004fb: nop 0.00 : 0.00 : 4004fc: 4004fe: nop nop : 0.00 : 4004fd: 4004ff: nop nop Go home instrucoon pointer, you're drunk 0.00 : : nop : : jmp 4004f1 <main+0x4>
66 PEBS I believe this is sample "skid", plus parallel and out-oforder execution of micro-ops: the sampled IP is the resumption instruction, not what is currently executing. PEBS may help: Intel's Precise Event Based Sampling perf_events has support: perf record -e cycles:pp The 'p' can be specified multiple times: 0 - SAMPLE_IP can have arbitrary skid 1 - SAMPLE_IP must have constant skid 2 - SAMPLE_IP requested to have 0 skid 3 - SAMPLE_IP must have 0 skid from tools/perf/documentation/perf-list.txt
67 Gotcha #5 Overhead Overhead is relative to the rate of events instrumented perf stat does in-kernel counts, with relatively low CPU overhead perf record writes perf.data, which has slightly higher CPU overhead, plus file system and disk I/O Test before use In the lab Run perf stat to understand rate, before perf record Also consider --filter, to filter events in-kernel
68 5. Tracing
69 Profiling vs Tracing Profiling takes samples. Tracing records every event. There are many tracers for Linux (SystemTap, ktap, etc), but only two in mainline: perf_events and ftrace Linux Tracing Stack one- liners: front- end tools: tracing frameworks: tracing instrumentaoon: many perf, perf- tools perf_events, krace, ebpf, tracepoints, kprobes, uprobes
70 Tracing Example # perf record -e block:block_rq_insert -a ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote MB perf.data (~7527 samples) ] # perf script [ ] java 9940 [015] : block_rq_insert: 202,1 R 0 () [java] java 9940 [015] : block_rq_insert: 202,1 R 0 () [java] java 9940 [015] : block_rq_insert: 202,1 R 0 () [java] java 9940 [000] : block_rq_insert: 202,1 R 0 () [java] [ ]
71 Tracing Example # perf record -e block:block_rq_insert -a ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote MB perf.data (~7527 samples) ] # perf script [ ] java 9940 [015] : block_rq_insert: 202,1 R 0 () [java] java 9940 [015] : block_rq_insert: 202,1 R 0 () [java] java 9940 [015] : block_rq_insert: 202,1 R 0 () [java] java 9940 [000] : block_rq_insert: 202,1 R 0 () [java] [ ] process PID [CPU] Omestamp: eventname: format string include/trace/events/block.h: java 9940 [015] : block_rq_insert: 202,1 R 0 () [java] DECLARE_EVENT_CLASS(block_rq, [...] TP_printk("%d,%d %s %u (%s) %llu + %u [%s]", MAJOR( entry->dev), MINOR( entry->dev), entry->rwbs, entry->bytes, get_str(cmd), (unsigned long long) entry->sector, entry->nr_sector, entry->comm) kernel source may be the only docs
72 perf Block I/O Latency Heat Map Trace data may only make sense when visualized e.g., block I/O latency from perf_events trace data: SSD I/O (fast, with queueing) HDD I/O (random, modes) hop:// 01/perf- heat- maps.html
73 One-Liners: Static Tracing # Trace new processes, until Ctrl-C: perf record -e sched:sched_process_exec -a # Trace all context-switches with stack traces, for 1 second: perf record -e context-switches ag -- sleep 1 # Trace CPU migrations, for 10 seconds: perf record -e migrations -a -- sleep 10 # Trace all connect()s with stack traces (outbound connections), until Ctrl-C: perf record -e syscalls:sys_enter_connect ag # Trace all block device (disk I/O) requests with stack traces, until Ctrl-C: perf record -e block:block_rq_insert -ag # Trace all block device issues and completions (has timestamps), until Ctrl-C: perf record -e block:block_rq_issue -e block:block_rq_complete -a # Trace all block completions, of size at least 100 Kbytes, until Ctrl-C: perf record -e block:block_rq_complete --filter 'nr_sector > 200' # Trace all block completions, synchronous writes only, until Ctrl-C: perf record -e block:block_rq_complete --filter 'rwbs == "WS"' # Trace all block completions, all types of writes, until Ctrl-C: perf record -e block:block_rq_complete --filter 'rwbs ~ "*W*"' # Trace all ext4 calls, and write to a non-ext4 location, until Ctrl-C: perf record -e 'ext4:*' -o /tmp/perf.data -a
74 Tracepoint Variables Some previous one-liners used variables with --filter The ftrace interface has a way to print them: # cat /sys/kernel/debug/tracing/events/block/block_rq_insert/format name: block_rq_insert ID: 884 format: field:unsigned short common_type; offset:0;size:2; signed:0; field:unsigned char common_flags; offset:2;size:1; signed:0; field:unsigned char common_preempt_count; offset:3;size:1; signed:0; field:int common_pid; offset:4;size:4; signed:1; field:dev_t dev; offset:8;size:4; signed:0; field:sector_t sector; offset:16;size:8; signed:0; field:unsigned int nr_sector; offset:24;size:4; signed:0; field:unsigned int bytes;offset:28;size:4; signed:0; field:char rwbs[8];offset:32;size:8; signed:1; field:char comm[16];offset:40;size:16; signed:1; field: data_loc char[] cmd; offset:56;size:4; signed:1; variables (format string internals) print fmt: "%d,%d %s %u (%s) %llu + %u [%s]", ((unsigned int) ((REC->dev) >> 20)), ((unsigned int) ((REC->dev) & ((1U << 20) - 1))), REC->rwbs, REC->bytes, get_str(cmd), (unsigned long long)rec->sector, REC->nr_sector, REC->comm
75 One-Liners: Dynamic Tracing # Add a tracepoint for the kernel tcp_sendmsg() function entry (--add optional): perf probe --add tcp_sendmsg # Remove the tcp_sendmsg() tracepoint (or use --del): perf probe -d tcp_sendmsg # Add a tracepoint for the kernel tcp_sendmsg() function return: perf probe 'tcp_sendmsg%return' # Show avail vars for the tcp_sendmsg(), plus external vars (needs debuginfo): perf probe -V tcp_sendmsg --externs # Show available line probes for tcp_sendmsg() (needs debuginfo): perf probe -L tcp_sendmsg # Add a tracepoint for tcp_sendmsg() line 81 with local var seglen (debuginfo): perf probe 'tcp_sendmsg:81 seglen' # Add a tracepoint for do_sys_open() with the filename as a string (debuginfo): perf probe 'do_sys_open filename:string' # Add a tracepoint for myfunc() return, and include the retval as a string: perf probe 'myfunc%return +0($retval):string' # Add a tracepoint for the user-level malloc() function from libc: perf probe -x /lib64/libc.so.6 malloc # List currently available dynamic probes: perf probe -l
76 One-Liners: Advanced Dynamic Tracing # Add a tracepoint for tcp_sendmsg(), with three entry regs (platform specific): perf probe 'tcp_sendmsg %ax %dx %cx' # Add a tracepoint for tcp_sendmsg(), with an alias ("bytes") for %cx register: perf probe 'tcp_sendmsg bytes=%cx' # Trace previously created probe when the bytes (alias) var is greater than 100: perf record -e probe:tcp_sendmsg --filter 'bytes > 100' # Add a tracepoint for tcp_sendmsg() return, and capture the return value: perf probe 'tcp_sendmsg%return $retval' # Add a tracepoint for tcp_sendmsg(), and "size" entry argument (debuginfo): perf probe 'tcp_sendmsg size' # Add a tracepoint for tcp_sendmsg(), with size and socket state (debuginfo): perf probe 'tcp_sendmsg size sk-> sk_common.skc_state' # Trace previous probe when size > 0, and state = TCP_ESTABLISHED(1) (debuginfo): perf record -e probe:tcp_sendmsg --filter 'size > 0 && skc_state = 1' -a Kernel debuginfo is an onerous requirement for the Netflix cloud We can use registers instead (as above). But which registers?
77 The Rosetta Stone of Registers One server instance with kernel debuginfo, and -nv (dry run, verbose): # perf probe -nv 'tcp_sendmsg size sk-> sk_common.skc_state' [ ] Added new event: Writing event: p:probe/tcp_sendmsg tcp_sendmsg+0 size=%cx:u64 skc_state=+18(%si):u8 probe:tcp_sendmsg (on tcp_sendmsg with size skc_state=sk-> sk_common.skc_state) You can now use it in all perf tools, such as: perf record -e probe:tcp_sendmsg -ar sleep 1 All other instances (of the same kernel version): Copy- n- paste # perf probe 'tcp_sendmsg+0 size=%cx:u64 skc_state=+18(%si):u8' Failed to find path of kernel module. Added new event: probe:tcp_sendmsg (on tcp_sendmsg with size=%cx:u64 skc_state=+18(%si):u8) You can now use it in all perf tools, such as: perf record -e probe:tcp_sendmsg -ar sleep 1 Masami Hiramatsu was investigating a way to better automate this
78 perf_events Scripting perf also has a scripting interface (Perl or Python) These run perf and post-process trace data in user-space Data path has some optimizations Kernel buffering, and dynamic (optimal) number of buffer reads But may still not be enough for high volume events Andi Kleen has scripted perf for his PMC tools Includes toplev for applying "Top Down" methodology I've developed my own tools for perf & ftrace Each tool has a man page and examples file These are unsupported temporary hacks; their internals should be rewritten when kernel features arrive (e.g., ebpf)
79 perf-tools: bitesize Block (disk) I/O size distributions: #./bitesize Tracing block I/O size (bytes), until Ctrl-C... ^C Kbytes : I/O Distribution -> 0.9 : > 7.9 : 38 # 8.0 -> 63.9 : ###################################### > : 13 # > : 1 # [ ] This automates perf with a set of in-kernel filters and counts for each bucket, to reduce overhead Will be much easier with ebpf
80 ebpf Extended BPF: programs on tracepoints High-performance filtering: JIT In-kernel summaries: maps e.g., in-kernel latency heat map (showing bimodal): Time Low latency cache hits High latency device I/O
81 Linux Profiling Future ebpf is integrating, and provides the final missing piece of tracing infrastructure: efficient kernel programming perf_events + ebpf? ftrace + ebpf? Other tracers + ebpf? At Netflix, the future is Vector, and more self-service automation of perf_events
82 Summary & Your Action Items Short term: get full CPU profiling to work A. Automate perf CPU profiles with flame graphs. See this talk B. or use Netflix Vector when it is open sourced C. or ask performance monitoring vendors for this Most importantly, you should expect that full CPU profiles are available at your company. The ROI is worth it. Long term: PMCs & tracing Use perf_events to profile other targets: CPU cycles, file system I/O, disk I/O, memory usage, Go forth and profile The "real" checklist reminder: 1. Get "perf" to work 2. Get stack walking to work 3. Fix symbol translaoon 4. Get IPC to work 5. Test perf under load
83 Links & References perf_events Kernel source: tools/perf/documentation Mailing list perf-tools: PMU tools: perf, ftrace, and more: Java frame pointer patch Node.js: Methodology: Flame graphs: Heat maps: ebpf:
84 Thanks Questions? Performance and Reliability
Perf Tool: Performance Analysis Tool for Linux
/ Notes on Linux perf tool Intended audience: Those who would like to learn more about Linux perf performance analysis and profiling tool. Used: CPE 631 Advanced Computer Systems and Architectures CPE
<Insert Picture Here> Tracing on Linux
Tracing on Linux Elena Zannoni ([email protected]) Linux Engineering, Oracle America November 6 2012 The Tree of Tracing SystemTap LTTng perf DTrace ftrace GDB TRACE_EVENT
<Insert Picture Here> Tracing on Linux: the Old, the New, and the Ugly
Tracing on Linux: the Old, the New, and the Ugly Elena Zannoni ([email protected]) Linux Engineering, Oracle America October 27 2011 Categories of Tracing Tools Kernel Tracing
System/Networking performance analytics with perf. Hannes Frederic Sowa <[email protected]>
System/Networking performance analytics with perf Hannes Frederic Sowa Prerequisites Recent Linux Kernel CONFIG_PERF_* CONFIG_DEBUG_INFO Fedora: debuginfo-install kernel for
Performance Counters on Linux
Performance Counters on Linux The New Tools Linux Plumbers Conference, September, 2009 Arnaldo Carvalho de Melo [email protected] How did I get involved?. I am no specialist on performance counters. pahole
NetBeans Profiler is an
NetBeans Profiler Exploring the NetBeans Profiler From Installation to a Practical Profiling Example* Gregg Sporar* NetBeans Profiler is an optional feature of the NetBeans IDE. It is a powerful tool that
Performance Improvement In Java Application
Performance Improvement In Java Application Megha Fulfagar Accenture Delivery Center for Technology in India Accenture, its logo, and High Performance Delivered are trademarks of Accenture. Agenda Performance
RTOS Debugger for ecos
RTOS Debugger for ecos TRACE32 Online Help TRACE32 Directory TRACE32 Index TRACE32 Documents... RTOS Debugger... RTOS Debugger for ecos... 1 Overview... 2 Brief Overview of Documents for New Users... 3
Basics of VTune Performance Analyzer. Intel Software College. Objectives. VTune Performance Analyzer. Agenda
Objectives At the completion of this module, you will be able to: Understand the intended purpose and usage models supported by the VTune Performance Analyzer. Identify hotspots by drilling down through
CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study
CS 377: Operating Systems Lecture 25 - Linux Case Study Guest Lecturer: Tim Wood Outline Linux History Design Principles System Overview Process Scheduling Memory Management File Systems A review of what
Frysk The Systems Monitoring and Debugging Tool. Andrew Cagney
Frysk The Systems Monitoring and Debugging Tool Andrew Cagney Agenda Two Use Cases Motivation Comparison with Existing Free Technologies The Frysk Architecture and GUI Command Line Utilities Current Status
Broken Performance Tools
Nov 2015 Broken Performance Tools Brendan Gregg Senior Performance Architect, Netflix CAUTION: PERFORMANCE TOOLS Over 60 million subscribers AWS EC2 Linux cloud FreeBSD CDN Awesome place to work This Talk
Efficient and Large-Scale Infrastructure Monitoring with Tracing
CloudOpen Europe 2013 Efficient and Large-Scale Infrastructure Monitoring with Tracing [email protected] 1 Content Overview of tracing and LTTng LTTng features for Cloud Providers LTTng as a
Mark Bennett. Search and the Virtual Machine
Mark Bennett Search and the Virtual Machine Agenda Intro / Business Drivers What to do with Search + Virtual What Makes Search Fast (or Slow!) Virtual Platforms Test Results Trends / Wrap Up / Q & A Business
Performance Counter. Non-Uniform Memory Access Seminar Karsten Tausche 2014-12-10
Performance Counter Non-Uniform Memory Access Seminar Karsten Tausche 2014-12-10 Performance Counter Hardware Unit for event measurements Performance Monitoring Unit (PMU) Originally for CPU-Debugging
PERFORMANCE TOOLS DEVELOPMENTS
PERFORMANCE TOOLS DEVELOPMENTS Roberto A. Vitillo presented by Paolo Calafiura & Wim Lavrijsen Lawrence Berkeley National Laboratory Future computing in particle physics, 16 June 2011 1 LINUX PERFORMANCE
Stop the Guessing. Performance Methodologies for Production Systems. Brendan Gregg. Lead Performance Engineer, Joyent. Wednesday, June 19, 13
Stop the Guessing Performance Methodologies for Production Systems Brendan Gregg Lead Performance Engineer, Joyent Audience This is for developers, support, DBAs, sysadmins When perf isn t your day job,
Performance Tuning EC2 Instances Brendan Gregg, Performance Engineering, Netflix
PFC306 Performance Tuning EC2 Instances Brendan Gregg, Performance Engineering, Netflix November 12, 2014 Las Vegas, NV 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied,
CS161: Operating Systems
CS161: Operating Systems Matt Welsh [email protected] Lecture 2: OS Structure and System Calls February 6, 2007 1 Lecture Overview Protection Boundaries and Privilege Levels What makes the kernel different
Zing Vision. Answering your toughest production Java performance questions
Zing Vision Answering your toughest production Java performance questions Outline What is Zing Vision? Where does Zing Vision fit in your Java environment? Key features How it works Using ZVRobot Q & A
language 1 (source) compiler language 2 (target) Figure 1: Compiling a program
CS 2112 Lecture 27 Interpreters, compilers, and the Java Virtual Machine 1 May 2012 Lecturer: Andrew Myers 1 Interpreters vs. compilers There are two strategies for obtaining runnable code from a program
Perf for User Space Program Analysis
Perf for User Space Program Analysis 30 May, 2013 Tetsuo Takata, Platform Solutions Business Unit, System Platforms Sector NTT DATA CORPORATION 2 Agenda 1. Introduction 2. What is perf? 3. Usage and usecases
Realtime Linux Kernel Features
Realtime Linux Kernel Features Tim Burke, Red Hat, Director Emerging Technologies Special guest appearance, Ted Tso of IBM Realtime what does it mean to you? Agenda What? Terminology, Target capabilities
Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture
Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts
Software Tracing of Embedded Linux Systems using LTTng and Tracealyzer. Dr. Johan Kraft, Percepio AB
Software Tracing of Embedded Linux Systems using LTTng and Tracealyzer Dr. Johan Kraft, Percepio AB Debugging embedded software can be a challenging, time-consuming and unpredictable factor in development
Xen and XenServer Storage Performance
Xen and XenServer Storage Performance Low Latency Virtualisation Challenges Dr Felipe Franciosi XenServer Engineering Performance Team e-mail: [email protected] freenode: felipef #xen-api twitter:
Monitoring, Tracing, Debugging (Under Construction)
Monitoring, Tracing, Debugging (Under Construction) I was already tempted to drop this topic from my lecture on operating systems when I found Stephan Siemen's article "Top Speed" in Linux World 10/2003.
From Clouds to Roots. Brendan Gregg. Senior Performance Architect Performance Engineering Team. September, 2014. bgregg@ne5lix.
From Clouds to Roots Brendan Gregg Senior Performance Architect Performance Engineering Team [email protected], @brendangregg September, 2014 Root Cause Analysis at Ne5lix Devices Zuul Load Instances (Linux)
Visualizing gem5 via ARM DS-5 Streamline. Dam Sunwoo ([email protected]) ARM R&D December 2012
Visualizing gem5 via ARM DS-5 Streamline Dam Sunwoo ([email protected]) ARM R&D December 2012 1 The Challenge! System-level research and performance analysis becoming ever so complicated! More cores and
Linux Tools for Monitoring and Performance. Khalid Baheyeldin November 2009 KWLUG http://2bits.com
Linux Tools for Monitoring and Performance Khalid Baheyeldin November 2009 KWLUG http://2bits.com Agenda Introduction Definitions Tools, with demos Focus on command line, servers, web Exclude GUI tools
White Paper. Real-time Capabilities for Linux SGI REACT Real-Time for Linux
White Paper Real-time Capabilities for Linux SGI REACT Real-Time for Linux Abstract This white paper describes the real-time capabilities provided by SGI REACT Real-Time for Linux. software. REACT enables
Delivering Quality in Software Performance and Scalability Testing
Delivering Quality in Software Performance and Scalability Testing Abstract Khun Ban, Robert Scott, Kingsum Chow, and Huijun Yan Software and Services Group, Intel Corporation {khun.ban, robert.l.scott,
Jonathan Worthington Scarborough Linux User Group
Jonathan Worthington Scarborough Linux User Group Introduction What does a Virtual Machine do? Hides away the details of the hardware platform and operating system. Defines a common set of instructions.
ELEC 377. Operating Systems. Week 1 Class 3
Operating Systems Week 1 Class 3 Last Class! Computer System Structure, Controllers! Interrupts & Traps! I/O structure and device queues.! Storage Structure & Caching! Hardware Protection! Dual Mode Operation
Monitoring and Managing a JVM
Monitoring and Managing a JVM Erik Brakkee & Peter van den Berkmortel Overview About Axxerion Challenges and example Troubleshooting Memory management Tooling Best practices Conclusion About Axxerion Axxerion
Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters
Interpreters and virtual machines Michel Schinz 2007 03 23 Interpreters Interpreters Why interpreters? An interpreter is a program that executes another program, represented as some kind of data-structure.
Kernel Optimizations for KVM. Rik van Riel Senior Software Engineer, Red Hat June 25 2010
Kernel Optimizations for KVM Rik van Riel Senior Software Engineer, Red Hat June 25 2010 Kernel Optimizations for KVM What is virtualization performance? Benefits of developing both guest and host KVM
11.1 inspectit. 11.1. inspectit
11.1. inspectit Figure 11.1. Overview on the inspectit components [Siegl and Bouillet 2011] 11.1 inspectit The inspectit monitoring tool (website: http://www.inspectit.eu/) has been developed by NovaTec.
Keil C51 Cross Compiler
Keil C51 Cross Compiler ANSI C Compiler Generates fast compact code for the 8051 and it s derivatives Advantages of C over Assembler Do not need to know the microcontroller instruction set Register allocation
Performance Optimization Guide
Performance Optimization Guide Publication Date: July 06, 2016 Copyright Metalogix International GmbH, 2001-2016. All Rights Reserved. This software is protected by copyright law and international treaties.
Real-time KVM from the ground up
Real-time KVM from the ground up KVM Forum 2015 Rik van Riel Red Hat Real-time KVM What is real time? Hardware pitfalls Realtime preempt Linux kernel patch set KVM & qemu pitfalls KVM configuration Scheduling
Windows Server 2008 R2 Hyper V. Public FAQ
Windows Server 2008 R2 Hyper V Public FAQ Contents New Functionality in Windows Server 2008 R2 Hyper V...3 Windows Server 2008 R2 Hyper V Questions...4 Clustering and Live Migration...5 Supported Guests...6
Dynamic Behavior Analysis Using Binary Instrumentation
Dynamic Behavior Analysis Using Binary Instrumentation Jonathan Salwan [email protected] St'Hack Bordeaux France March 27 2015 Keywords: program analysis, DBI, DBA, Pin, concrete execution, symbolic
Using Power to Improve C Programming Education
Using Power to Improve C Programming Education Jonas Skeppstedt Department of Computer Science Lund University Lund, Sweden [email protected] jonasskeppstedt.net jonasskeppstedt.net [email protected]
IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version 6.3.1 Fix Pack 2.
IBM Tivoli Composite Application Manager for Microsoft Applications: Microsoft Hyper-V Server Agent Version 6.3.1 Fix Pack 2 Reference IBM Tivoli Composite Application Manager for Microsoft Applications:
Hotpatching and the Rise of Third-Party Patches
Hotpatching and the Rise of Third-Party Patches Alexander Sotirov [email protected] BlackHat USA 2006 Overview In the next one hour, we will cover: Third-party security patches _ recent developments
An Oracle White Paper September 2013. Advanced Java Diagnostics and Monitoring Without Performance Overhead
An Oracle White Paper September 2013 Advanced Java Diagnostics and Monitoring Without Performance Overhead Introduction... 1 Non-Intrusive Profiling and Diagnostics... 2 JMX Console... 2 Java Flight Recorder...
Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat
Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower The traditional approach better performance Why computers are
Red Hat Linux Internals
Red Hat Linux Internals Learn how the Linux kernel functions and start developing modules. Red Hat Linux internals teaches you all the fundamental requirements necessary to understand and start developing
General Introduction
Managed Runtime Technology: General Introduction Xiao-Feng Li ([email protected]) 2012-10-10 Agenda Virtual machines Managed runtime systems EE and MM (JIT and GC) Summary 10/10/2012 Managed Runtime
Virtualization. Clothing the Wolf in Wool. Wednesday, April 17, 13
Virtualization Clothing the Wolf in Wool Virtual Machines Began in 1960s with IBM and MIT Project MAC Also called open shop operating systems Present user with the view of a bare machine Execute most instructions
Laboratory Report. An Appendix to SELinux & grsecurity: A Side-by-Side Comparison of Mandatory Access Control & Access Control List Implementations
Laboratory Report An Appendix to SELinux & grsecurity: A Side-by-Side Comparison of Mandatory Access Control & Access Control List Implementations 1. Hardware Configuration We configured our testbed on
Tutorial: Load Testing with CLIF
Tutorial: Load Testing with CLIF Bruno Dillenseger, Orange Labs Learning the basic concepts and manipulation of the CLIF load testing platform. Focus on the Eclipse-based GUI. Menu Introduction about Load
Large-scale performance monitoring framework for cloud monitoring. Live Trace Reading and Processing
Large-scale performance monitoring framework for cloud monitoring Live Trace Reading and Processing Julien Desfossez Michel Dagenais May 2014 École Polytechnique de Montreal Live Trace Reading Read the
Gary Frost AMD Java Labs [email protected]
Analyzing Java Performance Using Hardware Performance Counters Gary Frost AMD Java Labs [email protected] 2008 by AMD; made available under the EPL v1.0 2 Agenda AMD Java Labs Hardware Performance Counters
What s Cool in the SAP JVM (CON3243)
What s Cool in the SAP JVM (CON3243) Volker Simonis, SAP SE September, 2014 Public Agenda SAP JVM Supportability SAP JVM Profiler SAP JVM Debugger 2014 SAP SE. All rights reserved. Public 2 SAP JVM SAP
Linux Virtualization. Kir Kolyshkin <[email protected]> OpenVZ project manager
Linux Virtualization Kir Kolyshkin OpenVZ project manager What is virtualization? Virtualization is a technique for deploying technologies. Virtualization creates a level of indirection
Performance rule violations usually result in increased CPU or I/O, time to fix the mistake, and ultimately, a cost to the business unit.
Is your database application experiencing poor response time, scalability problems, and too many deadlocks or poor application performance? One or a combination of zparms, database design and application
Review from last time. CS 537 Lecture 3 OS Structure. OS structure. What you should learn from this lecture
Review from last time CS 537 Lecture 3 OS Structure What HW structures are used by the OS? What is a system call? Michael Swift Remzi Arpaci-Dussea, Michael Swift 1 Remzi Arpaci-Dussea, Michael Swift 2
LinuxCon Europe 2013. Cloud Monitoring and Distribution Bug Reporting with Live Streaming and Snapshots. mathieu.desnoyers@efficios.
LinuxCon Europe 2013 Cloud Monitoring and Distribution Bug Reporting with Live Streaming and Snapshots [email protected] 1 Presenter Mathieu Desnoyers http://www.efficios.com Author/Maintainer
PHP on IBM i: What s New with Zend Server 5 for IBM i
PHP on IBM i: What s New with Zend Server 5 for IBM i Mike Pavlak Solutions Consultant [email protected] (815) 722 3454 Function Junction Audience Used PHP in Zend Core/Platform New to Zend PHP Looking to
OS Observability Tools
OS Observability Tools Classic tools and their limitations DTrace (Solaris) SystemTAP (Linux) Slide 1 Where we're going with this... Know about OS observation tools See some examples how to use existing
ò Paper reading assigned for next Thursday ò Lab 2 due next Friday ò What is cooperative multitasking? ò What is preemptive multitasking?
Housekeeping Paper reading assigned for next Thursday Scheduling Lab 2 due next Friday Don Porter CSE 506 Lecture goals Undergrad review Understand low-level building blocks of a scheduler Understand competing
Red Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment
Red Hat Network Satellite Management and automation of your Red Hat Enterprise Linux environment WHAT IS IT? Red Hat Network (RHN) Satellite server is an easy-to-use, advanced systems management platform
Pete s All Things Sun: Comparing Solaris to RedHat Enterprise and AIX Virtualization Features
Pete s All Things Sun: Comparing Solaris to RedHat Enterprise and AIX Virtualization Features PETER BAER GALVIN Peter Baer Galvin is the chief technologist for Corporate Technologies, a premier systems
Virtual Machine Monitors. Dr. Marc E. Fiuczynski Research Scholar Princeton University
Virtual Machine Monitors Dr. Marc E. Fiuczynski Research Scholar Princeton University Introduction Have been around since 1960 s on mainframes used for multitasking Good example VM/370 Have resurfaced
Red Hat Satellite Management and automation of your Red Hat Enterprise Linux environment
Red Hat Satellite Management and automation of your Red Hat Enterprise Linux environment WHAT IS IT? Red Hat Satellite server is an easy-to-use, advanced systems management platform for your Linux infrastructure.
Introduction to Git. Markus Kötter [email protected]. Notes. Leinelab Workshop July 28, 2015
Introduction to Git Markus Kötter [email protected] Leinelab Workshop July 28, 2015 Motivation - Why use version control? Versions in file names: does this look familiar? $ ls file file.2 file.
Real-time Debugging using GDB Tracepoints and other Eclipse features
Real-time Debugging using GDB Tracepoints and other Eclipse features GCC Summit 2010 2010-010-26 [email protected] Summary Introduction Advanced debugging features Non-stop multi-threaded debugging
Trace-Based and Sample-Based Profiling in Rational Application Developer
Trace-Based and Sample-Based Profiling in Rational Application Developer This document is aimed at highlighting the importance of profiling in software development and talks about the profiling tools offered
VEEAM ONE 8 RELEASE NOTES
VEEAM ONE 8 RELEASE NOTES This Release Notes document provides last-minute information about Veeam ONE 8 Update 2, including system requirements, installation instructions as well as relevant information
More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction
KVM & Memory Management Updates
KVM & Memory Management Updates KVM Forum 2012 Rik van Riel Red Hat, Inc. KVM & Memory Management Updates EPT Accessed & Dirty Bits 1GB hugepages Balloon vs. Transparent Huge Pages Automatic NUMA Placement
This presentation explains how to monitor memory consumption of DataStage processes during run time.
This presentation explains how to monitor memory consumption of DataStage processes during run time. Page 1 of 9 The objectives of this presentation are to explain why and when it is useful to monitor
Oak Ridge National Laboratory Computing and Computational Sciences Directorate. Lustre Crash Dumps And Log Files
Oak Ridge National Laboratory Computing and Computational Sciences Directorate Lustre Crash Dumps And Log Files Jesse Hanley Rick Mohr Sarp Oral Michael Brim Nathan Grodowitz Gregory Koenig Jason Hill
Pushing the Limits of Windows: Physical Memory Mark Russinovich (From Mark Russinovich Blog)
This is the first blog post in a series I'll write over the coming months called Pushing the Limits of Windows that describes how Windows and applications use a particular resource, the licensing and implementation-derived
Java Troubleshooting and Performance
Java Troubleshooting and Performance Margus Pala Java Fundamentals 08.12.2014 Agenda Debugger Thread dumps Memory dumps Crash dumps Tools/profilers Rules of (performance) optimization 1. Don't optimize
A Study of Performance Monitoring Unit, perf and perf_events subsystem
A Study of Performance Monitoring Unit, perf and perf_events subsystem Team Aman Singh Anup Buchke Mentor Dr. Yann-Hang Lee Summary Performance Monitoring Unit, or the PMU, is found in all high end processors
IBM Tivoli Monitoring Version 6.3 Fix Pack 2. Infrastructure Management Dashboards for Servers Reference
IBM Tivoli Monitoring Version 6.3 Fix Pack 2 Infrastructure Management Dashboards for Servers Reference IBM Tivoli Monitoring Version 6.3 Fix Pack 2 Infrastructure Management Dashboards for Servers Reference
Virtualization in Linux
Virtualization in Linux Kirill Kolyshkin September 1, 2006 Abstract Three main virtualization approaches emulation, paravirtualization, and operating system-level virtualization are covered,
CSC 2405: Computer Systems II
CSC 2405: Computer Systems II Spring 2013 (TR 8:30-9:45 in G86) Mirela Damian http://www.csc.villanova.edu/~mdamian/csc2405/ Introductions Mirela Damian Room 167A in the Mendel Science Building [email protected]
Virtual Machines. COMP 3361: Operating Systems I Winter 2015 http://www.cs.du.edu/3361
s COMP 3361: Operating Systems I Winter 2015 http://www.cs.du.edu/3361 1 Virtualization! Create illusion of multiple machines on the same physical hardware! Single computer hosts multiple virtual machines
Virtualization in Linux KVM + QEMU
CS695 Topics in Virtualization and Cloud Computing KVM + QEMU Senthil, Puru, Prateek and Shashank 1 Topics covered KVM and QEMU Architecture VTx support CPU virtualization in KMV Memory virtualization
Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7
Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,
PARALLELS CLOUD SERVER
PARALLELS CLOUD SERVER Performance and Scalability 1 Table of Contents Executive Summary... Error! Bookmark not defined. LAMP Stack Performance Evaluation... Error! Bookmark not defined. Background...
TPC-W * : Benchmarking An Ecommerce Solution By Wayne D. Smith, Intel Corporation Revision 1.2
TPC-W * : Benchmarking An Ecommerce Solution By Wayne D. Smith, Intel Corporation Revision 1.2 1 INTRODUCTION How does one determine server performance and price/performance for an Internet commerce, Ecommerce,
PARALLELS SERVER BARE METAL 5.0 README
PARALLELS SERVER BARE METAL 5.0 README 1999-2011 Parallels Holdings, Ltd. and its affiliates. All rights reserved. This document provides the first-priority information on the Parallels Server Bare Metal
VMware Server 2.0 Essentials. Virtualization Deployment and Management
VMware Server 2.0 Essentials Virtualization Deployment and Management . This PDF is provided for personal use only. Unauthorized use, reproduction and/or distribution strictly prohibited. All rights reserved.
Debugging Java performance problems. Ryan Matteson [email protected] http://prefetch.net
Debugging Java performance problems Ryan Matteson [email protected] http://prefetch.net Overview Tonight I am going to discuss Java performance, and how opensource tools can be used to debug performance
Cloud Operating Systems for Servers
Cloud Operating Systems for Servers Mike Day Distinguished Engineer, Virtualization and Linux August 20, 2014 [email protected] 1 What Makes a Good Cloud Operating System?! Consumes Few Resources! Fast
Enterprise Manager Performance Tips
Enterprise Manager Performance Tips + The tips below are related to common situations customers experience when their Enterprise Manager(s) are not performing consistent with performance goals. If you
Using Process Monitor
Using Process Monitor Process Monitor Tutorial This information was adapted from the help file for the program. Process Monitor is an advanced monitoring tool for Windows that shows real time file system,
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...
Network Infrastructure Services CS848 Project
Quality of Service Guarantees for Cloud Services CS848 Project presentation by Alexey Karyakin David R. Cheriton School of Computer Science University of Waterloo March 2010 Outline 1. Performance of cloud
Virtualization Technologies
12 January 2010 Virtualization Technologies Alex Landau ([email protected]) IBM Haifa Research Lab What is virtualization? Virtualization is way to run multiple operating systems and user applications on
