System/Networking performance analytics with perf. Hannes Frederic Sowa
|
|
|
- Susanna Walker
- 9 years ago
- Views:
Transcription
1 System/Networking performance analytics with perf Hannes Frederic Sowa
2 Prerequisites Recent Linux Kernel CONFIG_PERF_* CONFIG_DEBUG_INFO Fedora: debuginfo-install kernel for vmlinux Dwarves package for pahole
3 Debuginfo BuildID is mostly a SHA-1 checksum which gets placed into its own ELF section Mapping from buildid to binary via /usr/lib/debug/.build-id/xx/xxxxxxxxxxx or ~/.debug/.build-id/xx/xxxxxxxxxx
4 Basic usage of perf perf top perf stat count events happening during specific workload perf record generate perf.data file in current directory with samples of the measurements Event descriptor use 'perf list' to view possible events perf evlist extracts perf events from perf.data
5 perf trace Live tracing Replacement of strace Much higher performance, as we don't need to do multiple kernel user space transitions anymore Like strace: perf trace -e read,write <program>
6 Analyzing data perf report GUI to browse, inspect and annotate samples Can inspect call graphs perf script Without arguments presents easy to grep data Can later on be used to process perf data with perl or python perf annotate Source listing with annotated performance profiles
7 Transferring perf data perf archive perf.data Now please run: $ tar xvf perf.data.tar.bz2 -C ~/.debug wherever you need to run 'perf report' on.
8 !root access to perf /proc/sys/kernel/perf_event_paranoid: The perf_event_paranoid file can be set to restrict access to the performance counters. 2 only allow user-space measurements. 1 allow both kernel and user measurements (default). 0 allow access to CPU-specific data but not raw tracepoint samples. -1 no restrictions. echo -1 > /proc/sys/kernel/perf_event_paranoid
9 Reasoning about performance Algorithmic complexity Memory access behaviour Parallelism Cross cpu Memory access Assumptions on networking traffic Raw instruction throughput Caching Fast paths e.g. CPI (cycles per instructoin)
10 perf_event_open open fd to measure one particular event Sampling Data mostly gathered via mmap Counting Data mostly gathered via read Event types are either Hardware (cycles, instructions,...) Software (cpu clock, context switches) Integration into tracing Caching events Raw Breakpoint events
11 Hardware events perf list hw Sampling based on counters or frequency -c specifies period to sample -F specifies the frequency perf evlist -v shows details about the perf_event_attr, like sample_freq Event / Interrupt is triggered and a sample is captured
12 Reasoning about performance Algorithmic complexity Memory access behaviour Parallelism Cross cpu Memory access Assumptions on networking traffic Raw instruction throughput Caching Fast paths e.g. CPI (cycles per instructoin)
13 Algorithmic complexity How can perf help? Find given workload or benchmark Can easily pinpointed by perf top or simple cycle counting in the kernel: cycles:kpp k enables kernel only counting (u for user space) Additional p modifier change precise level Intel: PEBS Precise Event Based Sampling AMD: IBS Instruction Based Sampling perf top -e cycles:kpp If the region of code is identified proceed with analyzing the source Pinpointing additional recurrences with perf can help, too (see Assumptions on networking traffic ) Mostly needs to be solved by enhancing the algorithms / code e.g. fib trie neatening Find best and worse case and optimize accordingly Sometimes can be worked around by caching See section on Assumptions on networking traffic / patterns
14 Reasoning about performance Algorithmic complexity Memory access behavior Parallelism Cross cpu Memory access Assumptions on networking traffic Raw instruction throughput Caching Fast paths e.g. CPI (cycles per instructoin)
15 Memory access behavior Eric Dumazet's mlx4 memory access optimizations: + /* fetch ring->cons far ahead before needing it to avoid stall */ + ring_cons = ACCESS_ONCE(ring->cons); + /* we want to dirty this cache line once */ + ACCESS_ONCE(ring->last_nr_txbb) = last_nr_txbb; + ACCESS_ONCE(ring->cons) = ring_cons + txbbs_skipped;
16 Memory access behavior II Compilers do not tend to optimize memory access in e.g. large functions and optimize for CPU cache behavior Manual guidance is often needed Pacing memory access across functions to allow CPU to access memory more parallel Avoid RMW instructions
17 Memory access behavior III - struct ordering Code which access large structures tend to write to multiple cache lines Group members of structs so that specific code has to touch the least amount of cache lines Critical Word First / Early Restart CPUs tend to read complete cache lines Early Restart signals the CPU that data is available before complete cache line is read Critical Word First allows the CPU to fetch the wanted data, even it is in the end of a cache line
18 Memory access behavior IV How can perf help? # perf list cache L1-dcache-loads [Hardware cache event] L1-dcache-load-misses [Hardware cache event] L1-dcache-stores [Hardware cache event] L1-dcache-store-misses [Hardware cache event] L1-dcache-prefetch-misses [Hardware cache event] L1-icache-load-misses [Hardware cache event] LLC-loads [Hardware cache event] LLC-stores [Hardware cache event] LLC-prefetches [Hardware cache event] dtlb-loads [Hardware cache event] dtlb-load-misses [Hardware cache event] dtlb-stores [Hardware cache event] dtlb-store-misses [Hardware cache event] itlb-loads [Hardware cache event] itlb-load-misses [Hardware cache event] branch-loads [Hardware cache event] branch-load-misses [Hardware cache event]
19 Memory access behavior V perf mem record <workload> perf mem report uses 'cpu/mem-loads/pp' event by default perf mem -t store record switches to 'cpu/mem-stores/pp' (those kinds of events aren't documented properly: basically you can find them in /sys/devices/cpu/events)
20 Analyzing lock behaviour Needs kernel compiled with lockdep perf lock record <workload> Perf lock report
21 Raw counters Andi Kleen's pmu-utils test suite Offcore events On AMD these are available via perf record -e amd_nb/
22 Reasoning about performance Algorithmic complexity Memory access behaviour Parallelism Cross cpu Memory access Assumptions on networking traffic Raw instruction throughput Caching Fast paths e.g. CPI (cycles per instructoin)
23 Assumptions on networking traffic Examples: xmit_more finally allows the kernel to achieve line rate speeds but the feature must be triggered Receive offloading needs to see packet trains to aggregate sk_buffs Routing caches should not be evicted on every processed data frame Flow-sensitive packet steering should not cause increased cross-cpu memory traffic
24 Kprobes arguments GRP : Group name. If omitted, use "kprobes" for it. EVENT : Event name. If omitted, the event name is generated based on SYM+offs or MEMADDR. MOD : Module name which has given SYM. SYM[+offs] : Symbol+offset where the probe is inserted. MEMADDR : Address where the probe is inserted. FETCHARGS : Arguments. Each probe can have up to 128 args. %REG : Fetch register : Fetch memory at ADDR (ADDR should be in -offs] : Fetch memory at SYM + - offs (SYM should be a data symbol) $stackn : Fetch Nth entry of stack (N >= 0) $stack : Fetch stack address. $retval : Fetch return value.(*) + -offs(fetcharg) : Fetch memory at FETCHARG + - offs address.(**) NAME=FETCHARG : Set NAME as the argument name of FETCHARG. FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types (u8/u16/u32/u64/s8/s16/s32/s64), "string" and bitfield are supported.
25 Examples to pinpoint xmit_more Find an applicable function candidate, guess: perf probe -F filter dev*xmit* We need to get hold on to the xmit_more flag: perf probe -L dev_hard_start_xmit Which variables are available at that location? perf probe -V dev_hard_start_xmit:17 Finally adding the probe point: perf probe -a 'dhsx=dev_hard_start_xmit:17 ifname=dev->name:string xmit_more=next' Record test and view results: perf record -e probe:* -arg <workload> perf script -G
26 Examples to pinpoint xmit_more II Example for use with modules: perf probe -v -m \ /usr/lib/debug/lib/modules/ fc21.x86_64/kernel/net/mac80211/mac80211.ko.debug \ -a 'ieee80211_xmit ifname=skb->dev->name:string xmit_more=skb->xmit_more' Return value probing perf probe -a 'dev_hard_start_xmit%return retval=$retval' User space probing works, too: debuginfo-install <binary> perf probe -x <binary> should give same results
27 Gotchas Lot's of inlining: noinline define in kernel or attribute ((noinline)) Sometimes needs a bit more code rearrangement move code out of header noninline per-object file wrapper EXPORT_SYMBOL Add volatile variables to ease access to certain data Can also be achieved via registers perf probe -a 'dev_hard_start_xmit+332 ifname=dev->name:string more txq' Failed to find the location of more at this address. Perhaps, it has been optimized out. Error: Failed to add events. Narf! So... perf probe -a 'napi_gro_complete ifindex=skb->dev->name:string gro_count=+0x3c(%di):u16' results in: # perf script irq/30-iwlwifi 498 [000] : probe:napi_gro_complete: (ffffffff81649b30) ifindex="wlp3s0" gro_count=0x2 irq/30-iwlwifi 498 [000] : probe:napi_gro_complete: (ffffffff81649b30) ifindex="wlp3s0" gro_count=0x2
28 Raw CPU counter Tables in CPU manuals e.g. cpu/event=xx,umask=xx/flags /sys/bus/event_source/devices/<...> Raw events: r<event><umask> Look up AMD manual to trace amd northbridge events with -e 'amd_nb/event=0xxx,umask=0xxxx/flags'
29 Routing cache expunge print &init_net.ipv6.fib6_sernum 0xffffffff81f1a5ec perf record -e \mem:0xffffffff81f1a5ec:rw ip -6 route add 2002::/64 dev lo
30 perf script Sample scripts available perf script -l Generate script based on perf.data file perf script -g perl/python Run script on perf.data file perf script -s
31 Reasoning about performance Algorithmic complexity Memory access behaviour Parallelism Cross cpu Memory access Assumptions on networking traffic Raw instruction throughput Caching Fast paths e.g. CPI (cycles per instructoin)
32 Raw instruction throughput Mostly compiler and architecture dependent Often marginal effects in performance improvement Still, hot code can benefit a lot, e.g. crypto, hashing CPI cycles per instruction should be increased Mostly dependent on memory accesses Open up missed-optimization in gcc bugzilla? Architecture specific inline assembly?
33 Optimizing instruction throughput Balance code size (L1i cache pressure) vs. improvement Performance decrease in other code possible Even not related at all to the optimized code Mostly case by case optimizations, but some help of tools is possible If needed, provide assembly implementation but add builtin_constant_p wrappers so gcc can still do constant folding if possible
34 Backup: Intel Architecture Code Analyzer
35 Intel Architecture Code Analyzer Static analyzer on assembly code Allows to analyze code for smaller functions in regard to CPU port exhaustion, stalls and latency Reordering instructions Picking different ones / missing optimization in compiler? Does not model memory access Some instructions cannot being modeled correctly, e.g. div Markers mark beginning and end of section to be analyzed:.byte 0x0F, 0x0B movl $222, %ebx movl $111, %ebx.byte 0x64, 0x67, 0x90.byte 0x64, 0x67, 0x90.byte 0x0F, 0x0B
36 Intel Architecture Code Analyzer II Throughput mode (-analysis THROUGHPUT): Throughput Analysis Report Block Throughput: Cycles Throughput Bottleneck: Port1 Port Binding In Cycles Per Iteration: Port 0 - DV D 3 - D Cycles
37 Intel Architecture Code Analyzer III Num Of Ports pressure in cycles Uops 0 - DV D 3 - D * mov rax, rdx cmp rsi, 0x40 0F jb 0x4b sub rsi, 0x sub rsi, 0x40 2^ CP crc32 rax, qword ptr [rdi] 2^ CP crc32 rax, qword ptr [rdi+0x8] 2^ CP crc32 rax, qword ptr [rdi+0x10]
38 Intel Architecture Code Analyzer III Latency Mode (-analysis LATENCY): Latency Analysis Report Latency: 60 Cycles [...] 3 sub rsi, 0x40 4 sub rsi, 0x CP crc32 rax, qword ptr [rdi] 6 1 CP crc32 rax, qword ptr [rdi+0x8] CP crc32 rax, qword ptr [rdi+0x10] CP crc32 rax, qword ptr [rdi+0x18] CP crc32 rax, qword ptr [rdi+0x20] Reports resource conflicts on critical paths (CP) and list of delays
39 Intel Architecture Code Analyzer III 0. mov rax, rdx 44. lea rdi, ptr [rdi+0x2] 42. sub rsi, 0x2 5. crc32 rax, qword ptr [rdi] 1. cmp rsi, 0x40 3. sub rsi, 0x cmp rsi, 0x1 6. crc32 rax, qword ptr [rdi+0x8] 2. jb 0x4b 4. sub rsi, 0x jb 0x7 7. crc32 rax, qword ptr [rdi+0x10] 15. add rsi, 0x jnb 0xffffffffffffffc1 8. crc32 rax, qword ptr [rdi+0x18] 18. sub rsi, 0x cmp rsi, 0x20 9. crc32 rax, qword ptr [rdi+0x20] 26. sub rsi, 0x cmp rsi, 0x jb 0x crc32 rax, qword ptr [rdi+0x28] 32. sub rsi, 0x8 30. cmp rsi, 0x8 25. jb 0x crc32 rax, qword ptr [rdi+0x30] 37. sub rsi, 0x4 35. cmp rsi, 0x4 31. jb 0x lea rdi, ptr [rdi+0x40] 12. crc32 rax, qword ptr [rdi+0x38] 40. cmp rsi, 0x2 36. jb 0xf 19. crc32 rax, qword ptr [rdi] 41. jb 0x crc32 rax, qword ptr [rdi+0x8] 21. crc32 rax, qword ptr [rdi+0x10] 23. lea rdi, ptr [rdi+0x20] 22. crc32 rax, qword ptr [rdi+0x18] 27. crc32 rax, qword ptr [rdi] 28. crc32 rax, qword ptr [rdi+0x8] 29. lea rdi, ptr [rdi+0x10] 33. crc32 rax, qword ptr [rdi] 34. lea rdi, ptr [rdi+0x8] 38. crc32 eax, dword ptr [rdi] 39. lea rdi, ptr [rdi+0x4] 43. crc32 eax, word ptr [rdi] 47. crc32 eax, byte ptr [rdi]
40 Thanks! Questions?
Perf Tool: Performance Analysis Tool for Linux
/ Notes on Linux perf tool Intended audience: Those who would like to learn more about Linux perf performance analysis and profiling tool. Used: CPE 631 Advanced Computer Systems and Architectures CPE
<Insert Picture Here> Tracing on Linux: the Old, the New, and the Ugly
Tracing on Linux: the Old, the New, and the Ugly Elena Zannoni ([email protected]) Linux Engineering, Oracle America October 27 2011 Categories of Tracing Tools Kernel Tracing
<Insert Picture Here> Tracing on Linux
Tracing on Linux Elena Zannoni ([email protected]) Linux Engineering, Oracle America November 6 2012 The Tree of Tracing SystemTap LTTng perf DTrace ftrace GDB TRACE_EVENT
64-Bit NASM Notes. Invoking 64-Bit NASM
64-Bit NASM Notes The transition from 32- to 64-bit architectures is no joke, as anyone who has wrestled with 32/64 bit incompatibilities will attest We note here some key differences between 32- and 64-bit
Real-time Debugging using GDB Tracepoints and other Eclipse features
Real-time Debugging using GDB Tracepoints and other Eclipse features GCC Summit 2010 2010-010-26 [email protected] Summary Introduction Advanced debugging features Non-stop multi-threaded debugging
Removing The Linux Routing Cache
Removing The Red Hat Inc. Columbia University, New York, 2012 Removing The 1 Linux Maintainership 2 3 4 5 Removing The My Background Started working on the kernel 18+ years ago. First project: helping
Dynamic Behavior Analysis Using Binary Instrumentation
Dynamic Behavior Analysis Using Binary Instrumentation Jonathan Salwan [email protected] St'Hack Bordeaux France March 27 2015 Keywords: program analysis, DBI, DBA, Pin, concrete execution, symbolic
A Study of Performance Monitoring Unit, perf and perf_events subsystem
A Study of Performance Monitoring Unit, perf and perf_events subsystem Team Aman Singh Anup Buchke Mentor Dr. Yann-Hang Lee Summary Performance Monitoring Unit, or the PMU, is found in all high end processors
Multi-Threading Performance on Commodity Multi-Core Processors
Multi-Threading Performance on Commodity Multi-Core Processors Jie Chen and William Watson III Scientific Computing Group Jefferson Lab 12000 Jefferson Ave. Newport News, VA 23606 Organization Introduction
Red Hat Linux Internals
Red Hat Linux Internals Learn how the Linux kernel functions and start developing modules. Red Hat Linux internals teaches you all the fundamental requirements necessary to understand and start developing
Automating Linux Malware Analysis Using Limon Sandbox Monnappa K A [email protected]
Automating Linux Malware Analysis Using Limon Sandbox Monnappa K A [email protected] A number of devices are running Linux due to its flexibility and open source nature. This has made Linux platform
GPU Profiling with AMD CodeXL
GPU Profiling with AMD CodeXL Software Profiling Course Hannes Würfel OUTLINE 1. Motivation 2. GPU Recap 3. OpenCL 4. CodeXL Overview 5. CodeXL Internals 6. CodeXL Profiling 7. CodeXL Debugging 8. Sources
AXI Performance Monitor v5.0
AXI Performance Monitor v5.0 LogiCORE IP Product Guide Vivado Design Suite Table of Contents IP Facts Chapter 1: Overview Advanced Mode...................................................................
Performance Monitoring of the Software Frameworks for LHC Experiments
Proceedings of the First EELA-2 Conference R. mayo et al. (Eds.) CIEMAT 2009 2009 The authors. All rights reserved Performance Monitoring of the Software Frameworks for LHC Experiments William A. Romero
PERFORMANCE TOOLS DEVELOPMENTS
PERFORMANCE TOOLS DEVELOPMENTS Roberto A. Vitillo presented by Paolo Calafiura & Wim Lavrijsen Lawrence Berkeley National Laboratory Future computing in particle physics, 16 June 2011 1 LINUX PERFORMANCE
Performance Counters on Linux
Performance Counters on Linux The New Tools Linux Plumbers Conference, September, 2009 Arnaldo Carvalho de Melo [email protected] How did I get involved?. I am no specialist on performance counters. pahole
Hacking Techniques & Intrusion Detection. Ali Al-Shemery arabnix [at] gmail
Hacking Techniques & Intrusion Detection Ali Al-Shemery arabnix [at] gmail All materials is licensed under a Creative Commons Share Alike license http://creativecommonsorg/licenses/by-sa/30/ # whoami Ali
RTOS Debugger for ecos
RTOS Debugger for ecos TRACE32 Online Help TRACE32 Directory TRACE32 Index TRACE32 Documents... RTOS Debugger... RTOS Debugger for ecos... 1 Overview... 2 Brief Overview of Documents for New Users... 3
Accelerating High-Speed Networking with Intel I/O Acceleration Technology
White Paper Intel I/O Acceleration Technology Accelerating High-Speed Networking with Intel I/O Acceleration Technology The emergence of multi-gigabit Ethernet allows data centers to adapt to the increasing
Router Architectures
Router Architectures An overview of router architectures. Introduction What is a Packet Switch? Basic Architectural Components Some Example Packet Switches The Evolution of IP Routers 2 1 Router Components
Getting Started with CodeXL
AMD Developer Tools Team Advanced Micro Devices, Inc. Table of Contents Introduction... 2 Install CodeXL... 2 Validate CodeXL installation... 3 CodeXL help... 5 Run the Teapot Sample project... 5 Basic
A way towards Lower Latency and Jitter
A way towards Lower Latency and Jitter Jesse Brandeburg [email protected] Intel Ethernet BIO Jesse Brandeburg A senior Linux developer in the Intel LAN Access Division,
More on Pipelining and Pipelines in Real Machines CS 333 Fall 2006 Main Ideas Data Hazards RAW WAR WAW More pipeline stall reduction techniques Branch prediction» static» dynamic bimodal branch prediction
Debugging & Profiling with Open Source SW Tools
Debugging & Profiling with Open Source SW Tools Ivan Giro*o igiro*[email protected] Informa(on & Communica(on Technology Sec(on (ICTS) Interna(onal Centre for Theore(cal Physics (ICTP) What is Debugging?! Iden(fying
Network Administration and Monitoring
Network Administration and Monitoring Alessandro Barenghi Dipartimento di Elettronica, Informazione e Bioingengeria Politecnico di Milano barenghi - at - elet.polimi.it April 17, 2013 Recap What did we
Keil C51 Cross Compiler
Keil C51 Cross Compiler ANSI C Compiler Generates fast compact code for the 8051 and it s derivatives Advantages of C over Assembler Do not need to know the microcontroller instruction set Register allocation
Hardware performance monitoring. Zoltán Majó
Hardware performance monitoring Zoltán Majó 1 Question Did you take any of these lectures: Computer Architecture and System Programming How to Write Fast Numerical Code Design of Parallel and High Performance
Operating Systems Design 16. Networking: Sockets
Operating Systems Design 16. Networking: Sockets Paul Krzyzanowski [email protected] 1 Sockets IP lets us send data between machines TCP & UDP are transport layer protocols Contain port number to identify
Welcome to the Dawn of Open-Source Networking. Linux IP Routers Bob Gilligan [email protected]
Welcome to the Dawn of Open-Source Networking. Linux IP Routers Bob Gilligan [email protected] Outline About Vyatta: Open source project, and software product Areas we re working on or interested in
Gary Frost AMD Java Labs [email protected]
Analyzing Java Performance Using Hardware Performance Counters Gary Frost AMD Java Labs [email protected] 2008 by AMD; made available under the EPL v1.0 2 Agenda AMD Java Labs Hardware Performance Counters
Introduction. Figure 1 Schema of DarunGrim2
Reversing Microsoft patches to reveal vulnerable code Harsimran Walia Computer Security Enthusiast 2011 Abstract The paper would try to reveal the vulnerable code for a particular disclosed vulnerability,
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers
A Comparative Study on Vega-HTTP & Popular Open-source Web-servers Happiest People. Happiest Customers Contents Abstract... 3 Introduction... 3 Performance Comparison... 4 Architecture... 5 Diagram...
Performance of Software Switching
Performance of Software Switching Based on papers in IEEE HPSR 2011 and IFIP/ACM Performance 2011 Nuutti Varis, Jukka Manner Department of Communications and Networking (COMNET) Agenda Motivation Performance
Intel DPDK Boosts Server Appliance Performance White Paper
Intel DPDK Boosts Server Appliance Performance Intel DPDK Boosts Server Appliance Performance Introduction As network speeds increase to 40G and above, both in the enterprise and data center, the bottlenecks
National CR16C Family On-Chip Emulation. Contents. Technical Notes V9.11.75
_ V9.11.75 Technical Notes National CR16C Family On-Chip Emulation Contents Contents... 1 1 Introduction... 2 2 Emulation options... 3 2.1 Hardware Options... 3 2.2 Initialization Sequence... 4 2.3 JTAG
OS Observability Tools
OS Observability Tools Classic tools and their limitations DTrace (Solaris) SystemTAP (Linux) Slide 1 Where we're going with this... Know about OS observation tools See some examples how to use existing
Locating Cache Performance Bottlenecks Using Data Profiling
Locating Cache Performance Bottlenecks Using Data Profiling Aleksey Pesterev Nickolai Zeldovich Robert T. Morris Massachusetts Institute of Technology Computer Science and Artificial Intelligence Lab {alekseyp,
Overview. CISC Developments. RISC Designs. CISC Designs. VAX: Addressing Modes. Digital VAX
Overview CISC Developments Over Twenty Years Classic CISC design: Digital VAX VAXÕs RISC successor: PRISM/Alpha IntelÕs ubiquitous 80x86 architecture Ð 8086 through the Pentium Pro (P6) RJS 2/3/97 Philosophy
Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality
Memory Access Control in Multiprocessor for Real-time Systems with Mixed Criticality Heechul Yun +, Gang Yao +, Rodolfo Pellizzoni *, Marco Caccamo +, Lui Sha + University of Illinois at Urbana and Champaign
Introduction. Part I: Finding Bottlenecks when Something s Wrong. Chapter 1: Performance Tuning 3
Wort ftoc.tex V3-12/17/2007 2:00pm Page ix Introduction xix Part I: Finding Bottlenecks when Something s Wrong Chapter 1: Performance Tuning 3 Art or Science? 3 The Science of Performance Tuning 4 The
Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking
Assessing the Performance of Virtualization Technologies for NFV: a Preliminary Benchmarking Roberto Bonafiglia, Ivano Cerrato, Francesco Ciaccia, Mario Nemirovsky, Fulvio Risso Politecnico di Torino,
Andreas Herrmann. AMD Operating System Research Center
Myth and facts about 64-bit Linux Andreas Herrmann André Przywara AMD Operating System Research Center March 2nd, 2008 Myths... You don't need 64-bit software with less than 3 GB RAM. There are less drivers
DSS. Diskpool and cloud storage benchmarks used in IT-DSS. Data & Storage Services. Geoffray ADDE
DSS Data & Diskpool and cloud storage benchmarks used in IT-DSS CERN IT Department CH-1211 Geneva 23 Switzerland www.cern.ch/it Geoffray ADDE DSS Outline I- A rational approach to storage systems evaluation
C-GEP 100 Monitoring application user manual
C-GEP 100 Monitoring application user manual 1 Introduction: C-GEP is a very versatile platform for network monitoring applications. The ever growing need for network bandwith like HD video streaming and
Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat
Why Computers Are Getting Slower (and what we can do about it) Rik van Riel Sr. Software Engineer, Red Hat Why Computers Are Getting Slower The traditional approach better performance Why computers are
ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy
ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy OVERVIEW The global communication and the continuous growth of services provided through the Internet or local infrastructure require to
ZEN LOAD BALANCER EE v3.02 DATASHEET The Load Balancing made easy
ZEN LOAD BALANCER EE v3.02 DATASHEET The Load Balancing made easy OVERVIEW The global communication and the continuous growth of services provided through the Internet or local infrastructure require to
Tue Apr 19 11:03:19 PDT 2005 by Andrew Gristina thanks to Luca Deri and the ntop team
Tue Apr 19 11:03:19 PDT 2005 by Andrew Gristina thanks to Luca Deri and the ntop team This document specifically addresses a subset of interesting netflow export situations to an ntop netflow collector
LogiCORE IP AXI Performance Monitor v2.00.a
LogiCORE IP AXI Performance Monitor v2.00.a Product Guide Table of Contents IP Facts Chapter 1: Overview Target Technology................................................................. 9 Applications......................................................................
OS Virtualization Frank Hofmann
OS Virtualization Frank Hofmann OP/N1 Released Products Engineering Sun Microsystems UK Overview Different approaches to virtualization > Compartmentalization > System Personalities > Virtual Machines
Chapter 11 I/O Management and Disk Scheduling
Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 11 I/O Management and Disk Scheduling Dave Bremer Otago Polytechnic, NZ 2008, Prentice Hall I/O Devices Roadmap Organization
Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V. Technical Brief v1.
Intel Ethernet and Configuring Single Root I/O Virtualization (SR-IOV) on Microsoft* Windows* Server 2012 Hyper-V Technical Brief v1.0 September 2012 2 Intel Ethernet and Configuring SR-IOV on Windows*
Visualizing gem5 via ARM DS-5 Streamline. Dam Sunwoo ([email protected]) ARM R&D December 2012
Visualizing gem5 via ARM DS-5 Streamline Dam Sunwoo ([email protected]) ARM R&D December 2012 1 The Challenge! System-level research and performance analysis becoming ever so complicated! More cores and
find model parameters, to validate models, and to develop inputs for models. c 1994 Raj Jain 7.1
Monitors Monitor: A tool used to observe the activities on a system. Usage: A system programmer may use a monitor to improve software performance. Find frequently used segments of the software. A systems
Optimizing TCP Forwarding
Optimizing TCP Forwarding Vsevolod V. Panteleenko and Vincent W. Freeh TR-2-3 Department of Computer Science and Engineering University of Notre Dame Notre Dame, IN 46556 {vvp, vin}@cse.nd.edu Abstract
Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com
CSCI-UA.0201-003 Computer Systems Organization Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com Some slides adapted (and slightly modified)
Virtualization: TCP/IP Performance Management in a Virtualized Environment Orlando Share Session 9308
Virtualization: TCP/IP Performance Management in a Virtualized Environment Orlando Share Session 9308 Laura Knapp WW Business Consultant [email protected] Applied Expert Systems, Inc. 2011 1 Background
Operating Systems. Virtual Memory
Operating Systems Virtual Memory Virtual Memory Topics. Memory Hierarchy. Why Virtual Memory. Virtual Memory Issues. Virtual Memory Solutions. Locality of Reference. Virtual Memory with Segmentation. Page
Off-by-One exploitation tutorial
Off-by-One exploitation tutorial By Saif El-Sherei www.elsherei.com Introduction: I decided to get a bit more into Linux exploitation, so I thought it would be nice if I document this as a good friend
Java Troubleshooting and Performance
Java Troubleshooting and Performance Margus Pala Java Fundamentals 08.12.2014 Agenda Debugger Thread dumps Memory dumps Crash dumps Tools/profilers Rules of (performance) optimization 1. Don't optimize
CS61: Systems Programing and Machine Organization
CS61: Systems Programing and Machine Organization Fall 2009 Section Notes for Week 2 (September 14 th - 18 th ) Topics to be covered: I. Binary Basics II. Signed Numbers III. Architecture Overview IV.
Chapter 3 Operating-System Structures
Contents 1. Introduction 2. Computer-System Structures 3. Operating-System Structures 4. Processes 5. Threads 6. CPU Scheduling 7. Process Synchronization 8. Deadlocks 9. Memory Management 10. Virtual
Return-oriented programming without returns
Faculty of Computer Science Institute for System Architecture, Operating Systems Group Return-oriented programming without urns S. Checkoway, L. Davi, A. Dmitrienko, A. Sadeghi, H. Shacham, M. Winandy
W4118 Operating Systems. Junfeng Yang
W4118 Operating Systems Junfeng Yang Outline Linux overview Interrupt in Linux System call in Linux What is Linux A modern, open-source OS, based on UNIX standards 1991, 0.1 MLOC, single developer Linus
Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University [email protected].
Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉 志 尉 National Chiao Tung University [email protected] Review Computers in mid 50 s Hardware was expensive
WHITE PAPER. ClusterWorX 2.1 from Linux NetworX. Cluster Management Solution C ONTENTS INTRODUCTION
WHITE PAPER A PRIL 2002 C ONTENTS Introduction 1 Overview 2 Features 2 Architecture 3 Monitoring 4 ICE Box 4 Events 5 Plug-ins 6 Image Manager 7 Benchmarks 8 ClusterWorX Lite 8 Cluster Management Solution
NetFlow probe on NetFPGA
Verze #1.00, 2008-12-12 NetFlow probe on NetFPGA Introduction With ever-growing volume of data being transferred over the Internet, the need for reliable monitoring becomes more urgent. Monitoring devices
Where is the memory going? Memory usage in the 2.6 kernel
Where is the memory going? Memory usage in the 2.6 kernel Sep 2006 Andi Kleen, SUSE Labs [email protected] Why save memory Weaker reasons "I ve got 1GB of memory. Why should I care about memory?" Old machines
THE VELOX STACK Patrick Marlier (UniNE)
THE VELOX STACK Patrick Marlier (UniNE) 06.09.20101 THE VELOX STACK / OUTLINE 2 APPLICATIONS Real applications QuakeTM Game server C code with OpenMP Globulation 2 Real-Time Strategy Game C++ code using
Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009
Performance Study Performance Evaluation of VMXNET3 Virtual Network Device VMware vsphere 4 build 164009 Introduction With more and more mission critical networking intensive workloads being virtualized
Basics of VTune Performance Analyzer. Intel Software College. Objectives. VTune Performance Analyzer. Agenda
Objectives At the completion of this module, you will be able to: Understand the intended purpose and usage models supported by the VTune Performance Analyzer. Identify hotspots by drilling down through
High Availability Solutions for the MariaDB and MySQL Database
High Availability Solutions for the MariaDB and MySQL Database 1 Introduction This paper introduces recommendations and some of the solutions used to create an availability or high availability environment
Exploiting nginx chunked overflow bug, the undisclosed attack vector
Exploiting nginx chunked overflow bug, the undisclosed attack vector Long Le [email protected] About VNSECURITY.NET CLGT CTF team 2 VNSECURITY.NET In this talk Nginx brief introduction Nginx chunked
Achieving Low-Latency Security
Achieving Low-Latency Security In Today's Competitive, Regulatory and High-Speed Transaction Environment Darren Turnbull, VP Strategic Solutions - Fortinet Agenda 1 2 3 Firewall Architecture Typical Requirements
Virtuoso and Database Scalability
Virtuoso and Database Scalability By Orri Erling Table of Contents Abstract Metrics Results Transaction Throughput Initializing 40 warehouses Serial Read Test Conditions Analysis Working Set Effect of
CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study
CS 377: Operating Systems Lecture 25 - Linux Case Study Guest Lecturer: Tim Wood Outline Linux History Design Principles System Overview Process Scheduling Memory Management File Systems A review of what
Performance Application Programming Interface
/************************************************************************************ ** Notes on Performance Application Programming Interface ** ** Intended audience: Those who would like to learn more
Host Configuration (Linux)
: Location Date Host Configuration (Linux) Trainer Name Laboratory Exercise: Host Configuration (Linux) Objectives In this laboratory exercise you will complete the following tasks: Check for IPv6 support
Scalable Prefix Matching for Internet Packet Forwarding
Scalable Prefix Matching for Internet Packet Forwarding Marcel Waldvogel Computer Engineering and Networks Laboratory Institut für Technische Informatik und Kommunikationsnetze Background Internet growth
SIDN Server Measurements
SIDN Server Measurements Yuri Schaeffer 1, NLnet Labs NLnet Labs document 2010-003 July 19, 2010 1 Introduction For future capacity planning SIDN would like to have an insight on the required resources
Analysis of Open Source Drivers for IEEE 802.11 WLANs
Preprint of an article that appeared in IEEE conference proceeding of ICWCSC 2010 Analysis of Open Source Drivers for IEEE 802.11 WLANs Vipin M AU-KBC Research Centre MIT campus of Anna University Chennai,
OPTIMIZE DMA CONFIGURATION IN ENCRYPTION USE CASE. Guillène Ribière, CEO, System Architect
OPTIMIZE DMA CONFIGURATION IN ENCRYPTION USE CASE Guillène Ribière, CEO, System Architect Problem Statement Low Performances on Hardware Accelerated Encryption: Max Measured 10MBps Expectations: 90 MBps
Full and Para Virtualization
Full and Para Virtualization Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF x86 Hardware Virtualization The x86 architecture offers four levels
Improving DNS performance using Stateless TCP in FreeBSD 9
Improving DNS performance using Stateless TCP in FreeBSD 9 David Hayes, Mattia Rossi, Grenville Armitage Centre for Advanced Internet Architectures, Technical Report 101022A Swinburne University of Technology
Solution: start more than one instruction in the same clock cycle CPI < 1 (or IPC > 1, Instructions per Cycle) Two approaches:
Multiple-Issue Processors Pipelining can achieve CPI close to 1 Mechanisms for handling hazards Static or dynamic scheduling Static or dynamic branch handling Increase in transistor counts (Moore s Law):
Technical Bulletin. Enabling Arista Advanced Monitoring. Overview
Technical Bulletin Enabling Arista Advanced Monitoring Overview Highlights: Independent observation networks are costly and can t keep pace with the production network speed increase EOS eapi allows programmatic
TitanMist: Your First Step to Reversing Nirvana TitanMist. mist.reversinglabs.com
TitanMist: Your First Step to Reversing Nirvana TitanMist mist.reversinglabs.com Contents Introduction to TitanEngine.. 3 Introduction to TitanMist 4 Creating an unpacker for TitanMist.. 5 References and
Accelerate In-Line Packet Processing Using Fast Queue
Accelerate In-Line Packet Processing Using Fast Queue Chun-Ying Huang 1, Chi-Ming Chen 1, Shu-Ping Yu 1, Sheng-Yao Hsu 1, and Chih-Hung Lin 1 Department of Computer Science and Engineering, National Taiwan
D1.2 Network Load Balancing
D1. Network Load Balancing Ronald van der Pol, Freek Dijkstra, Igor Idziejczak, and Mark Meijerink SARA Computing and Networking Services, Science Park 11, 9 XG Amsterdam, The Netherlands June [email protected],[email protected],
Implementing Internet Storage Service Using OpenAFS. Sungjin Chun([email protected]) Dongguen Choi([email protected]) Arum Yoon(toy7777@embian.
Implementing Internet Storage Service Using OpenAFS Sungjin Chun([email protected]) Dongguen Choi([email protected]) Arum Yoon([email protected]) Overview Introduction Implementation Current Status
ELEC 377. Operating Systems. Week 1 Class 3
Operating Systems Week 1 Class 3 Last Class! Computer System Structure, Controllers! Interrupts & Traps! I/O structure and device queues.! Storage Structure & Caching! Hardware Protection! Dual Mode Operation
Architecture of the Kernel-based Virtual Machine (KVM)
Corporate Technology Architecture of the Kernel-based Virtual Machine (KVM) Jan Kiszka, Siemens AG, CT T DE IT 1 Corporate Competence Center Embedded Linux [email protected] Copyright Siemens AG 2010.
OpenFlow with Intel 82599. Voravit Tanyingyong, Markus Hidell, Peter Sjödin
OpenFlow with Intel 82599 Voravit Tanyingyong, Markus Hidell, Peter Sjödin Outline Background Goal Design Experiment and Evaluation Conclusion OpenFlow SW HW Open up commercial network hardware for experiment
SDN/OpenFlow. Dean Pemberton Andy Linton
SDN/OpenFlow Dean Pemberton Andy Linton Agenda What is SDN and Openflow? Understanding Open vswitch and RouteFlow Understanding RYU and SDN applications Simple SDN programming python vs IOS or Junos! Building
