Linux/ia64 support for performance monitoring
1 Linux/ia64 support for performance monitoring
  Stéphane Eranian, HP Labs
  Gelato Meeting, May 2004, UIUC, IL
  © 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
2 Outline
  Goals of the interface
  The perfmon-2 kernel interface overview
  Overview of existing tools
  Examples of measurements
  May 25,
3 Goals of the interface
  Provides a generic interface to access the PMU
    Not dedicated to one application; avoids fragmentation
  Must be portable across PMU models
    Almost all PMU-specific knowledge lives in user-level libraries
  Supports per-process monitoring
    Self-monitoring, unmodified binaries, attach/detach
    Multi-threaded and multi-process workloads
  Supports system-wide monitoring
  Supports counting and sampling
  No modification to applications or system
  Built-in, efficient, robust, secure, simple, documented
4 Current implementations
  In Linux/ia64 since
  In all 2.4-based kernels: perfmon-1
    First-generation interface
    Included in SLES-8, RHAS-2.1, RHEL-3.0 (but broken)
    Several limitations: no monitoring across fork()
  In all 2.6-based kernels: perfmon-2
    Second-generation interface
    Not backward compatible with perfmon-1
    Lots of new features and more flexibility
    Easier to port existing applications (OProfile, VTUNE)
5 Perfmon-2 new features
  Perfmon context identified by a file descriptor
  Monitoring across pthread_create/fork (i.e., clone2)
    Uses ptrace() to help track creation/exit
  Custom sampling buffer formats
    Easy to port existing monitoring tools
  Can attach to and detach from a running process
    Useful for long-running daemons
  Event-set multiplexing
    Minimizes the effect of the limited number of counters
  Scalable counter overflow notification mechanism
    Exploits the Linux file system interface
6 Perfmon-2 interface
  Uses a system call instead of a driver interface
    More flexibility (e.g., number of arguments)
    Close ties with the context switch, exit, and fork code
    Kernel compile-time option (CONFIG_PERFMON)
  Perfmon context encapsulates the PMU state
    Each context is uniquely identified by a file descriptor
  Arguments are similar to ioctl(), but vectors can be passed
  Counters are always exported as 64 bits wide
  int perfmonctl(int fd, int cmd, void *arg, int narg)
    PFM_CREATE_CONTEXT  PFM_READ_PMDS       PFM_START
    PFM_WRITE_PMCS      PFM_LOAD_CONTEXT    PFM_STOP
    PFM_WRITE_PMDS      PFM_UNLOAD_CONTEXT  PFM_RESTART
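The slide states that counters are always exported as 64 bits wide, while hardware counters are usually narrower. One common way to present such a view is to accumulate a software part on every hardware overflow. The following is a minimal sketch of that idea only; the type, the function names, and the `hw_width` parameter are hypothetical and are not taken from the perfmon-2 sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software view of one counter; names are illustrative only. */
typedef struct {
    uint64_t soft;      /* events credited in software on each hardware overflow */
    unsigned hw_width;  /* assumed width of the hardware counter, in bits */
} virt_counter_t;

/* Mask selecting the bits the hardware counter actually implements. */
static uint64_t hw_mask(const virt_counter_t *c)
{
    assert(c->hw_width > 0 && c->hw_width < 64);
    return (UINT64_C(1) << c->hw_width) - 1;
}

/* Called when the hardware counter wraps: credit one full hardware period. */
static void virt_counter_overflow(virt_counter_t *c)
{
    c->soft += hw_mask(c) + 1;
}

/* 64-bit value shown to the user: software part plus the live hardware bits. */
static uint64_t virt_counter_read(const virt_counter_t *c, uint64_t hw_value)
{
    return c->soft + (hw_value & hw_mask(c));
}
```

With an assumed 8-bit hardware counter, one overflow followed by a hardware reading of 5 yields 256 + 5 = 261 events.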
7 System-wide monitoring
  Monitoring of all processes
  Union of CPU-wide sessions:
    Flexibility: different events on each CPU
    Simplicity of implementation
    Uses process pinning capabilities
  Ability to exclude the idle task
  Does not currently coexist with per-process sessions
8 Support for event-based sampling
  64-bit counter overflow notification via message
    Extracted via read(); supports select/poll, SIGIO
  Number of sampling periods = number of counters
    Sampling period can be randomized (reproducible)
    Each counter has a long and a short period to hide recovery cost
    Task can be stopped on overflow notification
  Kernel-level sampling buffer
    Variable size
    Remapped to user level to avoid massive copying
  Sampling buffer format controlled by a kernel module
    Perfmon core does not know how samples are recorded
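To make the notion of a sampling period concrete: a counter that should overflow after N events can be preloaded with 2^64 - N, and a randomized period can be obtained by adding a masked pseudo-random amount to N. The sketch below illustrates this under stated assumptions; the function names and the xorshift generator are choices made for this example and are not the kernel's implementation. A fixed seed gives the same sequence of periods on every run, which is one way to read "reproducible" on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Value to preload so that the counter overflows after `period` events. */
static uint64_t reset_value(uint64_t period)
{
    assert(period != 0);
    return (uint64_t)0 - period;   /* 2^64 - period */
}

/* Number of events a counter holding `value` still needs before it wraps. */
static uint64_t events_until_overflow(uint64_t value)
{
    return (uint64_t)0 - value;
}

/* Small xorshift generator (an assumption for this sketch); seed must be non-zero. */
static uint64_t next_random(uint64_t *state)
{
    uint64_t x = *state;
    assert(x != 0);
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Randomized reset: lengthen the period by a pseudo-random amount in [0, mask]. */
static uint64_t randomized_reset_value(uint64_t period, uint64_t mask, uint64_t *seed)
{
    uint64_t jitter = next_random(seed) & mask;
    return reset_value(period) - jitter;
}
```

The long and short periods mentioned on the slide would simply be two different `period` arguments, chosen depending on whether the overflow produced a user-level notification.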
9 Custom sampling buffer formats
  Why?
    No single format can satisfy all needs
    Makes it easy to port existing monitoring tools
  How?
    Each format is implemented as a kernel module
    Each format is uniquely identified by a 128-bit UUID
    Each format registers a set of callbacks with the perfmon core
    Each format provides a handler callback invoked on counter overflow
  What can a format use?
    May use the buffer allocation + remapping service
    May use the overflow notification mechanism
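The registration scheme described above (a format identified by a 128-bit UUID that supplies an overflow handler) can be mimicked in user space with a small table of function pointers. This is an illustrative mock only: `sample_format_t`, `register_format`, `dispatch_overflow`, and `count_only_handler` are hypothetical names and do not reproduce the kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FORMATS 8

/* Hypothetical description of a sampling format. */
typedef struct {
    uint8_t uuid[16];                        /* 128-bit identifier */
    int (*handler)(void *buf, uint64_t ip);  /* invoked on counter overflow */
} sample_format_t;

static const sample_format_t *formats[MAX_FORMATS];
static size_t nformats;

/* Look up a registered format by UUID; NULL when unknown. */
static const sample_format_t *find_format(const uint8_t uuid[16])
{
    for (size_t i = 0; i < nformats; i++)
        if (memcmp(formats[i]->uuid, uuid, 16) == 0)
            return formats[i];
    return NULL;
}

/* Register a format; fails if the table is full or the UUID is already taken. */
static int register_format(const sample_format_t *fmt)
{
    assert(fmt != NULL && fmt->handler != NULL);
    if (nformats == MAX_FORMATS || find_format(fmt->uuid) != NULL)
        return -1;
    formats[nformats++] = fmt;
    return 0;
}

/* The core only routes the overflow; it never looks at the sample layout. */
static int dispatch_overflow(const uint8_t uuid[16], void *buf, uint64_t ip)
{
    const sample_format_t *fmt = find_format(uuid);
    return fmt ? fmt->handler(buf, ip) : -1;
}

/* Example handler: keeps no samples at all, just counts overflows. */
static int count_only_handler(void *buf, uint64_t ip)
{
    (void)ip;
    *(uint64_t *)buf += 1;
    return 0;
}
```

The point of the indirection is the one made on the slide: the core stays ignorant of how samples are recorded, so a new tool only has to supply its own handler.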
10 Event set multiplexing
  Overcomes the limitation on the number of counters
    Each set encapsulates PMU state (PMC/PMD/IBR/DBR)
    Can define up to 65K sets
    Possibility to exclude the idle task per set (system-wide only)
  Supported in per-task and system-wide modes
    Counting and sampling support
  Cycling and switching:
    Cycling is round-robin
    Per-set time-based: timeout granularity of a clock tick
    Per-set overflow-based: after n overflows of counter X
    Multiple counters can be triggers in overflow mode
    Can be used to implement counter cascading
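As an illustration of round-robin cycling, the sketch below picks the next set in order and, as a tool-side step that the slide does not describe, estimates a full-run count by scaling what a set measured by the fraction of time it was loaded. The structure and function names are hypothetical, and the linear scaling is an assumption that only holds when the workload behaves uniformly over time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-set bookkeeping kept by a monitoring tool. */
typedef struct {
    uint64_t count;         /* events counted while the set was loaded */
    uint64_t active_ticks;  /* clock ticks during which the set was loaded */
} event_set_t;

/* Round-robin cycling: index of the set to load after `current`. */
static size_t next_set(size_t current, size_t nsets)
{
    assert(nsets > 0 && current < nsets);
    return (current + 1) % nsets;
}

/* Estimate of the count the set would have seen had it been loaded all the time. */
static double scaled_count(const event_set_t *s, uint64_t total_ticks)
{
    if (s->active_ticks == 0)
        return 0.0;
    return (double)s->count * (double)total_ticks / (double)s->active_ticks;
}
```

For example, a set that counted 500 events while loaded for 25 of 100 ticks is estimated at 2000 events for the whole run.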
11 Linux/ia64 monitoring tools
  gprof: flat profile, call graph, requires recompilation, applications only
  Kernel profiler: kernel-only flat profile, built in (cmdline profile=)
  Prospect (HP)/OProfile: PMU-based, system-wide flat profile
  VTUNE (Intel): PMU-based, system-wide flat profile, Windows-side GUI
  pfmon/libpfm, qprof, q-syscollect/q-view (HP Labs)
  PAPI toolkit (U. of Tennessee): PMU-based, counting, sampling, uses libpfm
12 Monitoring complicated workloads
  Can follow across fork/vfork, pthread_create
  Works for counting and sampling
  Supports regular expressions to filter binaries of interest
  Example: elapsed cycles for a simple compile
  $ pfmon --us-c -u -k --follow-all -ecpu_cycles,ia64_inst_retired \
      -- cc e.c -o e
   1,164,772 CPU_CYCLES        /usr/lib/gcc-lib/ia64-linux/2.96/cpp0
   1,295,480 IA64_INST_RETIRED /usr/lib/gcc-lib/ia64-linux/2.96/cpp0
  13,758,346 CPU_CYCLES        /usr/lib/gcc-lib/ia64-linux/2.96/cc1
  21,863,635 IA64_INST_RETIRED /usr/lib/gcc-lib/ia64-linux/2.96/cc1
   5,708,731 CPU_CYCLES        as
   7,165,599 IA64_INST_RETIRED as
  27,046,535 CPU_CYCLES        /usr/bin/ld
  35,247,760 IA64_INST_RETIRED /usr/bin/ld
   1,381,134 CPU_CYCLES        /usr/lib/gcc-lib/ia64-linux/2.96/collect2
   1,508,977 IA64_INST_RETIRED /usr/lib/gcc-lib/ia64-linux/2.96/collect2
   1,913,253 CPU_CYCLES        cc
   1,976,590 IA64_INST_RETIRED cc
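One simple way to read the per-process counts above is to derive instructions per cycle (IPC) for each binary and the total cycle count of the compile. The sketch below hard-codes the numbers from the pfmon output; the structure and function names are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row per binary, copied from the pfmon output above. */
typedef struct {
    const char *binary;
    uint64_t cycles;  /* CPU_CYCLES */
    uint64_t insts;   /* IA64_INST_RETIRED */
} proc_counts_t;

static const proc_counts_t compile_run[] = {
    { "cpp0",      1164772,  1295480 },
    { "cc1",      13758346, 21863635 },
    { "as",        5708731,  7165599 },
    { "ld",       27046535, 35247760 },
    { "collect2",  1381134,  1508977 },
    { "cc",        1913253,  1976590 },
};

/* Instructions retired per cycle for one process. */
static double ipc(const proc_counts_t *p)
{
    assert(p->cycles != 0);
    return (double)p->insts / (double)p->cycles;
}

/* Sum of CPU_CYCLES over all monitored processes. */
static uint64_t total_cycles(const proc_counts_t *p, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i].cycles;
    return sum;
}
```

Read this way, the linker accounts for roughly half of all cycles in this small compile, while cc1 shows the highest IPC of the six processes.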
13 Detailed cycle breakdown
  Can use current pfmon with the wrapper script i2prof.pl, written by Per Ekman
  Using the current development version of pfmon:
    Exploits event-set multiplexing capabilities
  $ pfmon -m itanium2-stalls -k -u system-wide print-interval mcf inp.in
  [Output table not recoverable from the transcription. Surviving column labels: %itlb, %icache, %bra, %unstall, %BE, %score, %RSE, D-access, exec, flush, board, %d1tlb, %d2tlb, %cache, loaduse, res, %gr, %fr]
14 Data load cache miss profiles
  Obtained using the Data EARs
  Provides instruction AND data views
  Careful with interpretation: not all misses lead to stalls
  To be included in the next version of pfmon
  Example: mcf instruction and data views (leading count and %self values lost in transcription)
  #count %self %cum   %L2    %L3    %RAM   instruction addr
         %     11.11%  3.05%  5.17% 91.77% price_out_impl+0x820<mcf>
         %     22.01% 26.74% 69.93%  3.33% price_out_impl+0x850<mcf>
         %     31.45% 74.43% 24.94%  0.63% bea_compute_red_cost+0x50<mcf>
         %     40.22% 46.69% 33.77% 19.54% bea_compute_red_cost+0xa1<mcf>
         %     48.90% 42.43%  9.98% 47.58% primal_bea_mpp+0x7b1<mcf>
         %     57.42% 36.67% 51.87% 11.46% bea_compute_red_cost+0x90<mcf>
  #count %self %cum   %L2    %L3    %RAM   data addr
         %      0.06% 62.16% 32.43%  5.41% 0x ebd
         %      0.12% 75.00% 18.75%  6.25% 0x d07b
         %      0.17% 68.97% 24.14%  6.90% 0x e
         %      0.22% 96.43%  3.57%  0.00% 0x d
         %      0.27% 88.46% 11.54%  0.00% 0x d8c58
15 Kernel-level call stack sampling
  Combines the kernel stack unwinder with perfmon:
    On counter overflow, record the call stack
    Uses a custom sampling buffer format
  Example using the development version of pfmon:
  $ pfmon -el3_misses --long-smpl-periods= smpl-periods-random=0xff:10 -k \
      --smpl-module=kcall-stack-ia64 --resolve-addr --system-wide
  copy_user,file_read_actor,do_generic_mapping_read,generic_file_aio_read,generic_file_aio_read,do_sync_read,vfs_read,sys_read,ia64_ret_from_syscall
  do_anonymous_page,do_no_page,handle_mm_fault,ia64_do_page_fault,ia64_leave_kernel
  clear_page,do_anonymous_page,do_no_page,handle_mm_fault,ia64_do_page_fault,ia64_leave_kernel
  bh_lru_install,find_get_block,getblk,ext3_get_inode_loc,ext3_reserve_inode_write,ext3_mark_inode_dirty,ext3_dirty_inode,mark_inode_dirty,update_atime,link_path_walk,open_namei,filp_open,sys_open,ia64_ret_from_syscall
  end_bio_bh_io_sync,bio_endio,end_that_request_first,scsi_end_request,scsi_io_completion,sd_rw_intr,scsi_finish_command,scsi_softirq,do_softirq,ia64_handle_irq,ia64_leave_kernel
  filemap_nopage,do_no_page,handle_mm_fault,ia64_do_page_fault,ia64_leave_kernel
  scsi_finish_command,scsi_softirq,do_softirq,ia64_handle_irq,ia64_leave_kernel
  end_page_writeback,end_buffer_async_write,end_bio_bh_io_sync,bio_endio,end_that_request_first,scsi_end_request,scsi_io_completion,sd_rw_intr,scsi_finish_command,scsi_softirq,do_softirq,ia64_handle_irq,ia64_leave_kernel
16 Data structure profiling (experimental)
  Statistical profile of a data structure to improve its layout
  Combines loads + stores retired events with range restrictions
  Currently user level, but can be made to work in the kernel

  static void walk_list(node_t *list)
  {
      while (list) {
          list->total = list->val1 + list->val2;
          list = list->next;
      }
  }

  field name    : size : offs : line : cover  : loads  : stores :
  node_t->prev  :    8 :    8 :    0 : 19.98% :  0.00% :  0.00% :
  node_t->next  :    8 :    0 :    0 : 19.67% : 32.86% :  0.00% :
  node_t->val1  :    8 :   16 :    0 : 20.35% : 33.97% :  0.00% :
  node_t->val2  :    8 :   24 :    0 : 20.00% : 33.17% :  0.00% :
  node_t->total :    8 :   32 :    0 : 20.01% :  0.00% :      % :
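To show how a sampled data address can be attributed to a structure field, the sketch below uses the field offsets and sizes from the table above. It assumes, purely for illustration, that the nodes sit back to back in one array starting at `base` and that `sizeof(node_t)` is 40 bytes; the names `field_t`, `node_fields`, and `field_for_address` are hypothetical and say nothing about how the experimental tool actually works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry per field, with offsets and sizes as reported in the table. */
typedef struct {
    const char *name;
    size_t offset;
    size_t size;
} field_t;

static const field_t node_fields[] = {
    { "next",   0, 8 },
    { "prev",   8, 8 },
    { "val1",  16, 8 },
    { "val2",  24, 8 },
    { "total", 32, 8 },
};

#define NODE_SIZE 40  /* assumed sizeof(node_t): five 8-byte fields */

/* Map a sampled data address to the node_t field it falls into. */
static const char *field_for_address(uint64_t base, uint64_t addr)
{
    assert(addr >= base);
    size_t off = (size_t)((addr - base) % NODE_SIZE);
    for (size_t i = 0; i < sizeof node_fields / sizeof node_fields[0]; i++)
        if (off >= node_fields[i].offset &&
            off < node_fields[i].offset + node_fields[i].size)
            return node_fields[i].name;
    return NULL;
}
```

Counting the samples per returned field name, separately for loads and stores, would give columns similar in spirit to the ones in the table.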
17 Conclusions
  The Itanium 2 PMU is very powerful
  Monitoring is key to achieving world-class performance
  Linux/ia64 has the most advanced monitoring subsystem of all Linux implementations
  Linux/ia64 already supports a variety of monitoring tools
  We need to develop better, smarter tools for non-experts
18 Linux/ia64 perfmon resources
19 Linux/ia64 perfmon resources
  Performance tools developed by HP Labs: pfmon/libpfm, q-tools, q-prof
  Intel VTUNE:
  PAPI
  Prospect:
  OProfile
20 Linux/ia64 perfmon resources
  i2prof.pl:
  IPF PMU architecture:
  Itanium 2 PMU specification:
21 Backup slides
22 Opcode matching with pfmon
  Constrains monitoring to instructions or patterns
    Based on opcode, e.g., st8.*
    Based on functional unit, e.g., M, F, I, B
  Pattern uses match + mask fields
    Not all instructions can be uniquely identified
  Two opcode matching registers on Itanium 1 & 2
  Ex.: counting the number of br.cloop instructions:
  $ pfmon --us-c --opc-match8=0x fff1fa \
      -e IA64_TAGGED_INST_RETIRED_IBRP0_PMC8 -- foo
  4,999,950,164 IA64_TAGGED_INST_RETIRED_IBRP0_PMC8
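The match + mask idea can be illustrated generically: an instruction encoding is selected when it agrees with the match value on every bit the mask does not exclude. The sketch below assumes that a mask bit set to 1 means "ignore this bit"; that polarity, the function names, and the toy encodings used with them are assumptions for this example and do not describe the actual layout of the Itanium opcode matching registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero when `opcode` equals `match` on all bits not ignored by `mask`.
 * Assumption: a mask bit of 1 marks a "don't care" position. */
static int opcode_matches(uint64_t opcode, uint64_t match, uint64_t mask)
{
    return ((opcode ^ match) & ~mask) == 0;
}

/* Count how many encodings in a buffer are selected by the pattern. */
static uint64_t count_matches(const uint64_t *opcodes, size_t n,
                              uint64_t match, uint64_t mask)
{
    uint64_t hits = 0;
    assert(opcodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (opcode_matches(opcodes[i], match, mask))
            hits++;
    return hits;
}
```

Because several distinct instructions can share all the bits that the mask keeps, a pattern may select more than one instruction, which is one way to read the slide's remark that not all instructions can be uniquely identified.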
23 Range restrictions
  Constrains monitoring to a range of data or code
    Implemented via debug registers (not used as breakpoints)
    Can specify a range inside the kernel
    Works for both per-process and system-wide
  Not all events support range restrictions
  Range must be aligned on its size for exact measurements
    The gcc -falign-functions= option can be useful
  Pfmon supports a symbol name for the range
  Ex.: how many L2 misses while executing init_tab()
  $ pfmon --us-c -el2_misses -- foo
  1,245,516 L2_MISSES (misses for the entire execution)
  $ pfmon --us-c irange=init_tab -el2_misses -- foo
  14,456 L2_MISSES (misses for init_tab() only)
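The alignment requirement can be illustrated with a sketch that assumes a single address + mask register can only describe a power-of-two-sized, naturally aligned region. Under that assumption a range is covered exactly only when its size is a power of two and its start is a multiple of that size; otherwise the smallest covering region also includes neighbouring code or data. The function names are hypothetical, and a real tool may combine several registers instead of widening one region.

```c
#include <assert.h>
#include <stdint.h>

static int is_power_of_two(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Exact coverage by one aligned power-of-two region is possible only
 * when the size is a power of two and the start is a multiple of it. */
static int range_is_exact(uint64_t start, uint64_t size)
{
    return is_power_of_two(size) && (start & (size - 1)) == 0;
}

/* Smallest aligned power-of-two region containing [start, start + size).
 * Returns its length and stores its base address in *base. */
static uint64_t covering_region(uint64_t start, uint64_t size, uint64_t *base)
{
    /* Bounds keep the arithmetic below free of overflow. */
    assert(size != 0 && size <= (UINT64_C(1) << 62));
    assert(start <= (UINT64_C(1) << 62));
    uint64_t len = 1;
    while ((start & ~(len - 1)) + len < start + size)
        len <<= 1;
    *base = start & ~(len - 1);
    return len;
}
```

For instance, a 0x1000-byte function starting at 0x4800 would need the region 0x4000..0x6000, twice its size, which suggests why aligning functions (for example with the gcc option named on the slide) helps exact measurement.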
24 Sampling branches (BTB)
  Captures up to the last 4 branches:
    Each entry contains source/target address and prediction outcome
    Possible to filter branches: taken/not taken, mispredicted
  Can be combined with EAR to build a path to a cache/TLB miss
  Ex.: sample every 1000 taken branches, record the last 4
  $ pfmon --smpl-periods-random=5:0xff --btb-tm-tk \
      --long-smpl-periods=1000 ebranch_event -- foo
  entry 231 PID:673 CPU:0 STAMP:0x ac49 IIP:0x d0
  last reset : 1004
  branch source address: 0x f2
  branch target address: 0x c0
  branch taken : yes, prediction: success, pipe flush: no
  f0:[MFB] nop.m 0x
  f1: nop.f 0x
  f2: br.cloop.sptk.few c0
25 Simple self-monitoring example
  pfarg_context_t ctx;
  pfarg_reg_t pmcs[1], pmds[1];
  pfarg_load_t load_args;
  pfmlib_input_param_t inp;
  pfmlib_output_param_t outp;
  int fd;

  /* ask libpfm which PMC/PMD to use for CPU_CYCLES at user level (error checking omitted) */
  pfm_find_event("CPU_CYCLES", &inp.pfp_events[0]);
  inp.pfp_plm = PFM_PLM3;
  inp.pfp_count = 1;
  pfm_dispatch_events(&inp, NULL, &outp, NULL);

  /* copy the register assignment chosen by libpfm */
  pmds[0].reg_num = pmcs[0].reg_num = outp.pfp_pmcs[0].reg_num;
  pmcs[0].reg_value = outp.pfp_pmcs[0].reg_value;

  /* create the context; it is identified by a file descriptor */
  perfmonctl(0, PFM_CREATE_CONTEXT, &ctx, 1);
  fd = ctx.ctx_fd;

  /* program the registers, then attach the context to ourselves */
  perfmonctl(fd, PFM_WRITE_PMCS, pmcs, 1);
  perfmonctl(fd, PFM_WRITE_PMDS, pmds, 1);
  load_args.load_pid = getpid();
  perfmonctl(fd, PFM_LOAD_CONTEXT, &load_args, 1);

  perfmonctl(fd, PFM_START, NULL, 0);
  /* run code to measure */
  perfmonctl(fd, PFM_STOP, NULL, 0);

  /* read back the 64-bit count, then destroy the context */
  perfmonctl(fd, PFM_READ_PMDS, pmds, 1);
  close(fd);
26 Monitoring an unmodified binary
  [Diagram. Labels: Tool (pfmon), fork()+exec(), Program to monitor, fd, SIGCHLD, PFM_LOAD_CONTEXT, PFM_READ_PMDS, file table, pfm context; split into user level and kernel level]
27 Typical sampling setup
  [Diagram. Labels: Tool (pfmon), fork()+exec(), Monitored program, fd, PFM_LOAD_CONTEXT, PFM_OVFL_MSG, PFM_RESTART, Save to disk, Storage device, file table, pfm context; split into user level and kernel level]
28 Event-based sampling with pfmon
  Can sample on any event
  Multiple simultaneous sampling periods supported
    Can collect several profiles at the same time
  Sampling period can be randomized
    Very important to avoid biased results
  Ex.: every 100,000 cycles, record ia64_inst_retired
  $ pfmon reset-non-smpl long-smpl-periods= ecpu_cycle, ia64_inst_retired -- foo
  entry 6138 PID:4220 CPU:0 STAMP:0x480af599a15a IIP:0x a0
  OVFL: 4 LAST_VAL:
  PMD5 : 0x ec4 ( instructions retired)
  entry 6139 PID:4220 CPU:0 STAMP:0x480af59b2c83 IIP:0x a0
  OVFL: 4 LAST_VAL:
  PMD5 : 0x ec4
29 Sampling cache and TLB misses (EARs)
  Very useful to find where cache/TLB misses occur
    Cannot be done with naïve IP-based sampling
    Pinpoints the source of a miss, not the consequence
  Must be used with sampling
  Ex.: sample every 1000 cache misses with latency > 4 cycles
  $ pfmon -long-smpl-periods=1000 -edata_ear_cache_lat4 foo
  entry 2000 PID:608 CPU:0 STAMP:0xfe3e1212e5 IIP:0x
  accessed data: 0x
  miss latency : 16 cycles
  inst address : 0x
  : [MMI] ld8 r15=[r16]
  : ld8 r14=[r17]            (miss source)
  : nop.i 0x0;;
  : [MMI] cmp.ltu p7,p6=r14,r15;;   (stall)
30 Custom sampling buffer interface
  [Diagram. Labels: perfmon subsystem, fixed sampling format, custom sampling format module, format private interface (optional); split into user level and kernel level]
  Registration: register_buffer_format(), unregister_buffer_format()
  Format callbacks: validate(), get_size(), initialize(), handler(), restart(), restart_active(), exit()
31 Sampling buffer format scenarios
  Formats can manage their own private buffers
  Formats can export their data any way they want
  OProfile ported using scenario 2
  Scenario 1: no buffer, just needs the overflow callback
  Scenario 2: private buffer, private buffer access
  Scenario 3: perfmon allocation & remapping
32 pfmon/libpfm for perfmon-2
  Pfmon-3.0 (GNU GPL):
    Monitoring of unmodified binaries or system-wide
    Supports counting and event-based sampling
    Supports all PMU features of Itanium 1 & 2
    Uses libpfm for setup
  Libpfm-3.0 (MIT-style):
    Helps set up PMC registers
    Provides a common interface across PMU models
    Contains all PMU-specific knowledge (event encoding, ...)
    Provides a model-specific interface for advanced features
    Does not invoke perfmonctl()
  Use pfmon-2.0/libpfm-2.0 for all 2.4 kernels
33
perfmon2: a flexible performance monitoring interface for Linux perfmon2: une interface flexible pour l'analyse de performance sous Linux
perfmon2: a flexible performance monitoring interface for Linux perfmon2: une interface flexible pour l'analyse de performance sous Linux Stéphane Eranian HP Labs July 2006 Ottawa Linux Symposium 2006
Perfmon2: a standard performance monitoring interface for Linux. Stéphane Eranian <[email protected]>
Perfmon2: a standard performance monitoring interface for Linux Stéphane Eranian Agenda PMU-based performance monitoring Overview of the interface Current status Tools Challenges 2
Perfmon2: a flexible performance monitoring interface for Linux
Perfmon2: a flexible performance monitoring interface for Linux Stéphane Eranian HP Labs [email protected] Abstract Monitoring program execution is becoming more than ever key to achieving world-class
Perfmon2: A leap forward in Performance Monitoring
Perfmon2: A leap forward in Performance Monitoring Sverre Jarp, Ryszard Jurga, Andrzej Nowak CERN, Geneva, Switzerland [email protected] Abstract. This paper describes the software component, perfmon2,
The Intel VTune Performance Analyzer
The Intel VTune Performance Analyzer Focusing on Vtune for Intel Itanium running Linux* OS Copyright 2002 Intel Corporation. All rights reserved. VTune and the Intel logo are trademarks or registered trademarks
STUDY OF PERFORMANCE COUNTERS AND PROFILING TOOLS TO MONITOR PERFORMANCE OF APPLICATION
STUDY OF PERFORMANCE COUNTERS AND PROFILING TOOLS TO MONITOR PERFORMANCE OF APPLICATION 1 DIPAK PATIL, 2 PRASHANT KHARAT, 3 ANIL KUMAR GUPTA 1,2 Depatment of Information Technology, Walchand College of
Performance Monitoring of the Software Frameworks for LHC Experiments
Proceedings of the First EELA-2 Conference R. mayo et al. (Eds.) CIEMAT 2009 2009 The authors. All rights reserved Performance Monitoring of the Software Frameworks for LHC Experiments William A. Romero
Hardware Performance Monitoring with PAPI
Hardware Performance Monitoring with PAPI Dan Terpstra [email protected] Workshop July 2007 What s s PAPI? Middleware that provides a consistent programming interface for the performance counter hardware
Basics of VTune Performance Analyzer. Intel Software College. Objectives. VTune Performance Analyzer. Agenda
Objectives At the completion of this module, you will be able to: Understand the intended purpose and usage models supported by the VTune Performance Analyzer. Identify hotspots by drilling down through
A Study of Performance Monitoring Unit, perf and perf_events subsystem
A Study of Performance Monitoring Unit, perf and perf_events subsystem Team Aman Singh Anup Buchke Mentor Dr. Yann-Hang Lee Summary Performance Monitoring Unit, or the PMU, is found in all high end processors
A Brief Survery of Linux Performance Engineering. Philip J. Mucci University of Tennessee, Knoxville [email protected]
A Brief Survery of Linux Performance Engineering Philip J. Mucci University of Tennessee, Knoxville [email protected] Overview On chip Hardware Performance Counters Linux Performance Counter Infrastructure
Performance monitoring of the software frameworks for LHC experiments
Performance monitoring of the software frameworks for LHC experiments William A. Romero R. [email protected] J.M. Dana [email protected] First EELA-2 Conference Bogotá, COL OUTLINE Introduction
Libmonitor: A Tool for First-Party Monitoring
Libmonitor: A Tool for First-Party Monitoring Mark W. Krentel Dept. of Computer Science Rice University 6100 Main St., Houston, TX 77005 [email protected] ABSTRACT Libmonitor is a library that provides
Perf Tool: Performance Analysis Tool for Linux
/ Notes on Linux perf tool Intended audience: Those who would like to learn more about Linux perf performance analysis and profiling tool. Used: CPE 631 Advanced Computer Systems and Architectures CPE
Binary Translation for Fun and Profit
Binary Translation for Fun and Profit Andy Goldstein OpenVMS Engineering [email protected] 2005 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without
Using PAPI for hardware performance monitoring on Linux systems
Using PAPI for hardware performance monitoring on Linux systems Jack Dongarra, Kevin London, Shirley Moore, Phil Mucci, and Dan Terpstra Innovative Computing Laboratory, University of Tennessee, Knoxville,
Intel IA-64 Architecture Software Developer s Manual
Intel IA-64 Architecture Software Developer s Manual Volume 4: Itanium Processor Programmer s Guide January 2000 Order Number: 245320-001 THIS DOCUMENT IS PROVIDED AS IS WITH NO WARRANTIES WHATSOEVER,
CS 377: Operating Systems. Outline. A review of what you ve learned, and how it applies to a real operating system. Lecture 25 - Linux Case Study
CS 377: Operating Systems Lecture 25 - Linux Case Study Guest Lecturer: Tim Wood Outline Linux History Design Principles System Overview Process Scheduling Memory Management File Systems A review of what
Performance Application Programming Interface
/************************************************************************************ ** Notes on Performance Application Programming Interface ** ** Intended audience: Those who would like to learn more
Application Note 195. ARM11 performance monitor unit. Document number: ARM DAI 195B Issued: 15th February, 2008 Copyright ARM Limited 2007
Application Note 195 ARM11 performance monitor unit Document number: ARM DAI 195B Issued: 15th February, 2008 Copyright ARM Limited 2007 Copyright 2007 ARM Limited. All rights reserved. Application Note
Performance Monitor on PowerQUICC II Pro Processors
Freescale Semiconductor Application Note Document Number: AN3359 Rev. 0, 05/2007 Performance Monitor on PowerQUICC II Pro Processors by Harinder Rai Network Computing Systems Group Freescale Semiconductor,
Code Coverage Testing Using Hardware Performance Monitoring Support
Code Coverage Testing Using Hardware Performance Monitoring Support Alex Shye Matthew Iyer Vijay Janapa Reddi Daniel A. Connors Department of Electrical and Computer Engineering University of Colorado
Performance monitoring at CERN openlab. July 20 th 2012 Andrzej Nowak, CERN openlab
Performance monitoring at CERN openlab July 20 th 2012 Andrzej Nowak, CERN openlab Data flow Reconstruction Selection and reconstruction Online triggering and filtering in detectors Raw Data (100%) Event
Leak Check Version 2.1 for Linux TM
Leak Check Version 2.1 for Linux TM User s Guide Including Leak Analyzer For x86 Servers Document Number DLC20-L-021-1 Copyright 2003-2009 Dynamic Memory Solutions LLC www.dynamic-memory.com Notices Information
On the Importance of Thread Placement on Multicore Architectures
On the Importance of Thread Placement on Multicore Architectures HPCLatAm 2011 Keynote Cordoba, Argentina August 31, 2011 Tobias Klug Motivation: Many possibilities can lead to non-deterministic runtimes...
Operating System Impact on SMT Architecture
Operating System Impact on SMT Architecture The work published in An Analysis of Operating System Behavior on a Simultaneous Multithreaded Architecture, Josh Redstone et al., in Proceedings of the 9th
Eloquence Training What s new in Eloquence B.08.00
Eloquence Training What s new in Eloquence B.08.00 2010 Marxmeier Software AG Rev:100727 Overview Released December 2008 Supported until November 2013 Supports 32-bit and 64-bit platforms HP-UX Itanium
EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications
ECE6102 Dependable Distribute Systems, Fall2010 EWeb: Highly Scalable Client Transparent Fault Tolerant System for Cloud based Web Applications Deepal Jayasinghe, Hyojun Kim, Mohammad M. Hossain, Ali Payani
System/Networking performance analytics with perf. Hannes Frederic Sowa <[email protected]>
System/Networking performance analytics with perf Hannes Frederic Sowa Prerequisites Recent Linux Kernel CONFIG_PERF_* CONFIG_DEBUG_INFO Fedora: debuginfo-install kernel for
COS 318: Operating Systems
COS 318: Operating Systems OS Structures and System Calls Andy Bavier Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall10/cos318/ Outline Protection mechanisms
<Insert Picture Here> Tracing on Linux: the Old, the New, and the Ugly
Tracing on Linux: the Old, the New, and the Ugly Elena Zannoni ([email protected]) Linux Engineering, Oracle America October 27 2011 Categories of Tracing Tools Kernel Tracing
Performance Counter. Non-Uniform Memory Access Seminar Karsten Tausche 2014-12-10
Performance Counter Non-Uniform Memory Access Seminar Karsten Tausche 2014-12-10 Performance Counter Hardware Unit for event measurements Performance Monitoring Unit (PMU) Originally for CPU-Debugging
Red Hat Linux Internals
Red Hat Linux Internals Learn how the Linux kernel functions and start developing modules. Red Hat Linux internals teaches you all the fundamental requirements necessary to understand and start developing
Basic Performance Measurements for AMD Athlon 64, AMD Opteron and AMD Phenom Processors
Basic Performance Measurements for AMD Athlon 64, AMD Opteron and AMD Phenom Processors Paul J. Drongowski AMD CodeAnalyst Performance Analyzer Development Team Advanced Micro Devices, Inc. Boston Design
Intel Application Software Development Tool Suite 2.2 for Intel Atom processor. In-Depth
Application Software Development Tool Suite 2.2 for Atom processor In-Depth Contents Application Software Development Tool Suite 2.2 for Atom processor............................... 3 Features and Benefits...................................
Lecture 5. User-Mode Linux. Jeff Dike. November 7, 2012. Operating Systems Practical. OSP Lecture 5, UML 1/33
Lecture 5 User-Mode Linux Jeff Dike Operating Systems Practical November 7, 2012 OSP Lecture 5, UML 1/33 Contents User-Mode Linux Keywords Resources Questions OSP Lecture 5, UML 2/33 Outline User-Mode
Multi-core architectures. Jernej Barbic 15-213, Spring 2007 May 3, 2007
Multi-core architectures Jernej Barbic 15-213, Spring 2007 May 3, 2007 1 Single-core computer 2 Single-core CPU chip the single core 3 Multi-core architectures This lecture is about a new trend in computer
Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture
Last Class: OS and Computer Architecture System bus Network card CPU, memory, I/O devices, network card, system bus Lecture 3, page 1 Last Class: OS and Computer Architecture OS Service Protection Interrupts
Hardware performance monitoring. Zoltán Majó
Hardware performance monitoring Zoltán Majó 1 Question Did you take any of these lectures: Computer Architecture and System Programming How to Write Fast Numerical Code Design of Parallel and High Performance
VMware and CPU Virtualization Technology. Jack Lo Sr. Director, R&D
ware and CPU Virtualization Technology Jack Lo Sr. Director, R&D This presentation may contain ware confidential information. Copyright 2005 ware, Inc. All rights reserved. All other marks and names mentioned
Android Development: a System Perspective. Javier Orensanz
Android Development: a System Perspective Javier Orensanz 1 ARM - Linux and Communities Linux kernel GNU Tools 2 Linaro Partner Initiative Mission: Make open source development easier by delivering a common
A JIT Compiler for Android s Dalvik VM. Ben Cheng, Bill Buzbee May 2010
A JIT Compiler for Android s Dalvik VM Ben Cheng, Bill Buzbee May 2010 Overview View live session notes and ask questions on Google Wave: http://bit.ly/bizjnf Dalvik Environment Trace vs. Method Granularity
I Control Your Code Attack Vectors Through the Eyes of Software-based Fault Isolation. Mathias Payer, ETH Zurich
I Control Your Code Attack Vectors Through the Eyes of Software-based Fault Isolation Mathias Payer, ETH Zurich Motivation Applications often vulnerable to security exploits Solution: restrict application
CS161: Operating Systems
CS161: Operating Systems Matt Welsh [email protected] Lecture 2: OS Structure and System Calls February 6, 2007 1 Lecture Overview Protection Boundaries and Privilege Levels What makes the kernel different
Frysk The Systems Monitoring and Debugging Tool. Andrew Cagney
Frysk The Systems Monitoring and Debugging Tool Andrew Cagney Agenda Two Use Cases Motivation Comparison with Existing Free Technologies The Frysk Architecture and GUI Command Line Utilities Current Status
Chapter 5 Cloud Resource Virtualization
Chapter 5 Cloud Resource Virtualization Contents Virtualization. Layering and virtualization. Virtual machine monitor. Virtual machine. Performance and security isolation. Architectural support for virtualization.
Freescale Semiconductor, I
nc. Application Note 6/2002 8-Bit Software Development Kit By Jiri Ryba Introduction 8-Bit SDK Overview This application note describes the features and advantages of the 8-bit SDK (software development
Getting Started with CodeXL
AMD Developer Tools Team Advanced Micro Devices, Inc. Table of Contents Introduction... 2 Install CodeXL... 2 Validate CodeXL installation... 3 CodeXL help... 5 Run the Teapot Sample project... 5 Basic
<Insert Picture Here> Tracing on Linux
Tracing on Linux Elena Zannoni ([email protected]) Linux Engineering, Oracle America November 6 2012 The Tree of Tracing SystemTap LTTng perf DTrace ftrace GDB TRACE_EVENT
Transparent Monitoring of a Process Self in a Virtual Environment
Transparent Monitoring of a Process Self in a Virtual Environment PhD Lunchtime Seminar Università di Pisa 24 Giugno 2008 Outline Background Process Self Attacks Against the Self Dynamic and Static Analysis
ELEC 377. Operating Systems. Week 1 Class 3
Operating Systems Week 1 Class 3 Last Class! Computer System Structure, Controllers! Interrupts & Traps! I/O structure and device queues.! Storage Structure & Caching! Hardware Protection! Dual Mode Operation
Virtual Memory. How is it possible for each process to have contiguous addresses and so many of them? A System Using Virtual Addressing
How is it possible for each process to have contiguous addresses and so many of them? Computer Systems Organization (Spring ) CSCI-UA, Section Instructor: Joanna Klukowska Teaching Assistants: Paige Connelly
VLIW Processors. VLIW Processors
1 VLIW Processors VLIW ( very long instruction word ) processors instructions are scheduled by the compiler a fixed number of operations are formatted as one big instruction (called a bundle) usually LIW
Full and Para Virtualization
Full and Para Virtualization Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF x86 Hardware Virtualization The x86 architecture offers four levels
An Implementation Of Multiprocessor Linux
An Implementation Of Multiprocessor Linux This document describes the implementation of a simple SMP Linux kernel extension and how to use this to develop SMP Linux kernels for architectures other than
Xen and the Art of. Virtualization. Ian Pratt
Xen and the Art of Virtualization Ian Pratt Keir Fraser, Steve Hand, Christian Limpach, Dan Magenheimer (HP), Mike Wray (HP), R Neugebauer (Intel), M Williamson (Intel) Computer Laboratory Outline Virtualization
Why Threads Are A Bad Idea (for most purposes)
