and Symbiotic Optimization

Similar documents

IVIZ TECHNO SOLUTIONS PVT. LTD. Puncture. Automatic Program Analysis using Dynamic Binary Instrumentation. Sunil Kumar

Runtime Monitoring, Performance Analysis

Off-by-One exploitation tutorial

Lecture 7: Machine-Level Programming I: Basics Mohamed Zahran (aka Z)

Introduction to Virtual Machines

Return-oriented programming without returns

64-Bit NASM Notes. Invoking 64-Bit NASM

Hacking Techniques & Intrusion Detection. Ali Al-Shemery arabnix [at] gmail

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 20: Stack Frames 7 March 08

Outline. Outline. Why virtualization? Why not virtualize? Today s data center. Cloud computing. Virtual resource pool

Automatic Logging of Operating System Effects to Guide Application-Level Architecture Simulation

Computer Organization and Architecture

CS61: Systems Programing and Machine Organization

Assembly Language: Function Calls" Jennifer Rexford!

Virtualization. Clothing the Wolf in Wool. Wednesday, April 17, 13

General Introduction

Dynamic Binary Analysis and Instrumentation Covering a function using a DSE approach

Administration. Instruction scheduling. Modern processors. Examples. Simplified architecture model. CS 412 Introduction to Compilers

picojava TM : A Hardware Implementation of the Java Virtual Machine

Compilers and Tools for Software Stack Optimisation

Software Vulnerabilities

Virtual Servers. Virtual machines. Virtualization. Design of IBM s VM. Virtual machine systems can give everyone the OS (and hardware) that they want.

Stack Overflows. Mitchell Adair

Abysssec Research. 1) Advisory information. 2) Vulnerable version

Interpreters and virtual machines. Interpreters. Interpreters. Why interpreters? Tree-based interpreters. Text-based interpreters

Virtualization. Types of Interfaces

Dynamic Program Analysis of Microsoft Windows Applications

Intel 8086 architecture

Cloud Computing #6 - Virtualization

Hotpatching and the Rise of Third-Party Patches

Sequential Performance Analysis with Callgrind and KCachegrind

CSC 2405: Computer Systems II

CPU Organization and Assembly Language

Sequential Performance Analysis with Callgrind and KCachegrind

Instruction Set Design

CPU performance monitoring using the Time-Stamp Counter register

X86-64 Architecture Guide

Processor Architectures

Computer Architectures

CS:APP Chapter 4 Computer Architecture Instruction Set Architecture. CS:APP2e

Computer Architecture Lecture 2: Instruction Set Principles (Appendix A) Chih Wei Liu 劉志尉 National Chiao Tung University

Instruction Set Architecture (ISA)

Buffer Overflows. Security 2011

x86 ISA Modifications to support Virtual Machines

Multi-core Programming System Overview

Virtual Machines. Virtual Machines

Instruction Set Architecture

COS 318: Operating Systems

Dongwoo Kim : Hyeon-jeong Lee s Husband

Using the RDTSC Instruction for Performance Monitoring

Libmonitor: A Tool for First-Party Monitoring

Advanced Computer Architecture-CS501. Computer Systems Design and Architecture 2.1, 2.2, 3.2

CS:APP Chapter 4 Computer Architecture. Wrap-Up. William J. Taffe Plymouth State University. using the slides of

How To Write Portable Programs In C

Computer Organization and Components

CS 152 Computer Architecture and Engineering. Lecture 22: Virtual Machines

Chapter 3 Operating-System Structures

The Plan Today... System Calls and API's Basics of OS design Virtual Machines

Compiler Construction

On Demand Loading of Code in MMUless Embedded System

For a 64-bit system. I - Presentation Of The Shellcode

Chapter 7D The Java Virtual Machine

A Tiny Guide to Programming in 32-bit x86 Assembly Language

Last Class: OS and Computer Architecture. Last Class: OS and Computer Architecture

ELEC 377. Operating Systems. Week 1 Class 3

CSC230 Getting Starting in C. Tyler Bletsch

QEMU, a Fast and Portable Dynamic Translator

Lecture 27 C and Assembly

Basics of VTune Performance Analyzer. Intel Software College. Objectives. VTune Performance Analyzer. Agenda

Real-time Debugging using GDB Tracepoints and other Eclipse features

A Unified View of Virtual Machines

COM 444 Cloud Computing

Attacking x86 Windows Binaries by Jump Oriented Programming

VLIW Processors. VLIW Processors

The Beast is Resting in Your Memory On Return-Oriented Programming Attacks and Mitigation Techniques To appear at USENIX Security & BlackHat USA, 2014

(Refer Slide Time: 00:01:16 min)

Virtualization Technologies

Adapting the PowerPC 403 ROM Monitor Software for a 512Kb Flash Device

Full and Para Virtualization

TODAY, FEW PROGRAMMERS USE ASSEMBLY LANGUAGE. Higher-level languages such

Chapter 2 System Structures

An Easier Way for Cross-Platform Data Acquisition Application Development

An Introduction to Assembly Programming with the ARM 32-bit Processor Family

Application-Specific Attacks: Leveraging the ActionScript Virtual Machine

Performance Monitoring of the Software Frameworks for LHC Experiments

How Compilers Work. by Walter Bright. Digital Mars

8-Bit Flash Microcontroller for Smart Cards. AT89SCXXXXA Summary. Features. Description. Complete datasheet available under NDA

Uses for Virtual Machines. Virtual Machines. There are several uses for virtual machines:

Intel Application Software Development Tool Suite 2.2 for Intel Atom processor. In-Depth

Introduction to Virtual Machines

12. Introduction to Virtual Machines

Chapter 4 Processor Architecture

Multi-/Many-core Modeling at Freescale

Transcription:

Process Virtualization and Symbiotic Optimization Kim Hazelwood ACACES Summer School July 2009 About Your Instructor Currently Assistant Professor at University of Virginia Faculty Consultant at Intel Previously PostDoc at Intel (2004-2005) PhD from Harvard (2004) Four summer internships (HP & IBM) Worked with Dynamo, Jikes RVM, Other Interests Marathons (Boston, NYC, Disney) Reality TV Shows Family (8 month old at home!) 1 1

About the Course Day 1 What is Process Virtualization? Day 2 Building Process Virtualization Systems Day 3 Using Process Virtualization Systems Day 4 Symbiotic Optimization We ll use Pin as a case study www.pintool.org You ll have homework! 2 What is Process Virtualization? System virtualization allows multiple OSes to share the same hardware Process virtualization runs as a normal application (on top of an OS) and supports a single process App 1 App 2 App 1 OS 1 OS 2 DBT VMM OS HW HW System Virtualization App 2 DBI Process Virtualization 3 2

Classifying Virtualization Dynamic binary optimization (x86 x86--) Complement the static compiler User inputs, phases, DLLs, hardware features Examples: DynamoRIO, Mojo, Strata Dynamic translation (x86 PPC) Convert applications to run on a new architecture Examples: Rosetta, Transmeta CMS, DAISY Dynamic instrumentation (x86 x86++) Inspect/add features to existing applications Examples: Pin, Valgrind 4 A Simple Example of Instrumentation Inserting extra code into a program to collect runtime information counter++; sub $0xff, %edx counter++; cmp %esi, %edx counter++; jle <L1> counter++; mov $0x1, %edi counter++; add $0x10, %eax 5 3

Instruction Count Output $ /bin/ls Makefile imageload.out itrace proccount imageload inscount atrace itrace.out $ pin -t inscount.so -- /bin/ls Makefile imageload.out itrace proccount imageload inscount atrace itrace.out Count 422838 6 A Simple Example of Optimization On Pentium 3, inc is faster than add On Pentium 4, add is faster than inc sub cmp jle mov inc $0xff, %edx %esi, %edx <L1> $0x1, %edi %eax sub cmp jle mov add $0xff, %edx %esi, %edx <L1> $0x1, %edi $0x1, %eax 7 4

Research Applications Computer Architecture Trace Generation Fault Tolerance Studies Emulating New Instructions Program Analysis Code coverage Call-graph generation Memory-leak detection Instruction profiling Multicore Thread analysis Thread profiling Race detection Cache simulations Compilers Compare programs from competing compilers Security Add security checks and features 8 Approaches Source modification: Modify source programs Binary modification: Modify executables directly Advantages for binary modification Language independent Machine-level view Modify legacy/proprietary software 9 5

Static vs Dynamic Approaches Dynamic approaches are more robust No need to recompile or relink Discover code at runtime Handle dynamically-generated code Attach to running processes The Code Discovery Problem on x86 Instr 1 Instr 2 Indirect jump to?? Instr 3 Jump Reg DATA Data interspersed Instr 5 Instr 6 with code Uncond Branch PADDING Instr 8 Pad for alignment 10 Dynamic Modification: Approaches JIT Mode Create a modified copy of the application on-the-fly Original code never executes More flexible, more common approach Probe Mode Modifies the original application instructions Inserts jumps to modified code (trampolines) Lower overhead (less flexible) approach 11 6

JIT-Mode Binary Modification Generate and cache modified copies of instructions EXE Transform Profile Code Cache Execute Modified (cached) instructions are executed in lieu of original instructions 12 JIT-Mode Instrumentation Original code 1 Code cache 1 Exits point back to VMM 2 3 7 5 6 4 2 7 Fetch trace starting block 1 and start instrumentation Pin 13 7

JIT-Mode Instrumentation Original code 1 Code cache 1 2 3 5 6 7 4 2 7 Transfer control into code cache (block 1) Pin 14 JIT-Mode Instrumentation Original code 1 Code cache trace linking 1 3 2 3 5 6 7 4 2 7 Fetch and instrument a new trace 5 6 Pin 15 8

Instrumentation Approaches JIT Mode Create a modified copy of the application on-the-fly Original code never executes More flexible, more common approach Probe Mode Modify the original application instructions Insert jumps to instrumentation code (trampolines) Lower overhead (less flexible) approach 16 A Sample Probe A probe is a jump instruction that overwrites original instruction(s) in the application Copy/translate original bytes so probed functions can be called Original function entry point: 0x400113d4: push %ebp 0x400113d5: mov %esp,%ebp 0x400113d7: push %edi 0x400113d8: push %esi 0x400113d9: push %ebx Entry point overwritten with probe: 0x400113d4: jmp 0x41481064 0x400113d9: push %ebx Copy of entry point w/ original bytes: 0x50000004: push %ebp 0x50000005: mov %esp,%ebp 0x50000007: push %edi 0x50000008: push %esi 0x50000009: jmp 0x400113d9 17 9

Probe Instrumentation Advantages: Low overhead few percent Less intrusive execute original code Disadvantages: More tool writer responsibility Restrictions on where to modify (routine-level) 18 Probe Tool Writer Responsibilities No control flow into the instruction space where probe is placed 6 bytes on IA32, 7 bytes on Intel64, bundle on IA64 Branch into replaced instructions will fail Probes at function entry point only Thread safety for insertion/deletion of probes During image load callback is safe Only loading thread has a handle to the image Replacement function has same behavior as original 19 10

Probe vs. JIT Summary Probes JIT Overhead Few percent 50% or higher Intrusive Low High Granularity Function boundary Instruction Safety & Isolation More responsibility for tool writer High 20 Process Virtualization Systems Readily Available DynamoRIO Valgrind Pin Available By Request Strata Adore Unavailable Transmeta CMS Dynamo 21 11

DynamoRIO 22 Valgrind 23 12

Pin 24 Intel Pin Dynamic Instrumentation: Do not need source code, recompilation, post-linking Programmable Instrumentation: Provides rich APIs to write in C/C++ your own instrumentation tools (called Pintools) Multiplatform: Supports x86, x86-64, Itanium, Xscale Supports Linux, Windows, MacOS Robust: Instruments real-life applications: Database, web browsers, Instruments multithreaded d applications Supports signals Efficient: Applies compiler optimizations on instrumentation code 25 13

Using Pin Launch and instrument an application $ pin t pintool.so - application Instrumentation engine (provided in the kit) Instrumentation tool (write your own, or use one provided in the kit) Attach to and instrument an application $ pin t pintool.so pid 1234 26 Pin Instrumentation APIs Basic APIs are architecture independent: Provide common functionalities like determining: Control-flow changes Memory accesses Architecture-specific APIs e.g., Info about opcodes and operands Call-based APIs: Instrumentation routines Analysis routines 27 14

Instrumentation vs. Analysis Concepts borrowed from the ATOM tool: Instrumentation routines define where instrumentation is inserted e.g., before instruction Occurs first time an instruction is executed Analysis routines define what to do when instrumentation is activated e.g., increment counter Occurs every time an instruction is executed 28 Pintool 1: Instruction Count counter++; sub $0xff, %edx counter++; cmp %esi, %edx counter++; jle <L1> counter++; mov $0x1, %edi counter++; add $0x10, %eax 29 15

Pintool 1: Instruction Count Output $ /bin/ls Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out $ pin -t inscount0.so -- /bin/ls Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out Count 422838 30 #include <iostream> #include "pin.h" ManualExamples/inscount0.cpp UINT64 icount = 0; void docount() { icount++; } analysis routine instrumentation routine void Instruction(INS ins, void *v) { INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END); } void Fini(INT32 code, void *v) { std::cerr << "Count " << icount << endl; } int main(int argc, char * argv[]) { PIN_Init(argc, argv); INS_AddInstrumentFunction(Instruction, 0); PIN_AddFiniFunction(Fini, 0); PIN_StartProgram(); return 0; } 31 16

Pintool 2: Instruction Trace Print(ip); sub $0xff, %edx Print(ip); cmp %esi, %edx Print(ip); jle <L1> Print(ip); mov $0x1, %edi Pi Print(ip); add $0x10, %eax Need to pass ip argument to the analysis routine (Printip()) 32 Pintool 2: Instruction Trace Output $ pin -t itrace.so -- /bin/ls Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out $ head -4 itrace.out 0x40001e90 0x40001e91 0x40001ee4 0x40001ee5 33 17

ManualExamples/itrace.cpp #include <stdio.h> #include "pin.h" argument to analysis routine FILE * trace; void printip(void *ip) { fprintf(trace, "%p\n", ip); } analysis routine ( ) instrumentation i routine void Instruction(INS ins, void *v) { INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip, IARG_INST_PTR, IARG_END); } void Fini(INT32 code, void *v) { fclose(trace); } int main(int argc, char * argv[]) { trace = fopen("itrace.out", "w"); PIN_Init(argc, argv); INS_AddInstrumentFunction(Instruction, 0); } PIN_AddFiniFunction(Fini, 0); PIN_StartProgram(); return 0; 34 Examples of Arguments to Analysis Routine IARG_INST_PTR Instruction pointer (program counter) value IARG_UINT32 <value> An integer value IARG_REG_VALUE <register name> Value of the register specified IARG_BRANCH_TARGET_ADDR Target address of the branch instrumented IARG_MEMORY_READ_EA Effective address of a memory read And many more (refer to the manual for details) 35 18

Instrumentation Points Instrument points relative to an instruction: Before: IPOINT_BEFORE After: Fall-through edge: IPOINT_AFTER Taken edge: IPOINT_TAKEN_BRANCH count() count() cmp jle mov %esi, %edx <L1> $0x1, %edi count() <L1>: mov $0x8,%edi 36 Instrumentation Granularity Instrumentation can be done at three different granularities: Instruction Basic block A sequence of instructions sub $0xff, %edx terminated at a control-flow cmp %esi, %edx changing instruction jle <L1> Single entry, single exit Trace mov $0x1, %edi A sequence of basic blocks terminated at an unconditional control-flow changing instruction Single entry, multiple exits add jmp $0x10, %eax <L2> 1 Trace, 2 BBs, 6 insts 37 19

Pintool 3: Faster Instruction Count counter += 3 sub $0xff, %edx cmp %esi, %edx jle <L1> counter += 2 mov $0x1, %edi basic blocks (bbl) add $0x10, %eax 38 ManualExamples/inscount1.cpp #include <stdio.h> #include "pin.h UINT64 icount = 0; void docount(int32 c) { icount += c; } analysis routine void Trace(TRACE trace, void *v) { instrumentation routine for (BBL bbl = TRACE_BblHead(trace); BBL_ Valid(bbl); bbl = BBL_ Next(bbl)) { BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount, IARG_UINT32, BBL_NumIns(bbl), IARG_END); } } void Fini(INT32 code, void *v) { fprintf(stderr, "Count %lld\n", icount); } int main(int argc, char * argv[]) { PIN_Init(argc, I argv); TRACE_AddInstrumentFunction(Trace, 0); PIN_AddFiniFunction(Fini, 0); PIN_StartProgram(); return 0; } 39 20

What Did We Learn Today? Overview of Process Virtualization Approaches Source vs. Binary Static vs. Dynamic JIT vs. Probes Three Available Systems Three Simple Examples 40 Want More Info? Read Jim Smith s book: Virtual Machines Download one (or more) of them! Pin DynamoRIO Valgrind www.pintool.org org code.google.com/p/dynamorio www.valgrind.org Day 1 What is Process Virtualization? Day 2 Building Process Virtualization ti Systems Day 3 Using Process Virtualization Systems Day 4 Symbiotic Optimization 41 21