How To Trace


CS510 Software Engineering: Dynamic Program Analysis
Asst. Prof. Mathias Payer, Department of Computer Science, Purdue University
TA: Scott A. Carr
Slides inspired by Xiangyu Zhang
http://nebelwelt.net/teaching/15-cs510-se
Spring 2015

Table of Contents
1 Overview
2 DPA Primitives
3 Tracing definition
4 Use-cases for Tracing
5 How to Trace
   Source to Source Instrumentation
   Binary Instrumentation
   FastBT, Generating Fast Binary Translators
6 Reducing Trace Size
   Basic block-level Tracing
   Alternatives to Reduce Trace Size
   Compression Using Value Predictors
Mathias Payer (Purdue University) CS510 Software Engineering 2015

1 Overview
Dynamic program analysis tackles software dependability and productivity problems by inspecting software execution.
A program execution captures the runtime behavior of a program (think class and object: the program is the class, an execution is an instance of it).
Dynamic analysis follows a path through the program: each statement is executed anywhere from 0 to N times.
The analysis is restricted to that single path.
All variables are instantiated with concrete values, which solves the aliasing problem of static analysis.

Advantages
Relatively low learning curve.
Precision.
Applicability.
Scalability.

Disadvantages?
Neither generalizable nor complete: results hold only for the executions that were observed.
Limited to the available test cases.
Possible runtime constraints (Heisenbugs: observing the execution can change its behavior).

2 DPA Primitives

Dynamic Program Analysis Primitives
Tracing
Profiling
Checkpoint and replay
Dynamic slicing
Execution indexing
Delta debugging

Applications
Taint tracking
Dynamic information flow tracking
Automated debugging

3 Tracing definition

Tracing definition
Tracing is a lossless process that faithfully records detailed information about a program's execution.
Tracing is a basic and simple primitive.

Types of Tracing
Control-flow tracing: the sequence of executed statements.
Dependence tracing: the sequence of exercised dependences.
Value tracing: the sequence of values produced by each instruction.
Memory access tracing: the sequence of memory accesses during execution.
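As a rough illustration of three of these trace types, the sketch below hand-instruments a small array-maximum function in C. The macros `TRACE_STMT`, `TRACE_VAL` and `TRACE_MEM`, the statement IDs and the fixed-size buffers are hypothetical and not part of any tracing tool. Dependence tracing is left out because it would additionally need def-use bookkeeping for every read.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical fixed-size trace buffers, one per kind of trace. */
#define MAX_EVENTS 256
static int cf_trace[MAX_EVENTS];          /* control flow: IDs of executed statements */
static long val_trace[MAX_EVENTS];        /* values: result of each value-producing statement */
static const void *mem_trace[MAX_EVENTS]; /* memory accesses: addresses that were read */
static int cf_len, val_len, mem_len;

#define TRACE_STMT(id) (assert(cf_len < MAX_EVENTS), cf_trace[cf_len++] = (id))
#define TRACE_VAL(x)   (assert(val_len < MAX_EVENTS), val_trace[val_len++] = (long)(x))
#define TRACE_MEM(p)   (assert(mem_len < MAX_EVENTS), mem_trace[mem_len++] = (const void *)(p))

static void trace_reset(void) { cf_len = val_len = mem_len = 0; }

/* Statement IDs: 1 = initialization, 2 = loop body, 3 = true branch, 4 = return. */
static int array_max(const int *a, int n) {
    TRACE_STMT(1);
    TRACE_MEM(&a[0]);
    int max = a[0];
    TRACE_VAL(max);
    for (int i = 1; i < n; i++) {
        TRACE_STMT(2);
        TRACE_MEM(&a[i]);
        if (a[i] > max) {
            TRACE_STMT(3);
            max = a[i];
            TRACE_VAL(max);
        }
    }
    TRACE_STMT(4);
    return max;
}

static void print_traces(void) {
    printf("control flow:");
    for (int k = 0; k < cf_len; k++) printf(" %d", cf_trace[k]);
    printf("\nvalues:");
    for (int k = 0; k < val_len; k++) printf(" %ld", val_trace[k]);
    printf("\nmemory reads: %d\n", mem_len);
}
```

For the input {3, 7, 5, 9}, the control-flow trace is 1 2 3 2 2 3 4, the value trace is 3 7 9, and four memory reads are recorded.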

4 Use-cases for Tracing

Use-cases for Tracing
Debugging: time-travel to understand interactions.
Code optimizations: hot program paths, data compression, value speculation, data locality for cache optimization.
Security: malware analysis.
Testing: code coverage.

5 How to Trace

Tracing by printf
    int max = 0;
    for (p = head; p; p = p->next) {
        printf("in loop\n");
        if (p->value > max) {
            printf("True branch\n");
            max = p->value;
        }
    }
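A runnable version of this loop, assuming a minimal singly linked `struct node` with `value` and `next` fields (the slide does not show the type). `max` starts at 0 as on the slide, so the function only makes sense for non-negative values; `demo` is a hypothetical driver.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed list type; the slide only shows p->value and p->next. */
struct node {
    int value;
    struct node *next;
};

/* The slide's loop: each printf leaves a trace line on stdout. */
static int list_max(const struct node *head) {
    int max = 0;  /* as on the slide, so this assumes non-negative values */
    for (const struct node *p = head; p; p = p->next) {
        printf("in loop\n");
        if (p->value > max) {
            printf("True branch\n");
            max = p->value;
        }
    }
    return max;
}

static void demo(void) {
    struct node c = {4, NULL}, b = {9, &c}, a = {2, &b};
    int m = list_max(&a);
    assert(m == 9);
    printf("max = %d\n", m);
}
```

For the list 2 → 9 → 4, `demo()` prints `in loop`, `True branch`, `in loop`, `True branch`, `in loop` and then `max = 9`; the trace is simply whatever the program writes to stdout, interleaved with its normal output.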

Tracing by Source-Level Instrumentation
Parse a source file into an AST.
Annotate the AST with instrumentation.
Translate the annotated trees back into a new source file.
Compile the new sources.
Execute the program; the trace is produced as a side effect.

Source-Level Instrumentation Example
    for (i = 1; i < 10; i++) {
        a[i] = b[i] * 5;
    }
[Figure: AST of the loop: a for node with children i, 1 and 10, and an assignment (=) whose left side is the array access a[i] ([] with a, i) and whose right side is the product (*) of b[i] ([] with b, i) and 5.]

Source-Level Instrumentation Example (2)
    for (i = 1; i < 10; i++) {
        printf("In loop\n");
        a[i] = b[i] * 5;
    }
[Figure: AST of the instrumented loop: the body of the for node is now a sequence (;) containing the printf call followed by the original assignment.]
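The annotate and translate-back steps can be sketched on a hand-built tree. The miniature AST (`Node`, `N_FOR`, `N_STMT`) and the helpers `instrument`, `unparse` and `free_tree` are hypothetical; a real source-to-source tool would obtain the tree from a C parser and handle far more node kinds. Parsing, compiling and running are not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical miniature AST: just enough node kinds for the loop on the slides. */
typedef enum { N_FOR, N_STMT } Kind;

typedef struct Node {
    Kind kind;
    const char *text;    /* N_FOR: loop header; N_STMT: statement source text */
    struct Node *body;   /* N_FOR only: first statement of the loop body */
    struct Node *next;   /* next statement in the same block */
} Node;

static Node *mk(Kind kind, const char *text, Node *body, Node *next) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->text = text;
    n->body = body;
    n->next = next;
    return n;
}

/* Annotate: prepend a tracing printf to the body of every loop (nested ones too). */
static void instrument(Node *n) {
    for (; n; n = n->next) {
        if (n->kind == N_FOR) {
            instrument(n->body);
            n->body = mk(N_STMT, "printf(\"In loop\\n\");", NULL, n->body);
        }
    }
}

/* Translate the annotated tree back into C source text. */
static void unparse(const Node *n, int depth, FILE *out) {
    for (; n; n = n->next) {
        fprintf(out, "%*s", 4 * depth, "");
        if (n->kind == N_FOR) {
            fprintf(out, "for (%s) {\n", n->text);
            unparse(n->body, depth + 1, out);
            fprintf(out, "%*s}\n", 4 * depth, "");
        } else {
            fprintf(out, "%s\n", n->text);
        }
    }
}

static void free_tree(Node *n) {
    while (n) {
        Node *next = n->next;
        if (n->kind == N_FOR) free_tree(n->body);
        free(n);
        n = next;
    }
}

static void demo(void) {
    Node *loop = mk(N_FOR, "i = 1; i < 10; i++",
                    mk(N_STMT, "a[i] = b[i] * 5;", NULL, NULL), NULL);
    instrument(loop);
    unparse(loop, 0, stdout);
    free_tree(loop);
}
```

`demo()` prints the loop with `printf("In loop\n");` as the first statement of its body, followed by `a[i] = b[i] * 5;`.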

Characteristics of Source-Level Instrumentation
Detailed type and variable information is available.
Detailed control-flow structures are available.
No support for pre-compiled libraries or binaries.
Limited support for multi-lingual programs.
Requires the full source code.

Tracing by Binary Instrumentation
Parse the binary into an intermediate representation (IR) and generate graph data structures such as the control-flow graph (CFG).
Instrument the IR with tracing nodes.
Either compile/assemble the IR back into an executable (static binary instrumentation) or use a JIT to execute it on the fly (dynamic binary instrumentation).
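To make the static variant concrete, here is a toy sketch that treats an array of instructions for a hypothetical mini ISA (`Op`, `Insn`, `sum_prog` are all invented for illustration) as the "binary". It finds basic-block leaders (entry, branch targets, fall-through successors), inserts a `TRACE` instruction before each leader, and then re-targets the jumps, because inserting code shifts every address. Real static rewriters face the same relocation problem on actual machine code, where it is considerably harder.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy ISA standing in for machine code:
   SET r,imm | ADD r1,r2 (r1 += r2) | DEC r | JNZ r,target | HALT | TRACE id */
typedef enum { OP_SET, OP_ADD, OP_DEC, OP_JNZ, OP_HALT, OP_TRACE } Op;
typedef struct { Op op; int a, b; } Insn;

#define MAXI 64

/* Sum 3 + 2 + 1 into r0; addresses 0..5. */
static const Insn sum_prog[] = {
    {OP_SET, 0, 0}, {OP_SET, 1, 3},                  /* 0, 1: r0 = 0, r1 = 3 */
    {OP_ADD, 0, 1}, {OP_DEC, 1, 0}, {OP_JNZ, 1, 2},  /* 2..4: loop body     */
    {OP_HALT, 0, 0},                                 /* 5                   */
};

/* Insert TRACE before every basic-block leader, then re-target the jumps.
   out needs room for 2 * n instructions; returns the new program length. */
static int instrument(const Insn *in, int n, Insn *out) {
    int leader[MAXI] = {0}, newpos[MAXI];
    assert(n <= MAXI);
    leader[0] = 1;                               /* entry point */
    for (int i = 0; i < n; i++) {
        if (in[i].op == OP_JNZ) {
            leader[in[i].b] = 1;                 /* branch target */
            if (i + 1 < n) leader[i + 1] = 1;    /* fall-through successor */
        }
    }
    int m = 0;
    for (int i = 0; i < n; i++) {
        newpos[i] = m;                           /* old address -> new address (of the TRACE) */
        if (leader[i]) out[m++] = (Insn){OP_TRACE, i, 0};
        out[m++] = in[i];
    }
    for (int j = 0; j < m; j++)                  /* inserted code shifted every address */
        if (out[j].op == OP_JNZ) out[j].b = newpos[out[j].b];
    return m;
}

/* Interpret a program; each TRACE appends the original block address to trace. */
static int run(const Insn *code, int n, int *trace, int *tlen, int tcap) {
    int r[4] = {0}, pc = 0;
    *tlen = 0;
    while (pc < n) {
        Insn x = code[pc++];
        switch (x.op) {
        case OP_SET:   r[x.a] = x.b; break;
        case OP_ADD:   r[x.a] += r[x.b]; break;
        case OP_DEC:   r[x.a]--; break;
        case OP_JNZ:   if (r[x.a]) pc = x.b; break;
        case OP_TRACE: assert(*tlen < tcap); trace[(*tlen)++] = x.a; break;
        case OP_HALT:  return r[0];
        }
    }
    return r[0];
}

static void demo(void) {
    Insn out[2 * MAXI];
    int trace[16], tlen;
    int m = instrument(sum_prog, 6, out);
    printf("result %d, block trace:", run(out, m, trace, &tlen, 16));
    for (int k = 0; k < tlen; k++) printf(" %d", trace[k]);
    printf("\n");
}
```

Running the instrumented `sum_prog` returns 6, as the original does, and records the original block addresses 0, 2, 2, 2, 5.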

Characteristics of Binary-Level Instrumentation
No source code needed.
Supports libraries and any executable.
Possibly high overhead due to instrumentation and translation.
Limited scope; few high-level data structures (types, variables) are available.

FastBT, Generating Fast Binary Translators
Goal: fast, efficient instrumentation at low overhead.
Instead of converting machine code to an IR, translate using pre-generated tables.
Define a set of translation actions that add instrumentation when dispatched.
Use a code cache to lower overhead.
Challenge: defining translation actions for instructions that change control flow.

FastBT Overview
[Figure: The translator translates individual basic blocks, verifies code source and destination, and checks branch targets and origins. The original code (blocks 1, 2, 3, 4; mapped read-only, R) is translated through a mapping table (1 → 1', 2 → 2', 3 → 3', ...) into a code cache (blocks 1', 2', 3'; mapped readable and executable, RX). Indirect control-flow transfers use a dynamic check to verify target and origin.]
Reading material: Generating low-overhead dynamic binary translators, Mathias Payer and Thomas R. Gross, SYSTOR '10 (see course homepage).
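A toy model of these ideas, under heavy simplifying assumptions: a hypothetical mini ISA, a per-opcode table of translation actions (`actions[]`), a mapping table from original to translated addresses (`map`), and a code cache. Blocks are translated lazily the first time they are reached, a `TRACE` instruction is added at translation time, and translated blocks are reused on later visits. This is not FastBT's actual table format or API, and it omits the checks on indirect control-flow transfers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy ISA. OP_TRACE and OP_EXIT are translator-internal:
   they only ever appear in the code cache, never in original code. */
typedef enum { OP_SET, OP_ADD, OP_DEC, OP_JNZ, OP_HALT, OP_TRACE, OP_EXIT } Op;
typedef struct { Op op; int a, b; } Insn;

#define MAXI 64
#define CACHE_CAP (4 * MAXI)
#define TRACE_CAP 64

static const Insn sum_prog[] = {             /* r0 = 3 + 2 + 1 */
    {OP_SET, 0, 0}, {OP_SET, 1, 3}, {OP_ADD, 0, 1},
    {OP_DEC, 1, 0}, {OP_JNZ, 1, 2}, {OP_HALT, 0, 0},
};

static Insn cache[CACHE_CAP];  /* code cache: translated, instrumented blocks */
static int cache_len;
static int map[MAXI];          /* mapping table: original pc -> cache pc, -1 if untranslated */
static int translations;       /* number of blocks translated so far */

static void emit(Insn x) { assert(cache_len < CACHE_CAP); cache[cache_len++] = x; }

/* Translation actions, looked up per opcode; each returns 1 when the block ends. */
typedef int (*Action)(Insn in, int pc);
static int act_copy(Insn in, int pc) { (void)pc; emit(in); return 0; }
static int act_halt(Insn in, int pc) { (void)pc; emit(in); return 1; }
static int act_jnz(Insn in, int pc) {
    emit(in);                          /* taken: leave the cache with the original target */
    emit((Insn){OP_EXIT, pc + 1, 0});  /* not taken: leave with the fall-through address */
    return 1;
}
static const Action actions[] = {      /* only opcodes that occur in original code */
    [OP_SET] = act_copy, [OP_ADD] = act_copy, [OP_DEC] = act_copy,
    [OP_JNZ] = act_jnz,  [OP_HALT] = act_halt,
};

/* Translate the block starting at original pc; instrumentation is added here, once. */
static int translate(const Insn *code, int pc) {
    int start = cache_len;
    map[pc] = start;
    translations++;
    emit((Insn){OP_TRACE, pc, 0});
    while (!actions[code[pc].op](code[pc], pc)) pc++;
    return start;
}

/* Run cached code until it leaves the block; returns the next original pc, -1 on HALT. */
static int execute(int cpc, int *r, int *trace, int *tlen) {
    for (;;) {
        Insn x = cache[cpc++];
        switch (x.op) {
        case OP_SET:   r[x.a] = x.b; break;
        case OP_ADD:   r[x.a] += r[x.b]; break;
        case OP_DEC:   r[x.a]--; break;
        case OP_TRACE: assert(*tlen < TRACE_CAP); trace[(*tlen)++] = x.a; break;
        case OP_JNZ:   if (r[x.a]) return x.b; break;
        case OP_EXIT:  return x.a;
        case OP_HALT:  return -1;
        }
    }
}

/* Dispatcher: consult the mapping table, translate on a miss, execute from the cache.
   trace must hold TRACE_CAP entries. Returns r0. */
static int run(const Insn *code, int *trace, int *tlen) {
    int r[4] = {0}, pc = 0;
    for (int i = 0; i < MAXI; i++) map[i] = -1;
    cache_len = translations = 0;
    *tlen = 0;
    while (pc >= 0) {
        int cpc = map[pc] >= 0 ? map[pc] : translate(code, pc);
        pc = execute(cpc, r, trace, tlen);
    }
    return r[0];
}

static void demo(void) {
    int trace[TRACE_CAP], tlen;
    int sum = run(sum_prog, trace, &tlen);
    printf("sum %d, %d blocks translated, trace:", sum, translations);
    for (int k = 0; k < tlen; k++) printf(" %d", trace[k]);
    printf("\n");
}
```

On `sum_prog`, three blocks are translated (starting at 0, 2 and 5), the block at 2 is translated once but executed twice from the cache, and the trace is 0, 2, 2, 5. The block starting at 0 runs until the first control transfer, so it overlaps the block starting at 2; a dynamic translator forms blocks this way because it only discovers branch targets at runtime.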

6 Reducing Trace Size

Fine-grained Tracing is Expensive!
    1  int sum = 0;
    2  int i = 1;
    3  while (i < N) {
    4      i++;
    5      sum = sum + i;
    6  }
    7  printf("Sum: %d\n", sum);
Trace (N = 6): 1, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5, 3, 4, 5, 3, 4, 5, 3, 7 (19 entries).
Space complexity: $\text{exec length} \times \texttt{sizeof(void *)}$, i.e., one pointer-sized entry per executed statement.
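A statement-level trace of this program can be produced with a small hypothetical tracer: `T(k)` appends the line number k to a buffer of pointer-sized slots (`uintptr_t`), so the space cost is the execution length times the slot size. The macro and buffer are illustrative, not taken from any tool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical statement-level tracer: one pointer-sized slot per executed statement. */
#define MAX_TRACE 256
static uintptr_t trace[MAX_TRACE];
static int trace_len;
#define T(line) (assert(trace_len < MAX_TRACE), trace[trace_len++] = (line))

/* The program from the slide; T(k) records that the statement on line k executed. */
static int sum_loop(int N) {
    trace_len = 0;
    T(1); int sum = 0;
    T(2); int i = 1;
    while (T(3), i < N) {       /* the condition on line 3 is traced at every evaluation */
        T(4); i++;
        T(5); sum = sum + i;
    }
    T(7); printf("Sum: %d\n", sum);
    return sum;
}

static void print_trace(void) {
    for (int k = 0; k < trace_len; k++) printf("%u ", (unsigned)trace[k]);
    printf("(%d entries, %zu bytes)\n", trace_len, trace_len * sizeof trace[0]);
}
```

With N = 6 the function returns 20 and records 19 entries; with 8-byte slots (a typical 64-bit target) that is 152 bytes for a seven-line program.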

Basic block-level Tracing
    1  int sum = 0;
    2  int i = 1;
    3  while (i < N) {
    4      i++;
    5      sum = sum + i;
    6  }
    7  printf("Sum: %d\n", sum);
BB Trace: 1-2, 3, 4-5, 3, 4-5, 3, 4-5, 3, 4-5, 3, 4-5, 3, 7
In this example only 13/19 of the storage is needed (13 block entries instead of 19 statement entries).
Drawback: seeking inside a basic block is more complicated.
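A sketch of block-level tracing for the same program, again with a hypothetical macro: `BB(k)` records only the first line of each executed basic block, and a small static table of block sizes (`block_size`) is enough to expand the block trace back into the full statement trace, so no information is lost.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical block-level tracer: one pointer-sized slot per executed basic block. */
#define MAX_TRACE 256
static uintptr_t bb_trace[MAX_TRACE];
static int bb_len;
#define BB(first_line) (assert(bb_len < MAX_TRACE), bb_trace[bb_len++] = (first_line))

static int sum_loop_bb(int N) {
    bb_len = 0;
    BB(1); int sum = 0; int i = 1;    /* block 1-2 */
    while (BB(3), i < N) {            /* block 3: loop condition */
        BB(4); i++; sum = sum + i;    /* block 4-5 */
    }
    BB(7); printf("Sum: %d\n", sum);  /* block 7 */
    return sum;
}

/* Static knowledge: number of consecutive lines in the block starting at first_line. */
static int block_size(uintptr_t first_line) {
    switch (first_line) {
    case 1: return 2;   /* lines 1-2 */
    case 4: return 2;   /* lines 4-5 */
    default: return 1;  /* lines 3 and 7 */
    }
}

/* Expand the block trace back into line numbers; returns the statement count. */
static int expand(uintptr_t *out, int cap) {
    int n = 0;
    for (int k = 0; k < bb_len; k++) {
        for (int s = 0; s < block_size(bb_trace[k]); s++) {
            assert(n < cap);
            out[n++] = bb_trace[k] + (uintptr_t)s;
        }
    }
    return n;
}
```

For N = 6 the block trace has 13 entries and expands to 19 statements. Finding the k-th statement now requires summing block sizes from the start (or keeping an index of prefix sums), which is where the extra seeking cost comes from.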

Alternatives to Reduce Trace Size
Other options to reduce trace size?
Function-level tracing: record functions and their parameters (but what about side effects?).
Predicate tracing: record all branch predicates from the beginning of execution; needs only one bit per branch (but seeking is hard).
Path-based tracing: record the path through the CFG (needs heavyweight data structures).
Compression, e.g., using deflate (relies on decompression; no seeking).
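Predicate tracing can be sketched with a bit vector: `record_branch` stores one bit per executed conditional branch, applied here to the loop condition of the sum program. The names and the fixed capacity are hypothetical. Because the trace holds only branch outcomes, reconstructing the program state at bit k means replaying the control flow from bit 0, which is why seeking is hard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate trace: one bit per executed conditional branch. */
#define MAX_BITS 1024
static unsigned char bits[MAX_BITS / 8];
static size_t nbits;

static void record_branch(int taken) {
    assert(nbits < MAX_BITS);
    unsigned char mask = (unsigned char)(1u << (nbits % 8));
    if (taken) bits[nbits / 8] |= mask;
    else       bits[nbits / 8] &= (unsigned char)~mask;
    nbits++;
}

static int branch_at(size_t k) { return (bits[k / 8] >> (k % 8)) & 1; }

/* The sum loop from the slides; only the loop condition's outcome is recorded. */
static int sum_loop_pred(int N) {
    nbits = 0;
    int sum = 0, i = 1;
    for (;;) {
        int taken = i < N;
        record_branch(taken);
        if (!taken) break;
        i++;
        sum = sum + i;
    }
    return sum;
}

/* Replay from bit 0: with only branch outcomes, there is no way to jump straight
   to the program state at the k-th branch without re-walking everything before it. */
static int replay_iterations(void) {
    int iters = 0;
    for (size_t k = 0; k < nbits && branch_at(k); k++) iters++;
    return iters;
}
```

For N = 6 the loop condition is evaluated six times, so the whole control flow of this run fits in 6 bits (five taken, one not taken).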

Last n Values Predictor: Compression
A buffer stores the last n unique values encountered.
If the next value is one of the n values, the index into the buffer is emitted (prefixed with the symbol 0).
Otherwise (misprediction), the encountered value is stored in the encoded trace (prefixed with the symbol m) and the buffer is updated using a least-recently-used replacement strategy.
Example: 123 456 456 456 456 123 123 789 456
Encoded with a last-2 predictor: m 123 m 456 00 00 00 01 01 m 789 m 456

Last n Values Predictor: Decompression
Take one bit from the encoded trace.
If it is the m symbol, read the next value, output it, and update the buffer.
If it is the 0 symbol, read the index and output the value stored at that position in the buffer.
n-value predictors are related to run-length encoding (RLE).
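A sketch of the last-n values predictor in both directions, assuming that "least used" means least recently used, that a miss inserts the new value at index 0, and that hits leave the buffer order unchanged. Under these assumptions it reproduces the slide's example exactly. Tokens are kept in a struct rather than packed into bits; a packed encoding would spend 1 bit on the m/0 flag plus about log2(n) bits per hit index. All names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define NSLOTS 2  /* n; the slides' example uses a last-2 predictor */

typedef struct {
    unsigned vals[NSLOTS];
    unsigned long used[NSLOTS];  /* time of last use, for least-recently-used eviction */
    int count;
    unsigned long clock;
} LastN;

/* One encoded symbol: hit -> payload is the buffer index; miss (m) -> payload is the value. */
typedef struct { int hit; unsigned payload; } Token;

static int lookup(const LastN *p, unsigned v) {
    for (int i = 0; i < p->count; i++)
        if (p->vals[i] == v) return i;
    return -1;
}

/* Miss: evict the least recently used value if the buffer is full, insert v at index 0. */
static void insert_front(LastN *p, unsigned v) {
    if (p->count == NSLOTS) {
        int lru = 0;
        for (int i = 1; i < p->count; i++)
            if (p->used[i] < p->used[lru]) lru = i;
        for (int i = lru; i < p->count - 1; i++) {
            p->vals[i] = p->vals[i + 1];
            p->used[i] = p->used[i + 1];
        }
        p->count--;
    }
    for (int i = p->count; i > 0; i--) {
        p->vals[i] = p->vals[i - 1];
        p->used[i] = p->used[i - 1];
    }
    p->vals[0] = v;
    p->used[0] = p->clock;
    p->count++;
}

static void encode(const unsigned *in, int n, Token *out) {
    LastN p = {0};
    for (int k = 0; k < n; k++) {
        p.clock++;
        int idx = lookup(&p, in[k]);
        if (idx >= 0) {                  /* hit: emit 0 + index, buffer order unchanged */
            p.used[idx] = p.clock;
            out[k] = (Token){1, (unsigned)idx};
        } else {                         /* miss: emit m + value, update buffer */
            insert_front(&p, in[k]);
            out[k] = (Token){0, in[k]};
        }
    }
}

/* The decoder replays the encoder's buffer updates, so it needs no side information. */
static void decode(const Token *in, int n, unsigned *out) {
    LastN p = {0};
    for (int k = 0; k < n; k++) {
        p.clock++;
        if (in[k].hit) {
            assert((int)in[k].payload < p.count);
            out[k] = p.vals[in[k].payload];
            p.used[in[k].payload] = p.clock;
        } else {
            out[k] = in[k].payload;
            insert_front(&p, out[k]);
        }
    }
}

static void print_tokens(const Token *t, int n) {
    for (int k = 0; k < n; k++) {
        if (t[k].hit) printf("0%u ", t[k].payload);  /* single-digit index: fine for n <= 10 */
        else          printf("m %u ", t[k].payload);
    }
    printf("\n");
}
```

`print_tokens` on the encoded example prints m 123 m 456 00 00 00 01 01 m 789 m 456, and `decode` restores the original nine values.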

Finite Context Method (FCM)
Construct a lookup table that predicts a value from the last n values (2-FCM, 3-FCM).
If the next value is correctly predicted from the left context, a 0 bit is emitted to the encoded trace.
Otherwise (misprediction), an m symbol and the original value are emitted to the trace, and the lookup table is updated accordingly.
Example (3-FCM): 1 2 3 4 5 3 4 5 ... 3 4 5 6
Encoded: m 1 m 2 m 3 m 4 m 5 m 3 m 4 m 5 0 ... 0 0 0 m 6
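A minimal 3-FCM sketch in the same spirit. The context table is a linear array searched with `memcmp` (a real implementation would hash the context), a hit is recorded as a bare 0 token, and a miss as an m token carrying the value; the table is only written on a miss. All names and the table capacity are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ORDER 3     /* 3-FCM: predict the next value from the previous three */
#define MAXCTX 256  /* toy capacity; a real implementation would hash contexts */

typedef struct { unsigned ctx[ORDER]; unsigned pred; } Entry;
typedef struct { Entry e[MAXCTX]; int n; } Table;
typedef struct { int hit; unsigned value; } Token;  /* value only meaningful on a miss */

static Entry *find(Table *t, const unsigned *ctx) {
    for (int i = 0; i < t->n; i++)
        if (memcmp(t->e[i].ctx, ctx, sizeof t->e[i].ctx) == 0) return &t->e[i];
    return NULL;
}

/* Record that ctx was followed by v (insert or overwrite). */
static void learn(Table *t, const unsigned *ctx, unsigned v) {
    Entry *e = find(t, ctx);
    if (e == NULL) {
        assert(t->n < MAXCTX);
        e = &t->e[t->n++];
        memcpy(e->ctx, ctx, sizeof e->ctx);
    }
    e->pred = v;
}

static void fcm_encode(const unsigned *in, int n, Token *out) {
    Table t = {.n = 0};
    for (int k = 0; k < n; k++) {
        const unsigned *ctx = k >= ORDER ? &in[k - ORDER] : NULL;  /* left context */
        const Entry *e = ctx ? find(&t, ctx) : NULL;
        if (e && e->pred == in[k]) {
            out[k] = (Token){1, 0};          /* correct prediction: emit 0 */
        } else {
            out[k] = (Token){0, in[k]};      /* misprediction: emit m + value */
            if (ctx) learn(&t, ctx, in[k]);
        }
    }
}

static void fcm_decode(const Token *in, int n, unsigned *out) {
    Table t = {.n = 0};
    for (int k = 0; k < n; k++) {
        const unsigned *ctx = k >= ORDER ? &out[k - ORDER] : NULL;
        if (in[k].hit) {
            const Entry *e = ctx ? find(&t, ctx) : NULL;
            assert(e != NULL);
            out[k] = e->pred;
        } else {
            out[k] = in[k].value;
            if (ctx) learn(&t, ctx, out[k]);
        }
    }
}

static void print_fcm(const Token *t, int n) {
    for (int k = 0; k < n; k++) {
        if (t[k].hit) printf("0 ");
        else          printf("m %u ", t[k].value);
    }
    printf("\n");
}
```

On 1 2 3 4 5 3 4 5 3 4 5 3 4 5 6 this yields m 1 m 2 m 3 m 4 m 5 m 3 m 4 m 5 0 0 0 0 0 0 m 6, the same pattern as the slide's example, and decoding restores the input.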

FCM Characteristics
Length (compressed): $\frac{n}{\texttt{sizeof(void *)}} + n\,(1 - \text{predict rate})$, where n is the number of trace entries; the first term accounts for the one-bit prediction flags, the second for the full values stored on mispredictions.
Predictors compress better than deflate because traces contain repetitive loop patterns.
Drawback: the compressed trace can only be traversed forward.

Bidirectional Compression
Keep a small sliding window of clear text over the compressed string (just as with FCM); the parts to the left and right of the window stay compressed.
Keep both a left-context and a right-context lookup table (instead of only a left-context lookup table).
Moving forward: decompress the next value using the left-context lookup table (the sliding window grows to n+1), then compress the first value of the window using the right-context lookup table (the window is back to n).
Moving backward: decompress using the right-context table and compress using the left-context table.

Bidirectional Compression: Example
Window and encoded trace: ... A X Y 0 ...
Left-context table: A X Y → Z, so the 0 decompresses to Z.
Compress A using the right-context table: X Y Z → A.
If the prediction is correct, emit 0 to the right-context stream; otherwise update the table and emit an m symbol.

Bidirectional Predictor Characteristics
Almost the same compression rate as unidirectional predictors (possibly slightly worse, because the forward and backward prediction rates differ).
Fast compression/decompression, although about two times slower than unidirectional predictors.

Questions?