CS510 Software Engineering
Dynamic Program Analysis
Asst. Prof. Mathias Payer
Department of Computer Science, Purdue University
TA: Scott A. Carr
Slides inspired by Xiangyu Zhang
http://nebelwelt.net/teaching/15-cs510-se
Spring 2015

Table of Contents
1 Overview
2 DPA Primitives
3 Tracing definition
4 Use-cases for Tracing
5 How to Trace
  - Source to Source Instrumentation
  - Binary Instrumentation
  - FastBT, Generating Fast Binary Translators
6 Reducing Trace Size
  - Basic block-level Tracing
  - Alternatives to Reduce Trace Size
  - Compression Using Value Predictors
Mathias Payer (Purdue University) CS510 Software Engineering 2015

1 Overview
- Dynamic program analysis tackles software dependability and productivity problems by inspecting software executions.
- A program execution captures the runtime behavior of a program: a program relates to its executions as a class relates to its objects.
- Dynamic analysis follows one path through the program: each statement is executed between 0 and N times.
- The analysis is restricted to that single path.
- All variables are instantiated with concrete values, which resolves the aliasing problem of static analysis for the observed execution.

Advantages
- Relatively low learning curve.
- Precision.
- Applicability.
- Scalability.

Disadvantages?
- Neither generalizable nor complete: results hold only for the observed executions.
- Limited to the available test cases.
- Possible runtime constraints: observing the execution slows it down and can perturb it, hiding or changing bugs (Heisenbugs).

2 DPA Primitives

Dynamic Program Analysis Primitives
- Tracing
- Profiling
- Checkpoint and replay
- Dynamic slicing
- Execution indexing
- Delta debugging

Applications
- Taint tracking
- Dynamic information flow tracking
- Automated debugging

3 Tracing definition

Tracing
Tracing is a lossless process that faithfully records detailed information about a program's execution. Tracing is a basic and simple primitive.

Types of Tracing
- Control-flow tracing: the sequence of executed statements.
- Dependence tracing: the sequence of exercised dependences.
- Value tracing: the sequence of values produced by each instruction.
- Memory access tracing: the sequence of memory accesses during execution.

4 Use-cases for Tracing

Use-cases for Tracing
- Debugging: time-travel to understand interactions.
- Code optimizations: hot program paths, data compression, value speculation, data locality for cache optimization.
- Security: malware analysis.
- Testing: code coverage.

5 How to Trace

Tracing by printf

    int max = 0;
    for (p = head; p; p = p->next) {
        printf("in loop\n");
        if (p->value > max) {
            printf("True branch\n");
            max = p->value;
        }
    }

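A self-contained, runnable variant of the printf-tracing loop follows. The `struct node` type, the `list_max` function, and the two event counters are illustrative assumptions (the slide shows only the loop); the counters exist only so that the trace events can be checked after the run.

```c
#include <stdio.h>
#include <stddef.h>
#include <assert.h>

/* Singly linked list node; field names follow the slide (value, next). */
struct node {
    int value;
    struct node *next;
};

/* Hypothetical counters mirroring the printed trace events. */
static int loop_events = 0;
static int branch_events = 0;

/* Returns the maximum value in the list (0 for an empty list) and prints
 * one trace line per loop iteration and one per taken true branch. */
int list_max(struct node *head)
{
    int max = 0;
    for (struct node *p = head; p; p = p->next) {
        printf("in loop\n");
        loop_events++;
        if (p->value > max) {
            printf("True branch\n");
            branch_events++;
            max = p->value;
        }
    }
    return max;
}
```

For the list 3 -> 9 -> 7, the trace contains three "in loop" lines and two "True branch" lines, since only 3 and 9 raise the running maximum.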
Source to Source Instrumentation
Tracing by Source-Level Instrumentation
1. Parse a source file into an AST.
2. Annotate the AST with instrumentation.
3. Translate the annotated trees into a new source file.
4. Compile the new sources.
5. Execute the program; the trace is produced as a side effect.

Source-Level Instrumentation Example

    for (i = 1; i < 10; i++) {
        a[i] = b[i] * 5;
    }

(Figure: the AST of this loop. The for node has the children i, 1 and 10, and its body is an assignment node (=) whose left child is the array access a[i] and whose right child is the multiplication b[i] * 5.)

Source-Level Instrumentation Example (2)

    for (i = 1; i < 10; i++) {
        printf("In loop\n");
        a[i] = b[i] * 5;
    }

(Figure: the instrumented AST. The loop body is now a statement sequence (;) whose first child is the inserted printf call and whose second child is the original assignment subtree.)

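The annotate-and-unparse steps can be sketched in C on a deliberately tiny AST. This is a hypothetical illustration, not a real source-to-source tool: `struct for_node`, `instrument_loop`, and `unparse_loop` are invented names, and loop-body statements are kept as source strings instead of being parsed further.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <assert.h>

#define MAX_STMTS 8

/* Hypothetical, heavily simplified AST: a for-loop node whose body is a
 * list of statement nodes stored as source text. */
struct for_node {
    const char *init, *cond, *step;
    const char *body[MAX_STMTS];
    int nbody;
};

/* Instrumentation pass: insert a tracing statement as the first body node. */
void instrument_loop(struct for_node *loop, const char *trace_stmt)
{
    assert(loop->nbody < MAX_STMTS);
    memmove(&loop->body[1], &loop->body[0], loop->nbody * sizeof loop->body[0]);
    loop->body[0] = trace_stmt;
    loop->nbody++;
}

/* Unparse the (annotated) tree back into source text; returns the length
 * that snprintf reports, stopping early if the buffer is too small. */
int unparse_loop(const struct for_node *loop, char *out, size_t size)
{
    int n = snprintf(out, size, "for (%s; %s; %s) {\n",
                     loop->init, loop->cond, loop->step);
    for (int i = 0; i < loop->nbody && n >= 0 && (size_t)n < size; i++)
        n += snprintf(out + n, size - n, "    %s\n", loop->body[i]);
    if (n >= 0 && (size_t)n < size)
        n += snprintf(out + n, size - n, "}\n");
    return n;
}
```

Instrumenting the loop `for (i = 1; i < 10; i++) { a[i] = b[i] * 5; }` with the statement `printf("In loop\n");` and unparsing it yields the instrumented source, which would then be compiled as usual.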
Characteristics of Source-Level Instrumentation
- Detailed type and variable information available.
- Detailed control-flow structures available.
- No support for pre-compiled libraries or binaries.
- Limited support for multi-lingual programs.
- Requires the full source code.

Binary Instrumentation
Tracing by Binary Instrumentation
1. Parse the binary into an intermediate representation (IR) and generate graph data structures such as the CFG.
2. Instrument the IR with tracing nodes.
3. Either compile/assemble the IR back into an executable (static binary instrumentation) or use a JIT to translate and execute it on the fly (dynamic binary instrumentation).

Characteristics of Binary-Level Instrumentation
- No source code needed.
- Supports libraries and any executable.
- Possibly high overhead due to instrumentation and translation.
- Limited scope: few high-level data structures (types, variables) are available.

FastBT, Generating Fast Binary Translators
- Goal: fast, efficient instrumentation at low overhead.
- Instead of converting machine code to an IR, translate using pre-generated tables.
- Define a set of translation actions that add instrumentation when dispatched.
- Use a code cache to lower overhead.
- Challenge: defining translation actions for instructions that change control flow.

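To make table-driven translation concrete, here is a toy sketch in C. The instruction set, the `table` of translation actions, and `translate_block` are hypothetical and far simpler than the machine-specific tables FastBT actually generates; the sketch only shows that each opcode dispatches to a pre-selected action, that memory accesses get a tracing instruction prepended, and that a control-flow instruction ends the translated block.

```c
#include <assert.h>

/* Toy instruction set standing in for machine code (hypothetical). */
enum opcode { OP_ADD, OP_LOAD, OP_STORE, OP_JMP, OP_TRACE, NUM_OPS };

struct insn { enum opcode op; int arg; };

#define MAX_CODE 64

/* Output buffer for one translated basic block. */
struct block {
    struct insn code[MAX_CODE];
    int len;
    int done;   /* set once a control-flow instruction ends the block */
};

static void emit(struct block *b, struct insn i)
{
    assert(b->len < MAX_CODE);
    b->code[b->len++] = i;
}

/* Translation action: copy the instruction unchanged. */
static void act_copy(struct block *b, struct insn i) { emit(b, i); }

/* Translation action: prepend a trace instruction recording the address. */
static void act_trace_mem(struct block *b, struct insn i)
{
    emit(b, (struct insn){ OP_TRACE, i.arg });
    emit(b, i);
}

/* Translation action for control flow. A real translator must redirect the
 * target into the code cache; this sketch only ends the block. */
static void act_branch(struct block *b, struct insn i)
{
    emit(b, i);
    b->done = 1;
}

/* Pre-generated table: opcode -> translation action. */
typedef void (*action_fn)(struct block *, struct insn);
static const action_fn table[NUM_OPS] = {
    [OP_ADD] = act_copy,
    [OP_LOAD] = act_trace_mem,
    [OP_STORE] = act_trace_mem,
    [OP_JMP] = act_branch,
    [OP_TRACE] = act_copy,
};

/* Translate one basic block by dispatching each instruction through the
 * table; returns the number of emitted instructions. */
int translate_block(const struct insn *src, int n, struct block *out)
{
    out->len = 0;
    out->done = 0;
    for (int i = 0; i < n && !out->done; i++)
        table[src[i].op](out, src[i]);
    return out->len;
}
```

Because the work per instruction is a single table lookup plus a small action, no IR has to be built; the cost moves into generating good tables ahead of time.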
FastBT Overview
(Figure: the original code, basic blocks 1, 2, 3, 4 mapped read-only (R), is processed by the translator, which translates individual basic blocks, verifies code source and destination, and checks branch targets and origins. A mapping table records original-to-translated pairs (1 -> 1', 2 -> 2', 3 -> 3', ...), and the translated blocks 1', 2', 3' are placed in a code cache mapped readable and executable (RX). Indirect control-flow transfers use a dynamic check to verify target and origin.)
Reading material: "Generating low-overhead dynamic binary translators", Mathias Payer and Thomas R. Gross, SYSTOR '10 (see course homepage).

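The mapping table and code cache can be sketched as a hash table from original basic-block addresses to translated addresses. Everything here is hypothetical: addresses are plain integers, and `translate` merely hands out fake code-cache slots. The point is that each block is translated once and reused from the cache afterwards; an indirect branch, whose target is only known at runtime, would perform this lookup dynamically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64   /* power of two; open addressing with linear probing */

/* One mapping-table entry: original block address -> translated address. */
struct mapping { uintptr_t orig; uintptr_t translated; };

static struct mapping table[TABLE_SIZE];
static uintptr_t cache_next = 0x10000;  /* next free (fake) code-cache address */
static int translations = 0;            /* blocks actually translated */

/* Stand-in for the real translator: "emit" a block into the code cache. */
static uintptr_t translate(uintptr_t orig)
{
    (void)orig;
    translations++;
    uintptr_t where = cache_next;
    cache_next += 0x40;   /* pretend every translated block is 64 bytes */
    return where;
}

/* Look up the translation of orig, translating it only on first use. */
uintptr_t lookup_or_translate(uintptr_t orig)
{
    assert(orig != 0);    /* 0 marks an empty slot */
    size_t h = (size_t)(orig >> 2) & (TABLE_SIZE - 1);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        struct mapping *m = &table[(h + probes) & (TABLE_SIZE - 1)];
        if (m->orig == orig)
            return m->translated;       /* hit: reuse the cached translation */
        if (m->orig == 0) {             /* miss: translate and remember */
            m->orig = orig;
            m->translated = translate(orig);
            return m->translated;
        }
    }
    assert(!"mapping table full");
    return 0;
}
```

A second lookup of an already translated block returns the same code-cache address without calling `translate` again, which is where the code cache saves its overhead.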
6 Reducing Trace Size

Fine-grained Tracing is Expensive!

    1  int sum = 0;
    2  int i = 1;
    3  while (i < N) {
    4      i++;
    5      sum = sum + i;
    6  }
    7  printf("Sum: %d\n", sum);

Trace (N = 6), by line number (line 6 holds only the closing brace and is not a statement): 1, 2, 3, 4, 5, 3, 4, 5, 3, 4, 5, 3, 4, 5, 3, 4, 5, 3, 7 (19 entries).
Space complexity: execution length × sizeof(void *), i.e., one pointer-sized entry per executed statement.

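A minimal sketch that records the statement-level trace of the loop above, assuming a hypothetical `T(line)` macro that appends the slide's line number before each statement and before each evaluation of the loop condition. For N = 6 it records 19 entries; a real tracer would typically store statement addresses instead, hence the sizeof(void *) per entry.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_TRACE 1024

static int trace[MAX_TRACE];   /* one entry per executed statement */
static int trace_len = 0;

/* Hypothetical tracing hook: record the line about to execute. */
#define T(line) (assert(trace_len < MAX_TRACE), trace[trace_len++] = (line))

int traced_sum(int N)
{
    T(1); int sum = 0;
    T(2); int i = 1;
    while (T(3), i < N) {      /* the condition is traced on every check */
        T(4); i++;
        T(5); sum = sum + i;
    }
    T(7); printf("Sum: %d\n", sum);
    return sum;
}
```

With N = 6 the loop body runs five times, so the trace is 1, 2, then five copies of 3, 4, 5, then the final failing check 3 and line 7.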
Basic block-level Tracing

    1  int sum = 0;
    2  int i = 1;
    3  while (i < N) {
    4      i++;
    5      sum = sum + i;
    6  }
    7  printf("Sum: %d\n", sum);

BB trace: 1-2, 3, 4-5, 3, 4-5, 3, 4-5, 3, 4-5, 3, 4-5, 3, 7
In this example, only 13 of the 19 entries of the statement-level trace need to be stored.
Drawback: seeking to a statement inside a basic block is more complicated.

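The same loop traced at basic-block granularity, again as a hypothetical sketch: `record` logs an index into the `blocks` table, which stores the first and last line of each block. For N = 6 the trace has 13 entries. `statement_at` illustrates why seeking is harder: finding the k-th executed statement requires walking the block trace from the start and summing block sizes.

```c
#include <assert.h>

/* Basic blocks of the example: lines 1-2, line 3, lines 4-5, line 7. */
struct bb { int first, last; };
static const struct bb blocks[] = { {1, 2}, {3, 3}, {4, 5}, {7, 7} };

#define MAX_TRACE 256
static int bb_trace[MAX_TRACE];   /* one entry per executed basic block */
static int bb_len = 0;

static void record(int block_index)
{
    assert(bb_len < MAX_TRACE);
    bb_trace[bb_len++] = block_index;
}

int traced_sum_bb(int N)
{
    record(0); int sum = 0; int i = 1;
    for (;;) {
        record(1);                    /* loop condition block */
        if (!(i < N)) break;
        record(2); i++; sum = sum + i;
    }
    record(3);                        /* printf block (print omitted) */
    return sum;
}

/* Seek to the k-th executed statement (0-based) by expanding the block
 * trace from the beginning; returns its line number, or -1 if out of range. */
int statement_at(int k)
{
    for (int j = 0; j < bb_len; j++) {
        const struct bb *b = &blocks[bb_trace[j]];
        int size = b->last - b->first + 1;
        if (k < size)
            return b->first + k;
        k -= size;
    }
    return -1;
}
```

Storing one entry per block instead of per statement is what saves space; the price is that positions inside the statement-level trace are no longer directly indexable.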
Alternatives to Reduce Trace Size
Other options to reduce trace size?
- Function-level tracing: record functions and their parameters (but what about side effects?).
- Predicate tracing: record all branch predicates from the beginning of execution; needs only one bit per branch, but seeking is hard.
- Path-based tracing: record the path through the CFG; needs heavy-weight data structures.
- Compression, e.g., deflate: relies on decompression, so no seeking.

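Predicate tracing can be sketched with a bit vector: a hypothetical `branch` wrapper records each condition's outcome as one bit and returns it unchanged, so it can wrap any condition. For the summation loop with N = 6, the six loop-condition outcomes fit in a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BRANCHES 1024

static uint8_t bits[MAX_BRANCHES / 8];  /* one bit per executed branch */
static size_t nbranches = 0;

/* Record a branch outcome (1 = taken) and pass it through. */
static int branch(int taken)
{
    assert(nbranches < MAX_BRANCHES);
    if (taken)
        bits[nbranches / 8] |= (uint8_t)(1u << (nbranches % 8));
    nbranches++;
    return taken;
}

/* Read the k-th recorded outcome. Reading the bit is cheap, but mapping it
 * back to a program point requires replaying the execution from the start,
 * which is why seeking is hard. */
int outcome(size_t k)
{
    assert(k < nbranches);
    return (bits[k / 8] >> (k % 8)) & 1;
}

int traced_sum_pred(int N)
{
    int sum = 0, i = 1;
    while (branch(i < N)) {
        i++;
        sum = sum + i;
    }
    return sum;
}
```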
Compression Using Value Predictors
Last n Values Predictor: Compression
- A buffer stores the last n unique values encountered.
- If the next value is one of the n buffered values, emit its index into the buffer, prefixed with the symbol 0.
- Otherwise (misprediction), emit the encountered value itself, prefixed with the symbol m, and update the buffer using a least recently used (LRU) replacement strategy.
Example: 123 456 456 456 456 123 123 789 456
Last-2 predictor output: m 123 m 456 00 00 00 01 01 m 789 m 456
(With the most recently inserted value at index 0, 456 has index 0 and 123 has index 1. When 789 arrives, 456 is the least recently used entry and is evicted, so the final 456 is a misprediction.)

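A minimal encoder for the last-n values predictor, assuming two conventions that reproduce the example above: a newly inserted value goes to index 0 (shifting older entries up), and a full buffer evicts its least recently used entry. Symbols are represented by a hypothetical `struct sym` holding the prefix ('0' or 'm') and either the buffer index or the raw value, rather than as a packed bit stream.

```c
#include <assert.h>
#include <stddef.h>

#define N_LAST 2   /* predictor size n: last-2 as in the example */

/* One encoded symbol: kind '0' = hit (payload is the buffer index),
 * kind 'm' = miss (payload is the raw value). */
struct sym { char kind; int payload; };

struct last_n {
    int value[N_LAST];
    long last_use[N_LAST];  /* time of last use, for LRU replacement */
    int count;              /* number of occupied entries */
    long now;               /* current position in the trace */
};

/* Insert v at index 0; if the buffer is full, evict the LRU entry. */
static void insert_front(struct last_n *p, int v)
{
    int victim = p->count;              /* first free slot, if any */
    if (p->count == N_LAST) {
        victim = 0;
        for (int j = 1; j < N_LAST; j++)
            if (p->last_use[j] < p->last_use[victim])
                victim = j;
    } else {
        p->count++;
    }
    for (int j = victim; j > 0; j--) {  /* shift entries to free index 0 */
        p->value[j] = p->value[j - 1];
        p->last_use[j] = p->last_use[j - 1];
    }
    p->value[0] = v;
    p->last_use[0] = p->now;
}

/* Encode n values into n symbols; returns the number of symbols written. */
size_t encode(const int *in, size_t n, struct sym *out)
{
    struct last_n p = {{0}, {0}, 0, 0};
    for (size_t k = 0; k < n; k++, p.now++) {
        int hit = -1;
        for (int j = 0; j < p.count; j++)
            if (p.value[j] == in[k]) { hit = j; break; }
        if (hit >= 0) {
            out[k] = (struct sym){ '0', hit };   /* hit: emit the index */
            p.last_use[hit] = p.now;
        } else {
            out[k] = (struct sym){ 'm', in[k] }; /* miss: emit the value */
            insert_front(&p, in[k]);
        }
    }
    return n;
}
```

The number of symbols equals the number of values; the saving comes from a hit costing only a prefix bit plus log2(n) index bits instead of a full value.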
Last n Values Predictor: Decompression
- Take one bit (the prefix symbol) from the encoded trace.
- If it is the m symbol, read the next value, output it, and update the buffer.
- If it is the 0 symbol, read the index and output the value stored at that index in the buffer.
- The decoder must update its buffer exactly as the encoder did.
n-value predictors are related to run-length encoding (RLE).

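A matching decoder, sketched under the assumed conventions that reproduce the slide's example (new values inserted at index 0, least recently used entry evicted when full); `struct sym`, `struct last_n`, and `decode` are hypothetical names. Running it on the encoded example m 123 m 456 00 00 00 01 01 m 789 m 456 restores 123 456 456 456 456 123 123 789 456.

```c
#include <assert.h>
#include <stddef.h>

#define N_LAST 2

/* kind '0' = hit (payload is the buffer index), 'm' = miss (raw value). */
struct sym { char kind; int payload; };

struct last_n {
    int value[N_LAST];
    long last_use[N_LAST];  /* time of last use, for LRU replacement */
    int count;
    long now;
};

/* Must mirror the encoder's policy exactly: insert at index 0, evict LRU. */
static void insert_front(struct last_n *p, int v)
{
    int victim = p->count;
    if (p->count == N_LAST) {
        victim = 0;
        for (int j = 1; j < N_LAST; j++)
            if (p->last_use[j] < p->last_use[victim])
                victim = j;
    } else {
        p->count++;
    }
    for (int j = victim; j > 0; j--) {
        p->value[j] = p->value[j - 1];
        p->last_use[j] = p->last_use[j - 1];
    }
    p->value[0] = v;
    p->last_use[0] = p->now;
}

/* Decode n symbols into n values; returns the number of values written. */
size_t decode(const struct sym *in, size_t n, int *out)
{
    struct last_n p = {{0}, {0}, 0, 0};
    for (size_t k = 0; k < n; k++, p.now++) {
        if (in[k].kind == 'm') {        /* miss: value follows the prefix */
            out[k] = in[k].payload;
            insert_front(&p, out[k]);
        } else {                        /* hit: look the value up by index */
            assert(in[k].payload >= 0 && in[k].payload < p.count);
            out[k] = p.value[in[k].payload];
            p.last_use[in[k].payload] = p.now;
        }
    }
    return n;
}
```

If the decoder's buffer policy drifted from the encoder's even once, every later index would refer to the wrong value, which is why both sides share the same update code.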
Finite Context Method (FCM)
- Construct a lookup table that predicts a value based on the last n values (2-FCM, 3-FCM).
- If the next value is correctly predicted using the left context, emit a 0 bit to the encoded trace.
- Otherwise (misprediction), emit an m symbol and the original value to the trace.
- Update the lookup table accordingly.
Example (3-FCM):
Input:   1 2 3 4 5 3 4 5 ... 3 4 5 6
Encoded: m 1 m 2 m 3 m 4 m 5 m 3 m 4 m 5 0 ... 0 0 0 m 6

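A minimal 3-FCM encoder sketch. The lookup table is a small array searched linearly (a practical implementation would use a hash table), and `struct sym` is a hypothetical symbol type: '0' for a correct prediction, 'm' plus the raw value for a misprediction. The first three values have no full context and are always emitted as m symbols; on a misprediction with a known context, the table entry is overwritten with the new value.

```c
#include <assert.h>
#include <stddef.h>

#define K 3          /* context order: 3-FCM */
#define MAX_CTX 256  /* capacity of the lookup table */

struct sym { char kind; int payload; };   /* '0' = predicted, 'm' = raw */

struct fcm_entry { int ctx[K]; int next; };
struct fcm { struct fcm_entry e[MAX_CTX]; int n; };

/* Find the table entry for a K-value context, or NULL if unseen. */
static struct fcm_entry *find(struct fcm *t, const int *ctx)
{
    for (int i = 0; i < t->n; i++) {
        int j = 0;
        while (j < K && t->e[i].ctx[j] == ctx[j])
            j++;
        if (j == K)
            return &t->e[i];
    }
    return NULL;
}

/* Encode n values into n symbols; returns the number of symbols written. */
size_t fcm_encode(const int *in, size_t n, struct sym *out)
{
    struct fcm t;
    t.n = 0;
    for (size_t k = 0; k < n; k++) {
        struct fcm_entry *e = (k >= K) ? find(&t, &in[k - K]) : NULL;
        if (e && e->next == in[k]) {
            out[k] = (struct sym){ '0', 0 };      /* correct prediction */
            continue;
        }
        out[k] = (struct sym){ 'm', in[k] };      /* misprediction */
        if (e) {
            e->next = in[k];                      /* update the prediction */
        } else if (k >= K) {                      /* learn a new context */
            assert(t.n < MAX_CTX);
            struct fcm_entry *ne = &t.e[t.n++];
            for (int j = 0; j < K; j++)
                ne->ctx[j] = in[k - K + j];
            ne->next = in[k];
        }
    }
    return n;
}
```

On the input 1 2 3 4 5 3 4 5 3 4 5 3 4 5 6 this emits eight m symbols, six 0 symbols, and a final m 6, matching the shape of the example above.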
FCM Characteristics
- Compressed length: $\frac{n}{\text{sizeof(void *)}} + n \cdot (1 - \text{predict rate})$.
- Predictors compress traces better than deflate because traces contain repetitive loop patterns.
- Drawback: the trace can only be traversed forward.

Bidirectional Compression
- Use a small sliding window of clear text over the compressed string (just like with FCM); the parts to the left and right of the window stay compressed.
- Keep both a left-context and a right-context lookup table (instead of just a left-context lookup table).
- Moving forward: decompress the next value using the left-context lookup table (the sliding window is now n+1), then compress the first value of the window using the right-context lookup table (the sliding window is n again).
- Moving backward: decompress using the right context, compress using the left context.

Bidirectional Compression: Example
Window: ... A X Y | 0 ... (the clear-text window A X Y is followed by the compressed symbol 0)
- Decompress with the left context: the entry A X Y: Z predicts Z, so the 0 decodes to Z and the window becomes A X Y Z.
- Compress with the right context: the entry X Y Z: A is used to compress A, the value leaving the window. If the prediction is correct, emit 0 to the right-context stream; otherwise, update the table and emit an m symbol.

Bidirectional Predictor Characteristics
- Almost the same compression rate as unidirectional predictors (possibly slightly worse, because the forward and backward prediction rates differ).
- Fast compression/decompression, although about two times slower than unidirectional predictors.

Questions?