Control-Flow Analysis CS5r Spring Includes (a lot of) material from slides for Principles of Program Analysis by Nielson, Nielson, and Hankin http://www.imm.dtu.dk/~riis/ppa/ppasup4.html
Outline What s the problem? -CFA Uniform k-cfa The k-cfa paradox Stephen Chong, Harvard University
What is control-flow analysis? Data-flow analysis relied on a control-flow graph How do we construct CFG? For intra-procedural analysis, relatively straightforward Identify basic blocks, control-flow structures We will not delve into this For inter-procedural analysis If functions/procedures are not first-class, relatively simple For languages with dynamic dispatch, it s harder Dynamic dispatch: which procedure/function gets invoked depends on runtime values Functional languages, OO, imperative languages with procedures as parameters,... Stephen Chong, Harvard University 3
CFA in higher-order languages We ll mostly focus today on control-flow analysis of functional languages For each function application, which functions may be applied? E.g. let f = fn x => x ; g = fn y => y+; h = fn z => z+3 in ( f g ) + ( f h ) Stephen Chong, Harvard University 4
Syntax of language e Exp expressions (or labelled terms) t Term terms (or unlabelled expressions) f, x Var variables c Const constants op Op binary operators Lab labels e ::= t t ::= c x fn x => e fun f x => e e e if e then e else e let x = e in e e op e Stephen Chong, Harvard University 5
Examples ((fn x => x ) (fn y => y 3 ) 4 ) 5 (let f = (fn x => (x ) 3 ) 4 ; in (let g = (fn y => y 5 ) 6 ; in (let h = (fn z => z 7 ) 8 in ((f 9 g ) + (f h 3 ) 4 ) 5 ) 6 ) 7 ) 8 (let g = (fun f x => (f (fn y => y ) 3 ) 4 ) 5 in (g 6 (fn z => z 7 ) 8 ) 9 ) Stephen Chong, Harvard University 6
-CFA -CFA is an context-insensitive CFA. Result of a -CFA analysis is pair (Ĉ, ^ρ) Ĉ is an abstract cache ^ρ is an abstract environment Actually, just functions v Val = P(Term) abstract values ρ Env = Var Val abstract environments Notes: C Cache = Lab Val abstract caches Could combine these into one entity: (Var Lab) ^Val Could also require A-normal form, where all subterms are appropriately labeled by variables Stephen Chong, Harvard University 7
Example ((fn x => x ) (fn y => y 3 ) 4 ) 5 3 4 5 x y ( C e, ρ e ) ( C e, ρ e) ( C e, ρ {fn y => y 3 } {fn x => x } {fn y => y 3 } {fn y => y 3 } {fn y => y 3 } {fn x => x } {fn y => y 3 } {fn y => y 3 } {fn y => y 3 } e ) {fn x => x, fn y => y 3 } {fn x => x, fn y => y 3 } {fn x => x, fn y => y 3 } {fn x => x, fn y => y 3 } {fn x => x, fn y => y 3 } {fn x => x, fn y => y 3 } {fn x => x, fn y => y 3 } Stephen Chong, Harvard University Acceptable Not acceptable Acceptable but less precise 8
Abstract specification What does it mean for (Ĉ, ^ρ) to be acceptable? Define relation indicating when (Ĉ, ^ρ) is acceptable -CFA of expression e ( C, ρ) = e = :( Cache Env Exp) {true, false} Stephen Chong, Harvard University 9
Abstract specification ( C, ρ) = c always ( C, ρ) = x iff ρ(x) C() ( C, ρ) = (let x = t in t iff ( C, ρ) = t ( C, ρ) = t C( ) ρ(x) C( ) C() ) Stephen Chong, Harvard University
Abstract specification ( C, ρ) = (if t then t else t iff ( C, ρ) = t ( C, ρ) = t ( C, ρ) = t C( ) C() C( ) C() ) ( C, ρ) = (t op t iff ( C, ρ) = t ( C, ρ) = t ) Stephen Chong, Harvard University
Abstract specification ( C, ρ) = (fn x => t ) iff {fn x => t } C() ( C, ρ) = (t t ) iff ( C, ρ) = t ( C, ρ) = t ( ( fn x => t ) C( ): ( C, ρ) = t C( ) ρ(x) C( ) C()) Stephen Chong, Harvard University
Abstract specification ( C, ρ) = (fun f x => e ) iff {fun f x => e } C() ( C, ρ) = (t t ) iff ( C, ρ) = t ( C, ρ) = t ( ( fn x => t ) C( ): ( C, ρ) = t C( ) ρ(x) C( ) C()) ( ( fun f x => t ) C( ): ( C, ρ) = t C( ) ρ(x) C( ) C() {fun f x => t } ρ(f) ) Stephen Chong, Harvard University 3
What s acceptable ((fn x => x ) (fn y => y 3 ) 4 ) 5 3 4 5 x y ( C e, ρ e ) ( C e, ρ e) {fn y => y 3 } {fn x => x } {fn y => y 3 } {fn y => y 3 } {fn y => y 3 } {fn y => y 3 } {fn x => x } {fn y => y 3 } {fn y => y 3 } ( C e, ρ e ) = ((fn x => x ) (fn y => y 3 ) 4 ) 5 ( C e, ρ e) = ((fn x => x ) (fn y => y 3 ) 4 ) 5 Stephen Chong, Harvard University 4
Abstract specification Note that we can t define by structural induction on expressions ( C, ρ) = (t t ) iff ( C, ρ) = t ( C, ρ) = t ( ( fn x => t ) C( ): ( C, ρ) = t C( ) ρ(x) C( ) C()) ( ( fun f x => t ) C( ): ( C, ρ) = t C( ) ρ(x) C( ) C() {fun f x => t } ρ(f) ) Instead, define coinductively Want the greatest fixed point that satisfies equations for Note: not an algorithm for solving, but a specification Stephen Chong, Harvard University 5
Semantic correctness Also need to show that acceptability of analysis results implies semantic correctness That is, ^C and ^p accurately describe the concrete execution. Like a type-soundness statement Stephen Chong, Harvard University 6
Syntax-directed -CFA Another formulation of -CFA that approximates the abstract specification i.e., Define s such that if (Ĉ, ^ρ) s e then (Ĉ, ^ρ) e Stephen Chong, Harvard University 7
Syntax-directed -CFA ( C, ρ) = s c always ( C, ρ) = s x iff ρ(x) C() ( C, ρ) = s (if t then t else t iff ( C, ρ) = s t ( C, ρ) = s t ( C, ρ) = s t C( ) C() C( ) C() ) ( C, ρ) = s (let x = t in t iff ( C, ρ) = s t ( C, ρ) = s t C( ) ρ(x) C( ) C() ) ( C, ρ) = s (t op t ) iff ( C, ρ) = s t ( C, ρ) = s t Stephen Chong, Harvard University 8
Syntax-directed -CFA ( C, ρ) = s (fn x => e ) iff {fn x => e } C() ( C, ρ) = s e ( C, ρ) = s (fun f x => e ) iff ( C, ρ) = s (t {fun f x => e } C() ( C, ρ) = s e {fun f x => e } ρ(f) t ) iff ( C, ρ) = s t ( C, ρ) = s t ( (fn x => t ) C( ): C( ) ρ(x) C( ) C() Note: may check some function bodies that aren t reachable, in return for enabling induction on syntax ( (fun f x => t ) C( ): C( ) ρ(x) C( ) C() ) ) Stephen Chong, Harvard University 9
Syntax-directed -CFA For any expression e, there is a least (Ĉ, ^ρ) such that (Ĉ, ^ρ) s e Can turn this syntax-directed -CFA specification into an equivalent algorithm that generates a set of constraints Least solution to set of constraints is least solution to syntax-directed -CFA Stephen Chong, Harvard University
Constraint-based -CFA C [[e ]] is a set of constraints of the form lhs rhs {t} rhs lhs rhs where rhs ::= C() r(x) lhs ::= C() r(x) {t} and all occurrences of t are of the form fn x => e or fun f x => e Stephen Chong, Harvard University
Constraint-based -CFA C [[c ]] = C [[x ]] = { r(x) C() } C [[(if t then t else t ) ]] = C [[t ]] C [[t ]] C [[t ]] { C( ) C() } { C( ) C() } C [[(let x = t in t ) ]] = C [[t ]] C [[t ]] { C( ) r(x) } { C( ) C() } C [[(t op t ) ]] = C [[t ]] C [[t ]] Stephen Chong, Harvard University
Constraint-based -CFA C [[(fn x => e ) ]] = { {fn x => e } C() } C [[e ]] C [[(fun f x => e ) ]] = { {fun f x => e } C() } C [[e ]] { {fun f x => e } r(f) } C [[(t t ) ]] = C [[t ]] C [[t ]] { {t} C( ) C( ) r(x) t =(fn x => t ) Term } { {t} C( ) C( ) C() t =(fn x => t ) Term } { {t} C( ) C( ) r(x) t =(fun f x => t ) Term } { {t} C( ) C( ) C() t =(fun f x => t ) Term } Stephen Chong, Harvard University 3
Example C [[((fn x => x ) (fn y => y 3 ) 4 ) 5 ]] = {{fn x => x } C(), r(x) C(), {fn y => y 3 } C(4), r(y) C(3), {fn x => x } C() C(4) r(x), {fn x => x } C() C() C(5), {fn y => y 3 } C() C(4) r(y), {fn y => y 3 } C() C(3) C(5) } Stephen Chong, Harvard University 4
Correctness Translating syntactic entities to sets of terms: ( C, ρ)[[c()]] = C() ( C, ρ)[[r(x)]] = ρ(x) ( C, ρ)[[{t}]] = {t} Satisfaction relation for constraints: ( C, ρ) = c (lhs rhs) ( C, ρ) = c (lhs rhs) iff ( C, ρ)[[lhs]] ( C, ρ)[[rhs]] ( C, ρ) = c ({t} rhs lhs rhs) iff ({t} ( C, ρ)[[rhs ]] ( C, ρ)[[lhs]] ( C, ρ)[[rhs]]) ({t} ( C, ρ)[[rhs ]]) Proposition: ( C, ρ) = s e if and only if ( C, ρ) = c C [[e ]]. Stephen Chong, Harvard University 5
Adding data-flow analysis Current domain equations Actually, just functions v Val = P(Term) abstract values ρ Env = Var Val abstract environments C Cache = Lab Val abstract caches Idea: extend abstract values to include other things than just functions E.g., let Data be set of abstract data values e.g., {tt, ff, -,, +} v Val d = P(Term Data) abstract values Stephen Chong, Harvard University 6
Abstract data values For each constant c, need abstract data value d c For each operator op need abstract operator op : Data Data P(Data) Data sign = {tt, ff, -,, +} d true =tt d 7 = + + is defined from d + tt ff - + tt ff - {-} {-} {-,, +} {-} {} {+} + {-,, +} {+} {+} Stephen Chong, Harvard University 7
Data-flow and control-flow spec ( C, ρ) = d c iff {d c } C() ( C, ρ) = d x iff ρ(x) C() ( C, ρ) = d (if t then t else t iff ( C, ρ) = d t Is flow sensitive: can determine whether true or false branches can be taken ) (d true C( ) ( ( C, ρ) = d t C( ) C())) (d false C( ) ( ( C, ρ) = d t C( ) C())) ( C, ρ) = d (let x = t in t iff ( C, ρ) = d t ( C, ρ) = d t C( ) ρ(x) C( ) C() ( C, ρ) = d (t op t ) ) iff ( C, ρ) = d t ( C, ρ) = d t C( ) op C( ) C() Stephen Chong, Harvard University 8
Arbitrary lattices ^Val = P(Term Data) = P(Term) P(Data) Could also use an arbitrary lattice ^Val = P(Term Data) = P(Term) L Stephen Chong, Harvard University 9
Adding contexts -CFA is a context-insensitive (or mono-variant) analysis Does not distinguish various instances of program variables and program points from each other Context-sensitive (or poly-variant) analysis does distingtuish Stephen Chong, Harvard University 3
Uniform k-cfa v Val = P(Term) abstract values ρ Env = Var Val abstract environments C Cache = Lab Val abstract caches Idea: extend ^Val to include context information Contexts δ will record last k dynamic call-sites δ = Lab k context information Definition point of free variables of terms ce CEnv = Var context environments v Val = P(Term CEnv) abstract values ρ Env = (Var ) Val abstract environments Called uniform because both environment and cache use same precision Stephen Chong, Harvard University C Cache = (Lab ) Val abstract caches 3
Acceptability relation ( C, ρ) = ce δ e ce is current context environment i.e., for free variables of e, in which context were they bound? Changes as variables are bound δ is current context Changes as functions are applied Stephen Chong, Harvard University 3
Acceptability relation ( C, ρ) = ce δ c always ( C, ρ) = ce δ x iff ρ(x, ce(x) ) C(, δ) ( C, ρ) = ce δ (if t then t else t ) iff ( C, ρ) = ce δ t ( C, ρ) = ce δ t ( C, ρ) = ce C(, δ) C(, δ) C(, δ) C(, δ) δ t ( C, ρ) = ce δ (let x = t in t ) iff ( C, ρ) = ce δ t ( C, ρ) = ce δ t C(, δ) ρ(x, δ) C(, δ) C(, δ) where ce = ce[x δ] ( C, ρ) = ce δ (t op t ) iff ( C, ρ) = ce δ t ( C, ρ) = ce δ t Stephen Chong, Harvard University 33
Acceptability relation ( C, ρ) = ce δ (fn x => e ) iff {(fn x => e, ce )} C(, δ) ( C, ρ) = ce δ (fun f x => e ) iff {(fun f x => e, ce )} C(, δ) ( C, ρ) = ce δ (t iff t ) ( C, ρ) = ce δ t ( C, ρ) = ce δ t ( (fn x => t, ce ) C(, δ) : ( C, ρ) = ce δ t C(, δ) ρ(x, δ ) C(, δ ) C(, δ) where δ = δ, k and ce = ce [x δ ]) ( (fun f x => t, ce ) C(, δ) : ( C, ρ) = ce δ t C(, δ) ρ(x, δ ) C(, δ ) C(, δ) {(fun f x => t, ce )} ρ(f,δ ) where δ = δ, k and ce = ce [f δ,x δ ]) Stephen Chong, Harvard University 34
Example (let f = (fn x => x ) in ((f 3 f 4 ) 5 (fn y => y 6 ) 7 ) 8 ) 9 Contexts of interest for uniform -CFA Λ: the initial context 5: the context when the application point labelled 5 has been passed 8: the context when the application point labelled 8 has been passed Context environments of interest for uniform -CFA ce = [ ] ce = ce [f Λ] ce = ce [x 5] ce 3 = ce [x 8] the initial (empty) context environment the context environment for the analysis of the body of the let-construct the context environment used for the analysis of the body of f initiated at the application point 5 the context environment used for the analysis of the body of f initiated at the application point 8. Stephen Chong, Harvard University 35
Example C id (, 5) = {(fn x => x,ce )} C id (, 8) = {(fn y => y 6,ce )} C id (, Λ) ={(fn x => x,ce )} C id (3, Λ) ={(fn x => x,ce )} C id (4, Λ) ={(fn x => x,ce )} C id (5, Λ) ={(fn x => x,ce )} C id (7, Λ) ={(fn y => y 6,ce )} C id (8, Λ) ={(fn y => y 6,ce )} C id (9, Λ) ={(fn y => y 6,ce )} ρ id (f, Λ) ={(fn x => x,ce )} ρ id (x, 5) = {(fn x => x,ce )} ρ id (x, 8) = {(fn y => y 6,ce )} This is an acceptable analysis result: ( C id, ρ id ) = ce Λ (let f = (fn x => x ) in ((f 3 f 4 ) 5 (fn y => y 6 ) 7 ) 8 ) 9 Stephen Chong, Harvard University 36
Complexity δ = Lab k context information ce CEnv = Var context environments v Val = P(Term CEnv) abstract values ρ Env = (Var ) Val abstract environments C Cache = (Lab ) Val abstract caches k-cfa has worst-case exponential complexity in size of program Size n program, p variables Δ has O(n) elements Size of CEnv is O(n p ) ^Val is powerset of pairs (t, ce), and there are O(n n p ) pairs, so Val has height O(n n p ) p = O(n) -CFA has worst-case polynomial complexity Stephen Chong, Harvard University 37
Variations on k-cfa Uniform k-cfa ce CEnv = Var context environments v Val = P(Term CEnv) abstract values ρ Env = (Var ) Val abstract environments C Cache = (Lab ) Val abstract caches k-cfa C Cache = (Lab CEnv) Val abstract caches Polynomial k-cfa v Val = P(Term ) abstract values Stephen Chong, Harvard University 38
k-cfa Paradox [Might, Smaragdakis, van Horn, PLDI ] k-cfa is exponential for k But k-cfa is like using context of k most recent call-sites Polynomial for OO languages Doop implemented in Datalog, which only allows polynomial time alogrithms OO has dynamic dispatch What gives? Stephen Chong, Harvard University 39
Wait, which k-cfa? In OO world, translate k-cfa to k-call-site sensitive interprocedural pointer analysis with a k-context-sensitive heap and onthe-fly call-graph construction i.e., data flow (points-to relation) and call-graph dependent on each other Is it the same analysis? Yes. And paradox still holds. Stephen Chong, Harvard University 4
Paradox resolved In functional languages, closures are created incrementally Each variable in a closure could be bound in a different context Source of exponentiallity In OO languages, closures created explicitly by invoking constructor Variables are copied, and so effectively all variables bound in same context CEnv = Δ instead of Var Δ δ = Lab k context information ce CEnv = Var context environments v Val = P(Term CEnv) abstract values ρ Env = (Var ) Val abstract environments C Cache = (Lab ) Val abstract caches Stephen Chong, Harvard University 4
Example OO program caller() { foo(ox);... foo(oxn); } + foo(object x) { ClosureX cx = new ClosureX(x); cx.bar(oy);... cx.bar(oym); } + class ClosureX { Object x; ClosureX(Object x) { class ClosureXY { x = x; } // constructor } bar(object y) { ClosureXY cxy = new ClosureXY(x,y); cxy.baz( );... } } Object x,y; ClosureXY(Object x, Object y) { x = x; y = y; } // constructor baz( ) {... x... y... } Equivalent functional program caller() { foo(ox);! foo(object x) {... Closure cx = foo(oxn); } cx(oy);... cx(oym); }! lambda(object y) { Closure cxy = } cxy( );... lambda( ) {... x... y... } Stephen Chong, Harvard University 4
m-cfa From this insight, Might, Smaragdakis and Van Horn develop m-cfa Contexts are the top m stack frames (Different from last k call sites when in continuationpassing style) Essentially CEnv = Δ instead of Var Δ Polynomial-time analysis, seems as precise a k- CFA for significantly less time Stephen Chong, Harvard University 43