Programming Tools based on Big Data and Conditional Random Fields

1 Programming Tools based on Big Data and Conditional Random Fields
Veselin Raychev, Martin Vechev, Andreas Krause
Department of Computer Science, ETH Zurich
Zurich Machine Learning and Data Science Meet-up, December 2014

2 Motivation
Unprecedented access to massive codebases

3 Motivation
~16M repos, ~7M users
[Figure: number of repositories by year]

4 Vision: Statistical Programming Tools
Probabilistically likely solutions to problems difficult or impossible to solve with traditional rule-based techniques

5 General Approach
Find the right program representation for the task
Find the right probabilistic model for the task
Build a probabilistic model over the representation and existing code
Use the probabilistic model to answer queries on new programs
Programming languages + Machine learning

6 1,000+ Tweets (sample below):

7 JSNice: Popularity: one of the top-ranked tools for JavaScript in ,000 users in 1st week of release; used in 180 countries

8 JSNice Intuition: Image Denoising
[Figure: original image, noisy image, denoised image]

9 Image Denoising
[Figure: noisy image -> ? -> denoised image]

10 JSNice
Input (minified):
    function chunkdata(e, t) {
      var n = [];
      var r = e.length;
      var i = 0;
      for (; i < r; i += t) {
        if (i + t < r) {
          n.push(e.substring(i, i + t));
        } else {
          n.push(e.substring(i, r));
        }
      }
      return n;
    }
Output (names predicted by JSNice):
    function chunkdata(str, step) {
      var colnames = [];
      var len = str.length;
      var i = 0;
      for (; i < len; i += step) {
        if (i + step < len) {
          colnames.push(str.substring(i, i + step));
        } else {
          colnames.push(str.substring(i, len));
        }
      }
      return colnames;
    }
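Renaming should change identifiers only, never behavior. The following is a minimal sketch of that check. It assumes the braces, the `else` branch, and the `var i = 0` initializer that the flattened slide text does not show; the function names `chunkdataMinified` and `chunkdataRenamed` and the sample input are hypothetical.

```javascript
// Minified version, with block structure assumed as described above.
function chunkdataMinified(e, t) {
  var n = [];
  var r = e.length;
  var i = 0;
  for (; i < r; i += t) {
    if (i + t < r) {
      n.push(e.substring(i, i + t));
    } else {
      n.push(e.substring(i, r));
    }
  }
  return n;
}

// Same logic with the names shown on the slide as the predicted output.
function chunkdataRenamed(str, step) {
  var colnames = [];
  var len = str.length;
  var i = 0;
  for (; i < len; i += step) {
    if (i + step < len) {
      colnames.push(str.substring(i, i + step));
    } else {
      colnames.push(str.substring(i, len));
    }
  }
  return colnames;
}

// Hypothetical sample input: split an 8-character string into chunks of 3.
const before = chunkdataMinified("abcdefgh", 3);
const after = chunkdataRenamed("abcdefgh", 3);
console.log(before, after); // both print [ 'abc', 'def', 'gh' ]
```

Only the identifiers differ between the two functions, so any input should yield identical results.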

11 Structured Prediction for Programs (V. Raychev, M. Vechev, A. Krause, ACM POPL '15, to appear)
Bridges program analysis and Conditional Random Fields
First connection between programs and CRFs
JSNice is a special instance
CRFs are a key model in computer vision

12 Markov Random Fields
Undirected graphical model
Graph + factors define a joint probability distribution
[Figure: graph with nodes t, i, r; factor $\psi_1$ on edge (i, t), factor $\psi_2$ on edge (i, r)]
$P(i, r, t) = \frac{1}{Z}\,\psi_1(i, t)\,\psi_2(i, r)$, where $Z$ normalizes over all values of $(i, r, t)$
Captures dependence between facts to be predicted
Undirected models are better suited to our setting than directed models (direction is hard to capture)
More on graphical models in: Probabilistic Graphical Models for Image Analysis, ETH graduate course, McWilliams and Lucchi
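To make the joint distribution concrete, here is a minimal sketch that enumerates a three-node network with two pairwise factors. The candidate values and the factor tables are hypothetical numbers chosen for illustration, not values from the talk.

```javascript
// Hypothetical candidate values for each node of the three-node graph.
const domains = { i: ["i", "j"], t: ["step", "t"], r: ["len", "length"] };

// Hypothetical factor tables; any pair not listed gets the default value 1.
const psi1 = { "i|step": 4, "j|step": 2 }; // factor on edge (i, t)
const psi2 = { "i|len": 5, "j|length": 2 }; // factor on edge (i, r)
const factor = (table, a, b) => (`${a}|${b}` in table ? table[`${a}|${b}`] : 1);

// Unnormalized score of a full assignment: psi1(i, t) * psi2(i, r).
const score = (i, t, r) => factor(psi1, i, t) * factor(psi2, i, r);

// Z sums the score over every joint assignment of (i, r, t).
let Z = 0;
for (const i of domains.i)
  for (const t of domains.t)
    for (const r of domains.r) Z += score(i, t, r);

// Joint probability of one assignment.
const joint = (i, t, r) => score(i, t, r) / Z;
console.log(Z, joint("i", "step", "len")); // Z is 39 for these tables
```

Brute-force enumeration is only feasible for toy networks; it is used here to show what the normalizer sums over.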

13 Conditional Random Fields (McCallum et al., 2001)
Some facts are already known, denoted as x
We would like to predict new facts, y, conditioned on the known facts x
[Figure: graph with nodes t, i, r, where t is marked as known (K)]
$P(i, r \mid t) = \frac{1}{Z(t)}\,\psi_1(i, t)\,\psi_2(i, r)$
Key advantage of CRFs over MRFs: no priors required.
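The difference from the joint model is that the normalizer now depends on the known fact. A minimal sketch, assuming hypothetical candidate values and factor tables (none of these numbers come from the talk):

```javascript
// Hypothetical candidates for the unknown facts i and r; t is the known fact.
const candidates = { i: ["i", "j"], r: ["len", "length"] };

// Hypothetical factor tables; any pair not listed gets the default value 1.
const psi1 = { "i|step": 4, "j|step": 2 }; // factor on edge (i, t)
const psi2 = { "i|len": 5, "j|length": 2 }; // factor on edge (i, r)
const factor = (table, a, b) => (`${a}|${b}` in table ? table[`${a}|${b}`] : 1);

// Z(t) sums only over the unknown facts (i, r); the known fact t is held fixed.
function partition(t) {
  let z = 0;
  for (const i of candidates.i)
    for (const r of candidates.r) z += factor(psi1, i, t) * factor(psi2, i, r);
  return z;
}

// P(i, r | t) = psi1(i, t) * psi2(i, r) / Z(t)
const conditional = (i, r, t) =>
  (factor(psi1, i, t) * factor(psi2, i, r)) / partition(t);

console.log(partition("step"), partition("t")); // 30 and 9 for these tables
```

No distribution over `t` itself is ever modeled, which is the sense in which no prior is required.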

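14 MAP inference: joint prediction
$y_{best} = \arg\max_{y} P(y \mid x)$
Key: MAP inference is preferred over marginals! This is key for programs
[Figure: graph with nodes t, i, r, where t is marked as known (K)]
$P(i, r \mid t) = \frac{1}{Z(t)}\,\psi_1(i, t)\,\psi_2(i, r)$
$(i, r)_{best} = \arg\max_{(i, r)} \psi_1(i, t)\,\psi_2(i, r)$
We use an iterative greedy algorithm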
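The slide names only "an iterative greedy algorithm". The sketch below assumes one simple variant: change one unknown label at a time whenever that raises the score, and stop when no single change helps. It is not necessarily the authors' algorithm, and the factor tables are hypothetical. Note that $Z(t)$ never has to be computed, because it does not affect the argmax.

```javascript
// Hypothetical candidates for the unknown nodes; t is known.
const candidates = { i: ["i", "j"], r: ["len", "length"] };
const t = "step";

// Hypothetical factor tables; any pair not listed gets the default value 1.
const psi1 = { "i|step": 4, "j|step": 2 }; // factor on edge (i, t)
const psi2 = { "i|len": 5, "j|length": 2 }; // factor on edge (i, r)
const factor = (table, a, b) => (`${a}|${b}` in table ? table[`${a}|${b}`] : 1);

// Unnormalized score psi1(i, t) * psi2(i, r); Z(t) is irrelevant for argmax.
const score = (y) => factor(psi1, y.i, t) * factor(psi2, y.i, y.r);

// Greedy local search (assumed variant): relabel one node at a time.
function greedyMap(start) {
  let best = { ...start };
  let improved = true;
  while (improved) {
    improved = false;
    for (const node of Object.keys(candidates)) {
      for (const label of candidates[node]) {
        const trial = { ...best, [node]: label };
        if (score(trial) > score(best)) {
          best = trial;
          improved = true;
        }
      }
    }
  }
  return best;
}

const fromGoodStart = greedyMap({ i: "i", r: "length" });
const fromBadStart = greedyMap({ i: "j", r: "length" });
console.log(fromGoodStart, fromBadStart);
```

With these tables the first start reaches `{ i: "i", r: "len" }`, while the second stays at `{ i: "j", r: "length" }`: a single-label search can stop at a local optimum, so the starting assignment matters.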

16 Learning CRFs from Data (via max-margin training, Ratliff et al., 2007)
A convenient representation for learning from data is a log-linear CRF:
$P(y \mid x) = \frac{1}{Z(x)} \exp(w^T f(y, x))$, with the weights $w$ learned from data
$y_{best} = \arg\max_{y} w^T f(y, x)$
As we require only CRFs and MAP inference, we use the max-margin training due to Ratliff et al. (2007).
Computes the subgradient via MAP inference.
Avoids computation of $Z(x)$!
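The idea of a subgradient computed via MAP inference can be sketched on a toy problem. This is a simplified structured-hinge update without regularization, assumed for illustration; it is not the exact procedure of Ratliff et al. The feature keys, the Hamming loss, the learning rate, and the single training example are all hypothetical.

```javascript
// Hypothetical candidate labels for two unknown nodes; the known fact is t = "step".
const candidates = { i: ["i", "j"], r: ["len", "length"] };
const assignments = [];
for (const i of candidates.i)
  for (const r of candidates.r) assignments.push({ i, r });

// f(y, x): sparse indicator features, one per edge of the network.
const features = (y) => ({ [`it:${y.i}|step`]: 1, [`ir:${y.i}|${y.r}`]: 1 });
const dot = (w, f) => Object.keys(f).reduce((s, k) => s + (w[k] || 0) * f[k], 0);

// Hamming loss: number of nodes labeled differently from the ground truth.
const loss = (y, gold) => Number(y.i !== gold.i) + Number(y.r !== gold.r);

function subgradientStep(w, gold, rate) {
  // Loss-augmented MAP inference: find the most violating assignment.
  let worst = assignments[0];
  for (const y of assignments) {
    const current = dot(w, features(y)) + loss(y, gold);
    if (current > dot(w, features(worst)) + loss(worst, gold)) worst = y;
  }
  // Move toward the gold features and away from the violator's features.
  const next = { ...w };
  for (const [k, v] of Object.entries(features(gold))) next[k] = (next[k] || 0) + rate * v;
  for (const [k, v] of Object.entries(features(worst))) next[k] = (next[k] || 0) - rate * v;
  return next;
}

// Train on one hypothetical example, then predict with plain MAP inference.
const gold = { i: "i", r: "len" };
let w = {};
for (let epoch = 0; epoch < 10; epoch++) w = subgradientStep(w, gold, 0.5);

const predict = (weights) =>
  assignments.reduce((a, b) => (dot(weights, features(b)) > dot(weights, features(a)) ? b : a));
console.log(predict(w));
```

The partition function never appears: each step needs only an argmax over assignments, which is the property the slide highlights.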

20 Recipe: From a Program to a CRF
Step 1: Define the elements and their properties of interest
    Elements become nodes in a network; node content ranges over properties
    Example: elements are variables, properties are their types
Step 2: Define feature functions between elements
    Feature functions become undirected edges in the network
    Example: aliasing between variables, shared function caller, etc.
Step 3: Build the network via static program analysis
    Automatically extract nodes and feature functions from the program
    Example: alias analysis, call graph
Key point: the general problem is undecidable, need good approximations!
More on Program Analysis: Program Analysis, ETH graduate course, M. Vechev, Spring 2015
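The three steps can be sketched as a small data structure. The nodes below are borrowed from the chunkdata example used elsewhere in the talk, with the variable name as the property of interest; the relation list is hand-written and hypothetical, standing in for what a real static analysis would extract, and `buildNetwork` is an illustrative helper, not part of any tool.

```javascript
// Step 1: program elements; `known` marks properties that are not predicted.
const elements = [
  { id: "i", known: false },
  { id: "t", known: false },
  { id: "r", known: false },
  { id: "length", known: true },
];

// Steps 2-3: relations a static analysis might report (hand-written here).
const relations = [
  { a: "i", b: "t", feature: "i += t" },
  { a: "i", b: "r", feature: "i < r" },
  { a: "r", b: "length", feature: "r = _.length" },
];

function buildNetwork(elements, relations) {
  const nodes = new Map(elements.map((el) => [el.id, { ...el, neighbors: [] }]));
  for (const { a, b, feature } of relations) {
    // Each feature function becomes an undirected edge: record it on both ends.
    nodes.get(a).neighbors.push({ to: b, feature });
    nodes.get(b).neighbors.push({ to: a, feature });
  }
  return nodes;
}

const network = buildNetwork(elements, relations);
const unknown = [...network.values()].filter((n) => !n.known).map((n) => n.id);
console.log(unknown, network.get("i").neighbors.length);
```

Keeping each edge on both endpoints makes it cheap to rescore a single node against all of its neighbors during inference.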

27 MAP inference
Input:
    function chunkdata(e, t) {
      var n = [];
      var r = e.length;
      var i = 0;
      for (; i < r; i += t) {
        if (i + t < r) {
          n.push(e.substring(i, i + t));
        } else {
          n.push(e.substring(i, r));
        }
      }
      return n;
    }
Unknown properties: i, t, r
Known properties: length
Network edges and learned weights w:
    Edge (i, t)          w
    i, step              0.5
    j, step              0.4
    Edge (i, r)          w
    i, len               0.6
    j, length            0.3
    Edge (r, length)     w
    length, length       0.5
    len, length
MAP assignment: i = i, t = step, r = len
Output:
    function chunkdata(str, step) {
      var colnames = [];
      var len = str.length;
      var i = 0;
      for (; i < len; i += step) {
        if (i + step < len) {
          colnames.push(str.substring(i, i + step));
        } else {
          colnames.push(str.substring(i, len));
        }
      }
      return colnames;
    }
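The slide's result can be reproduced by brute force on part of the network. This sketch uses only the (i, t) and (i, r) tables, whose weights are all legible in the transcript; the (r, length) edge is left out because one of its weights is missing. Scoring unlisted pairs as 0 and summing weights (the log-linear form) are assumptions.

```javascript
// Candidate names per unknown node, taken from the slide's weight tables.
const candidates = { i: ["i", "j"], t: ["step"], r: ["len", "length"] };

// Learned weights as shown on the slide, keyed by "edge:labelA,labelB".
const weights = {
  "it:i,step": 0.5,
  "it:j,step": 0.4,
  "ir:i,len": 0.6,
  "ir:j,length": 0.3,
};
const weightOf = (key) => weights[key] || 0; // unlisted pairs score 0 (assumption)

// Log-linear score: sum of the weights of the features that fire.
const score = (y) => weightOf(`it:${y.i},${y.t}`) + weightOf(`ir:${y.i},${y.r}`);

// Exhaustive MAP inference over all candidate assignments.
let best = null;
for (const i of candidates.i)
  for (const t of candidates.t)
    for (const r of candidates.r) {
      const y = { i, t, r };
      if (best === null || score(y) > score(best)) best = y;
    }
console.log(best, score(best));
```

The winning assignment is i = i, t = step, r = len, which agrees with the names shown in the output program.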

35 Structured Prediction for Programs
Prediction phase: program -> transform (alias, call) -> network (~30 nodes, ~400 edges) -> inference (time: milliseconds) -> renamed program
Learning phase: programs (~7M functions for names, ~70K functions for types) -> transform (alias, call) -> learn weights (max-margin training) -> learned weights and feature functions (~150MB)
Precision: names 63%, types 81% (helps typechecking)
Input program:
    var n = [];
    var r = e.length;
    var i = 0;
    for (; i < r; i += t) {
      if (i + t < r) {
        n.push(e.substring(i, i + t));
      } else {
        n.push(e.substring(i, r));
      }
    }
    return n;
Output program:
    var colnames = [];
    var len = str.length;
    var i = 0;
    for (; i < len; i += step) {
      if (i + step < len) {
        colnames.push(str.substring(i, i + step));
      } else {
        colnames.push(str.substring(i, len));
      }
    }
    return colnames;

36 Structured Prediction for Programs
[Figure: the chunkdata example from slide 27, with its input program, network, weight tables, and renamed output program]
Bridges Program Analysis and CRFs
First application of CRFs to programs
CRFs learned from data
Fast and precise