Programming Tools based on Big Data and Conditional Random Fields
Veselin Raychev, Martin Vechev, Andreas Krause
Department of Computer Science, ETH Zurich
Zurich Machine Learning and Data Science Meet-up, December 2014
Motivation: Unprecedented access to massive codebases
Motivation: ~16M repositories, ~7M users. [chart: number of repositories per year]
Vision: Statistical Programming Tools. Probabilistically likely solutions to problems that are difficult or impossible to solve with traditional, rule-based techniques.
General Approach: Programming Languages + Machine Learning
1. Find the right program representation for the task
2. Find the right probabilistic model for the task
3. Build a probabilistic model over the representation and existing code
4. Use the probabilistic model to answer queries on new programs
http://jsnice.org: 1,000+ tweets. [sample tweet screenshots]
JSNice: Popularity. One of the top-ranked tools for JavaScript in 2014; 30,000 users in the first week of release; used in 180 countries.
JSNice Intuition: Image Denoising. [figure: original image, noisy image, denoised image]
Image Denoising: given a noisy image, predict the denoised image. [figure: noisy image with "?", denoised image]
JSNice
Before (minified input):
function chunkdata(e, t) {
  var n = [];
  var r = e.length;
  for (var i = 0; i < r; i += t) {
    if (i + t < r)
      n.push(e.substring(i, i + t));
    else
      n.push(e.substring(i, r));
  }
  return n;
}
After (JSNice output):
function chunkdata(str, step) {
  var colnames = [];
  var len = str.length;
  for (var i = 0; i < len; i += step) {
    if (i + step < len)
      colnames.push(str.substring(i, i + step));
    else
      colnames.push(str.substring(i, len));
  }
  return colnames;
}
Structured Prediction for Programs (V. Raychev, M. Vechev, A. Krause, ACM POPL'15, to appear)
Bridges program analysis and Conditional Random Fields: the first connection between programs and CRFs
JSNice is a special instance of this framework
CRFs are a key model in computer vision
Markov Random Fields
An undirected graphical model: a graph plus factors defines a joint probability distribution. For example, with three variables i, r, t connected by factors ψ₁ and ψ₂:
P(i, r, t) = (1/Z) · ψ₁(i, t) · ψ₂(i, r)
This captures the dependence between the facts to be predicted. Undirected models are better suited for our setting than directed models (direction is hard to capture).
More on graphical models: Probabilistic Graphical Models for Image Analysis, ETH graduate course, McWilliams and Lucchi
Conditional Random Fields (Lafferty, McCallum, Pereira, 2001)
Some facts are already known, denoted x; we want to predict new facts y conditioned on the known facts x. With a known fact t and unknown facts i, r:
P(i, r | t) = (1/Z(t)) · ψ₁(i, t) · ψ₂(i, r)
Key advantage of CRFs over MRFs: no prior over the known facts x is required.
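A minimal sketch of how such a factorized conditional could be evaluated by brute force; the candidate names, factor tables, and scores below are illustrative assumptions, not values from the talk:

// A toy CRF over two unknown names i and r, conditioned on a known name t.
// The factor tables, candidate names, and scores are made up for illustration.
const psi1 = { "i|step": 0.5, "j|step": 0.4 };   // psi_1(i, t)
const psi2 = { "i|len": 0.6, "j|length": 0.3 };  // psi_2(i, r)
const candI = ["i", "j"];
const candR = ["len", "length"];
const t = "step";                                // the known fact x

// Unnormalized score of one assignment (a small default keeps unseen pairs possible).
function score(i, r) {
  return (psi1[i + "|" + t] || 0.01) * (psi2[i + "|" + r] || 0.01);
}

// Z(t): sum over all assignments to the unknowns (i, r).
let Z = 0;
for (const i of candI) for (const r of candR) Z += score(i, r);

// Conditional probability P(i = "i", r = "len" | t = "step").
console.log(score("i", "len") / Z);

For real programs the candidate sets are large, so Z(x) is never enumerated like this; that is exactly why MAP inference and max-margin training are used below.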
MAP inference: joint prediction
y_best = argmax_y P(y | x)
Key: we need MAP inference, not marginal probabilities; this joint prediction is what matters for programs. For the CRF above,
(i, r)_best = argmax_{i, r} ψ₁(i, t) · ψ₂(i, r)
We use an iterative greedy algorithm.
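A rough sketch of the iterative-greedy idea, written as a generic routine; this is my own simplification for illustration, not the exact algorithm from the paper, and the node names, candidate sets, and score tables it expects are placeholders:

// Greedy MAP inference over pairwise factors (illustrative sketch, not the POPL'15 algorithm).
// nodes: unknown elements; candidates[n]: possible labels for n;
// edges: [a, b, table] where table maps "labelOfA|labelOfB" to a score (0 if absent).
function greedyMap(nodes, candidates, edges, maxIters = 10) {
  const y = {};
  for (const n of nodes) y[n] = candidates[n][0];      // arbitrary initial assignment

  const total = () =>
    edges.reduce((s, [a, b, tbl]) => s + (tbl[y[a] + "|" + y[b]] || 0), 0);

  for (let iter = 0; iter < maxIters; iter++) {
    let improved = false;
    for (const n of nodes) {                            // re-label one node at a time
      let bestLabel = y[n], bestScore = total();
      for (const c of candidates[n]) {
        y[n] = c;
        if (total() > bestScore) { bestScore = total(); bestLabel = c; improved = true; }
      }
      y[n] = bestLabel;                                 // keep the locally best label
    }
    if (!improved) break;                               // stop at a local optimum
  }
  return y;
}

// Example call:
// greedyMap(["i", "r"],
//           { i: ["i", "j"], r: ["len", "length"] },
//           [["i", "r", { "i|len": 0.6, "j|length": 0.3 }]]);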
Learning CRFs from Data (via max-margin training, Ratliff et al., 2007)
A convenient representation for learning from data is a log-linear CRF:
P(y | x) = (1/Z(x)) · exp(w^T f(y, x)),   where the weights w are learned from data
MAP inference then reduces to
y_best = argmax_y w^T f(y, x)
Since we require only CRFs and MAP inference, we use the max-margin training of Ratliff et al. (2007): it computes a subgradient via MAP inference and avoids computing Z(x) altogether.
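To make the training step concrete, here is a hedged sketch of one structured-perceptron-style subgradient update under a made-up feature definition; the real system's features, regularization, and learning-rate schedule differ:

// One subgradient step of max-margin training (simplified sketch).
// f(y, x) counts, per edge kind, how often a pair of labels co-occurs; w are the learned weights.
// The feature definition below is hypothetical, chosen only to make the step concrete.
function featureVector(y, x) {
  const f = {};
  for (const [a, b, kind] of x.edges) {
    const key = kind + "|" + (y[a] || a) + "|" + (y[b] || b);
    f[key] = (f[key] || 0) + 1;
  }
  return f;
}

// w is moved toward the features of the correct labeling yStar and away from
// the features of the current MAP prediction yHat, so Z(x) is never computed.
function subgradientStep(w, x, yStar, yHat, rate = 0.1) {
  const fStar = featureVector(yStar, x);
  const fHat = featureVector(yHat, x);
  const keys = new Set([...Object.keys(fStar), ...Object.keys(fHat)]);
  for (const k of keys) {
    w[k] = (w[k] || 0) + rate * ((fStar[k] || 0) - (fHat[k] || 0));
  }
  return w;
}

In each round, yHat would come from MAP inference with the current weights; because only the feature counts of two labelings are compared, the normalization constant never appears.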
Recipe: From a Program to a CRF
Step 1: Define the elements and their properties of interest. Elements become nodes in a network; each node's content ranges over the properties. Example: elements are variables, properties are their types.
Step 2: Define feature functions between elements. Feature functions become undirected edges in the network. Example: aliasing between variables, a shared function caller, etc.
Step 3: Build the network via static program analysis. Nodes and feature functions are extracted automatically from the program, e.g. via alias and call-graph analysis. Key point: the general problem is undecidable, so we need good approximations! (A toy construction sketch follows below.)
More on program analysis: Program Analysis, ETH graduate course, M. Vechev, Spring 2015
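A toy sketch of what the extraction in Step 3 might produce; the miniature "AST" below and the relation kinds are invented for illustration and are far simpler than the real analysis:

// Toy construction of a dependency network from a miniature "AST" (illustrative only).
// Each expression record lists the identifiers it mentions and the kind of relation between them.
const expressions = [
  { kind: "cmp<",         ids: ["i", "r"] },   // i < r
  { kind: "incBy",        ids: ["i", "t"] },   // i += t
  { kind: "field.length", ids: ["r", "e"] }    // var r = e.length
];

const nodes = new Set();
const edges = [];
for (const expr of expressions) {
  expr.ids.forEach(id => nodes.add(id));
  // Every pair of identifiers related by the expression becomes an undirected edge,
  // labeled with the relation kind; the label selects which score table to use later.
  for (let a = 0; a < expr.ids.length; a++)
    for (let b = a + 1; b < expr.ids.length; b++)
      edges.push([expr.ids[a], expr.ids[b], expr.kind]);
}

console.log([...nodes]);  // [ 'i', 'r', 't', 'e' ]
console.log(edges);       // [ ['i','r','cmp<'], ['i','t','incBy'], ['r','e','field.length'] ]

Each edge kind later gets its own learned score table, which is what turns this network into a CRF.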
MAP inference on the chunkdata example
Unknown properties (nodes to predict): the names of i, t, r; the remaining locals e and n are handled the same way.
Known properties (fixed): the field name length.
Each feature function is an edge carrying a table of candidate label pairs with learned scores, e.g.:
  edge (i, t): (i, step) 0.5, (j, step) 0.4
  edge (i, r): (i, len) 0.6, (j, length) 0.3
  edge (r, length): (length, length) 0.5, (len, length) 0.3
MAP inference selects the jointly highest-scoring assignment, i → i, t → step, r → len, which yields the renamed chunkdata(str, step) shown earlier. Note that r = len wins even though the (r, length) edge alone would prefer length; the prediction is joint.
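Since this sub-graph has only a few unknowns, MAP inference can be checked by exhaustive enumeration; a sketch using the scores from the tables above (summing scores, i.e. working with w^T f rather than the product of factors):

// Exhaustive MAP inference for the small chunkdata sub-graph, using the scores above.
const candI = ["i", "j"], candT = ["step"], candR = ["len", "length"];
const psiIT = { "i|step": 0.5, "j|step": 0.4 };
const psiIR = { "i|len": 0.6, "j|length": 0.3 };
const psiRL = { "length|length": 0.5, "len|length": 0.3 };  // edge between r and the known name length

let best = null;
for (const i of candI) for (const t of candT) for (const r of candR) {
  const s = (psiIT[i + "|" + t] || 0) + (psiIR[i + "|" + r] || 0) + (psiRL[r + "|length"] || 0);
  if (best === null || s > best.s) best = { i, t, r, s };
}
console.log(best);  // -> { i: 'i', t: 'step', r: 'len', s: 1.4 }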
Structured Prediction for Programs: the full pipeline
Prediction phase: take an input program (e.g. the minified chunkdata), transform it into a network of about 30 nodes and 400 edges using alias and call relations, run MAP inference in milliseconds, and emit the renamed (or typed) program. (A toy sketch of the final renaming step follows below.)
Learning phase: learn the weights of the feature functions with max-margin training over existing code, about 7M functions for names and about 70K functions for types; the learned weights and feature functions take about 150MB.
Results: names 63%, types 81% (helps typechecking).
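A tiny sketch of the last "transform back" step, applying a predicted renaming to source text; the helper below is hypothetical and purely textual, whereas a real tool would rewrite the AST:

// Naive, purely textual renamer (hypothetical helper; a real tool rewrites the AST instead,
// so that strings, property names, and shadowed scopes are not affected).
function applyRenaming(src, renaming) {
  return src.replace(/\b[A-Za-z_]\w*\b/g, id => renaming[id] || id);
}

const renaming = { e: "str", t: "step", n: "colnames", r: "len" };
console.log(applyRenaming("var r = e.length;", renaming));  // -> var len = str.length;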
Structured Prediction for Programs: Summary
Bridges program analysis and CRFs
First application of CRFs to programs
CRFs learned from data
Fast and precise
[recap figure: the chunkdata network with its known and unknown properties, candidate score tables, and the predicted renaming]