Programming Tools based on Big Data and Conditional Random Fields
Veselin Raychev, Martin Vechev, Andreas Krause
Department of Computer Science, ETH Zurich
Zurich Machine Learning and Data Science Meet-up, December 2014
Motivation: Unprecedented access to massive codebases
Motivation: ~16M repositories, ~7M users. [chart: number of repositories per year]
Vision: Statistical Programming Tools. Probabilistically likely solutions to problems that are difficult or impossible to solve with traditional, rule-based techniques.
General Approach: Programming Languages + Machine Learning
1. Find the right program representation for the task
2. Find the right probabilistic model for the task
3. Build a probabilistic model over the representation and existing code
4. Use the probabilistic model to answer queries on new programs
http://jsnice.org: 1,000+ tweets. [sample tweet screenshots]
JSNice: Popularity. One of the top-ranked tools for JavaScript in 2014; 30,000 users in the first week of release; used in 180 countries.
JSNice Intuition: Image Denoising. [figure: original image, noisy image, denoised image]
Image Denoising: given a noisy image, predict the denoised image. [figure: noisy image with "?", denoised image]
JSNice
Before (minified input):
function chunkdata(e, t) {
  var n = [];
  var r = e.length;
  for (var i = 0; i < r; i += t) {
    if (i + t < r)
      n.push(e.substring(i, i + t));
    else
      n.push(e.substring(i, r));
  }
  return n;
}
After (JSNice output):
function chunkdata(str, step) {
  var colnames = [];
  var len = str.length;
  for (var i = 0; i < len; i += step) {
    if (i + step < len)
      colnames.push(str.substring(i, i + step));
    else
      colnames.push(str.substring(i, len));
  }
  return colnames;
}
Structured Prediction for Programs (V. Raychev, M. Vechev, A. Krause, ACM POPL'15, to appear)
Bridges program analysis and Conditional Random Fields: the first connection between programs and CRFs
JSNice is a special instance of this framework
CRFs are a key model in computer vision
Markov Random Fields
An undirected graphical model: a graph plus factors defines a joint probability distribution. For example, with three variables i, r, t connected by factors ψ₁ and ψ₂:
P(i, r, t) = (1/Z) · ψ₁(i, t) · ψ₂(i, r)
This captures the dependence between the facts to be predicted. Undirected models are better suited for our setting than directed models (direction is hard to capture).
More on graphical models: Probabilistic Graphical Models for Image Analysis, ETH graduate course, McWilliams and Lucchi
Conditional Random Fields (Lafferty, McCallum, Pereira, 2001)
Some facts are already known, denoted x; we want to predict new facts y conditioned on the known facts x. With a known fact t and unknown facts i, r:
P(i, r | t) = (1/Z(t)) · ψ₁(i, t) · ψ₂(i, r)
Key advantage of CRFs over MRFs: no prior over the known facts x is required.
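A minimal sketch of how such a factorized conditional could be evaluated by brute force; the candidate names, factor tables, and scores below are illustrative assumptions, not values from the talk:

// A toy CRF over two unknown names i and r, conditioned on a known name t.
// The factor tables, candidate names, and scores are made up for illustration.
const psi1 = { "i|step": 0.5, "j|step": 0.4 };   // psi_1(i, t)
const psi2 = { "i|len": 0.6, "j|length": 0.3 };  // psi_2(i, r)
const candI = ["i", "j"];
const candR = ["len", "length"];
const t = "step";                                // the known fact x

// Unnormalized score of one assignment (a small default keeps unseen pairs possible).
function score(i, r) {
  return (psi1[i + "|" + t] || 0.01) * (psi2[i + "|" + r] || 0.01);
}

// Z(t): sum over all assignments to the unknowns (i, r).
let Z = 0;
for (const i of candI) for (const r of candR) Z += score(i, r);

// Conditional probability P(i = "i", r = "len" | t = "step").
console.log(score("i", "len") / Z);

For real programs the candidate sets are large, so Z(x) is never enumerated like this; that is exactly why MAP inference and max-margin training are used below.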
MAP inference: joint prediction
y_best = argmax_y P(y | x)
Key: we need MAP inference, not marginal probabilities; this joint prediction is what matters for programs. For the CRF above,
(i, r)_best = argmax_{i, r} ψ₁(i, t) · ψ₂(i, r)
We use an iterative greedy algorithm.
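A rough sketch of the iterative-greedy idea, written as a generic routine; this is my own simplification for illustration, not the exact algorithm from the paper, and the node names, candidate sets, and score tables it expects are placeholders:

// Greedy MAP inference over pairwise factors (illustrative sketch, not the POPL'15 algorithm).
// nodes: unknown elements; candidates[n]: possible labels for n;
// edges: [a, b, table] where table maps "labelOfA|labelOfB" to a score (0 if absent).
function greedyMap(nodes, candidates, edges, maxIters = 10) {
  const y = {};
  for (const n of nodes) y[n] = candidates[n][0];      // arbitrary initial assignment

  const total = () =>
    edges.reduce((s, [a, b, tbl]) => s + (tbl[y[a] + "|" + y[b]] || 0), 0);

  for (let iter = 0; iter < maxIters; iter++) {
    let improved = false;
    for (const n of nodes) {                            // re-label one node at a time
      let bestLabel = y[n], bestScore = total();
      for (const c of candidates[n]) {
        y[n] = c;
        if (total() > bestScore) { bestScore = total(); bestLabel = c; improved = true; }
      }
      y[n] = bestLabel;                                 // keep the locally best label
    }
    if (!improved) break;                               // stop at a local optimum
  }
  return y;
}

// Example call:
// greedyMap(["i", "r"],
//           { i: ["i", "j"], r: ["len", "length"] },
//           [["i", "r", { "i|len": 0.6, "j|length": 0.3 }]]);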
Learning CRFs from Data (via max-margin training, Ratliff et al., 2007)
A convenient representation for learning from data is a log-linear CRF:
P(y | x) = (1/Z(x)) · exp(w^T f(y, x)),   where the weights w are learned from data
MAP inference then reduces to
y_best = argmax_y w^T f(y, x)
Since we require only CRFs and MAP inference, we use the max-margin training of Ratliff et al. (2007): it computes a subgradient via MAP inference and avoids computing Z(x) altogether.
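To make the training step concrete, here is a hedged sketch of one structured-perceptron-style subgradient update under a made-up feature definition; the real system's features, regularization, and learning-rate schedule differ:

// One subgradient step of max-margin training (simplified sketch).
// f(y, x) counts, per edge kind, how often a pair of labels co-occurs; w are the learned weights.
// The feature definition below is hypothetical, chosen only to make the step concrete.
function featureVector(y, x) {
  const f = {};
  for (const [a, b, kind] of x.edges) {
    const key = kind + "|" + (y[a] || a) + "|" + (y[b] || b);
    f[key] = (f[key] || 0) + 1;
  }
  return f;
}

// w is moved toward the features of the correct labeling yStar and away from
// the features of the current MAP prediction yHat, so Z(x) is never computed.
function subgradientStep(w, x, yStar, yHat, rate = 0.1) {
  const fStar = featureVector(yStar, x);
  const fHat = featureVector(yHat, x);
  const keys = new Set([...Object.keys(fStar), ...Object.keys(fHat)]);
  for (const k of keys) {
    w[k] = (w[k] || 0) + rate * ((fStar[k] || 0) - (fHat[k] || 0));
  }
  return w;
}

In each round, yHat would come from MAP inference with the current weights; because only the feature counts of two labelings are compared, the normalization constant never appears.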
Recipe: From a Program to a CRF
Step 1: Define the elements and their properties of interest. Elements become nodes in a network; each node's content ranges over the properties. Example: elements are variables, properties are their types.
Step 2: Define feature functions between elements. Feature functions become undirected edges in the network. Example: aliasing between variables, a shared function caller, etc.
Step 3: Build the network via static program analysis. Nodes and feature functions are extracted automatically from the program, e.g. via alias and call-graph analysis. Key point: the general problem is undecidable, so we need good approximations! (A toy construction sketch follows below.)
More on program analysis: Program Analysis, ETH graduate course, M. Vechev, Spring 2015
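A toy sketch of what the extraction in Step 3 might produce; the miniature "AST" below and the relation kinds are invented for illustration and are far simpler than the real analysis:

// Toy construction of a dependency network from a miniature "AST" (illustrative only).
// Each expression record lists the identifiers it mentions and the kind of relation between them.
const expressions = [
  { kind: "cmp<",         ids: ["i", "r"] },   // i < r
  { kind: "incBy",        ids: ["i", "t"] },   // i += t
  { kind: "field.length", ids: ["r", "e"] }    // var r = e.length
];

const nodes = new Set();
const edges = [];
for (const expr of expressions) {
  expr.ids.forEach(id => nodes.add(id));
  // Every pair of identifiers related by the expression becomes an undirected edge,
  // labeled with the relation kind; the label selects which score table to use later.
  for (let a = 0; a < expr.ids.length; a++)
    for (let b = a + 1; b < expr.ids.length; b++)
      edges.push([expr.ids[a], expr.ids[b], expr.kind]);
}

console.log([...nodes]);  // [ 'i', 'r', 't', 'e' ]
console.log(edges);       // [ ['i','r','cmp<'], ['i','t','incBy'], ['r','e','field.length'] ]

Each edge kind later gets its own learned score table, which is what turns this network into a CRF.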
MAP inference on the chunkdata example
Unknown properties (nodes to predict): the names of i, t, r; the remaining locals e and n are handled the same way.
Known properties (fixed): the field name length.
Each feature function is an edge carrying a table of candidate label pairs with learned scores, e.g.:
  edge (i, t): (i, step) 0.5, (j, step) 0.4
  edge (i, r): (i, len) 0.6, (j, length) 0.3
  edge (r, length): (length, length) 0.5, (len, length) 0.3
MAP inference selects the jointly highest-scoring assignment, i → i, t → step, r → len, which yields the renamed chunkdata(str, step) shown earlier. Note that r = len wins even though the (r, length) edge alone would prefer length; the prediction is joint.
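Since this sub-graph has only a few unknowns, MAP inference can be checked by exhaustive enumeration; a sketch using the scores from the tables above (summing scores, i.e. working with w^T f rather than the product of factors):

// Exhaustive MAP inference for the small chunkdata sub-graph, using the scores above.
const candI = ["i", "j"], candT = ["step"], candR = ["len", "length"];
const psiIT = { "i|step": 0.5, "j|step": 0.4 };
const psiIR = { "i|len": 0.6, "j|length": 0.3 };
const psiRL = { "length|length": 0.5, "len|length": 0.3 };  // edge between r and the known name length

let best = null;
for (const i of candI) for (const t of candT) for (const r of candR) {
  const s = (psiIT[i + "|" + t] || 0) + (psiIR[i + "|" + r] || 0) + (psiRL[r + "|length"] || 0);
  if (best === null || s > best.s) best = { i, t, r, s };
}
console.log(best);  // -> { i: 'i', t: 'step', r: 'len', s: 1.4 }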
Structured Prediction for Programs: the full pipeline
Prediction phase: take an input program (e.g. the minified chunkdata), transform it into a network of about 30 nodes and 400 edges using alias and call relations, run MAP inference in milliseconds, and emit the renamed (or typed) program. (A toy sketch of the final renaming step follows below.)
Learning phase: learn the weights of the feature functions with max-margin training over existing code, about 7M functions for names and about 70K functions for types; the learned weights and feature functions take about 150MB.
Results: names 63%, types 81% (helps typechecking).
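A tiny sketch of the last "transform back" step, applying a predicted renaming to source text; the helper below is hypothetical and purely textual, whereas a real tool would rewrite the AST:

// Naive, purely textual renamer (hypothetical helper; a real tool rewrites the AST instead,
// so that strings, property names, and shadowed scopes are not affected).
function applyRenaming(src, renaming) {
  return src.replace(/\b[A-Za-z_]\w*\b/g, id => renaming[id] || id);
}

const renaming = { e: "str", t: "step", n: "colnames", r: "len" };
console.log(applyRenaming("var r = e.length;", renaming));  // -> var len = str.length;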
Structured Prediction for Programs: Summary
Bridges program analysis and CRFs
First application of CRFs to programs
CRFs learned from data
Fast and precise
[recap figure: the chunkdata network with its known and unknown properties, candidate score tables, and the predicted renaming]