Programming Tools based on Big Data and Conditional Random Fields


1 Veselin Raychev, Martin Vechev, Andreas Krause. Department of Computer Science, ETH Zurich. Zurich Machine Learning and Data Science Meetup, December 2014
2 Motivation: unprecedented access to massive codebases
3 Motivation: ~16M repositories, ~7M users [chart: number of repositories per year]
4 Vision: Statistical Programming Tools. They find probabilistically likely solutions to problems that are difficult or impossible to solve with traditional rule-based techniques.
5 General Approach (programming languages + machine learning): find the right program representation for the task; find the right probabilistic model for the task; build a probabilistic model over the representation and existing code; use the probabilistic model to answer queries on new programs.
6 1,000+ tweets [sample tweets shown as images, not recoverable]
7 JSNice popularity: one of the top-ranked tools for JavaScript in …; …,000 users in the first week of release; used in 180 countries.
8 JSNice intuition: image denoising [figure: original image, noisy image, denoised image]
9 Image denoising [figure: noisy image → ? → denoised image]
10 JSNice
Input (minified):

    function chunkdata(e, t) {
      var n = [];
      var r = e.length;
      var i = 0;
      for (; i < r; i += t) {
        if (i + t < r) {
          n.push(e.substring(i, i + t));
        } else {
          n.push(e.substring(i, r));
        }
      }
      return n;
    }

Output (with predicted names):

    function chunkdata(str, step) {
      var colnames = [];
      var len = str.length;
      var i = 0;
      for (; i < len; i += step) {
        if (i + step < len) {
          colnames.push(str.substring(i, i + step));
        } else {
          colnames.push(str.substring(i, len));
        }
      }
      return colnames;
    }
11 Structured Prediction for Programs (V. Raychev, M. Vechev, A. Krause, ACM POPL '15, to appear): bridges program analysis and conditional random fields (CRFs); the first connection between programs and CRFs; JSNice is a special instance; CRFs are a key model in computer vision.
12 Markov Random Fields
Undirected graphical model: the graph plus its factors define a joint probability distribution. [graph: nodes t, i, r; factor ψ1 connects i and t; factor ψ2 connects i and r]
$P(i, r, t) = \frac{1}{Z}\,\psi_1(i, t)\,\psi_2(i, r)$, where $Z = \sum_{i, r, t} \psi_1(i, t)\,\psi_2(i, r)$
Captures dependence between the facts to be predicted. Undirected models are better suited for our setting than directed models (the direction of a dependence is hard to capture).
More on graphical models in: Probabilistic Graphical Models for Image Analysis, ETH graduate course, McWilliams and Lucchi.
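A minimal sketch of the joint distribution above, assuming a made-up two-value domain and made-up factor tables (the slide gives none): it enumerates every assignment to compute Z and then normalizes ψ1(i, t)·ψ2(i, r). Enumeration only works for toy models; it is here to make the role of Z concrete.

```javascript
// Toy Markov random field over three discrete variables i, r and t.
// psi1 couples (i, t) and psi2 couples (i, r), matching the slide's graph.
// The domain and the factor values are hypothetical.
const domain = ["a", "b"];

// Non-negative factor tables: agreeing values score higher.
function psi1(i, t) { return i === t ? 3 : 1; }
function psi2(i, r) { return i === r ? 2 : 1; }

// Unnormalized score of one joint assignment.
function score(i, r, t) { return psi1(i, t) * psi2(i, r); }

// Z: sum of the unnormalized score over every joint assignment.
function partition() {
  let z = 0;
  for (const i of domain)
    for (const r of domain)
      for (const t of domain) z += score(i, r, t);
  return z;
}

// P(i, r, t) = psi1(i, t) * psi2(i, r) / Z
function joint(i, r, t) { return score(i, r, t) / partition(); }

console.log(partition());          // 24
console.log(joint("a", "a", "a")); // 0.25
```

Note that Z sums over all three variables, including t; this is exactly what a CRF avoids when t is already known.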
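13 Conditional Random Fields (McCallum et al., 2001)
Some facts are already known, denoted x. We would like to predict new facts y, conditioned on the known facts x. [graph: t is known (K); i and r are unknown; factor ψ1 connects i and t; factor ψ2 connects i and r]
$P(i, r \mid t) = \frac{1}{Z(t)}\,\psi_1(i, t)\,\psi_2(i, r)$, where $Z(t) = \sum_{i, r} \psi_1(i, t)\,\psi_2(i, r)$
Key advantage of CRFs over MRFs: no prior is required, because only P(y | x) is modeled and the distribution of the known facts x never is.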
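A minimal sketch of the conditional distribution above, again with hypothetical labels and factor values: Z(t) sums only over the unknowns i and r for one fixed observation t, so no distribution over t ever has to be written down.

```javascript
// Toy conditional random field: t is known, i and r are predicted.
// P(i, r | t) = psi1(i, t) * psi2(i, r) / Z(t), with Z(t) summing over i and r only.
// Labels and factor values are hypothetical.
const labels = ["a", "b"];

function psi1(i, t) { return i === t ? 3 : 1; }
function psi2(i, r) { return i === r ? 2 : 1; }

// Z(t): normalizer for one fixed observation t; t itself is never summed over,
// so no distribution over the known facts is needed.
function partitionGiven(t) {
  let z = 0;
  for (const i of labels)
    for (const r of labels) z += psi1(i, t) * psi2(i, r);
  return z;
}

function conditional(i, r, t) {
  return (psi1(i, t) * psi2(i, r)) / partitionGiven(t);
}

console.log(partitionGiven("a"));        // 12
console.log(conditional("a", "a", "a")); // 0.5
```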
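14 MAP inference: joint prediction
$y_{best} = \arg\max_{y} P(y \mid x)$
Key: MAP inference rather than marginals! Predicting all unknown facts jointly, instead of choosing each one from its own marginal, is key for programs.
$P(i, r \mid t) = \frac{1}{Z(t)}\,\psi_1(i, t)\,\psi_2(i, r)$
Since Z(t) does not depend on i or r, it drops out of the maximization: $(i, r)_{best} = \arg\max_{(i, r)} \psi_1(i, t)\,\psi_2(i, r)$
We use an iterative greedy algorithm.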
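The slide does not spell out its iterative greedy algorithm, so the sketch below swaps in a generic greedy coordinate ascent (change one unknown at a time while the score improves) and compares it with exhaustive search. Labels and factor values are hypothetical; the point is only that MAP inference maximizes ψ1(i, t)·ψ2(i, r) without ever computing Z(t).

```javascript
// MAP inference for a toy CRF with P(i, r | t) proportional to psi1(i, t) * psi2(i, r).
// Z(t) does not depend on (i, r), so both searches below simply ignore it.
// Labels and factor values are hypothetical.
const labels = ["a", "b", "c"];
function psi1(i, t) { return i === t ? 3 : 1; }
function psi2(i, r) { return i === r ? 2 : 1; }
const score = (y, t) => psi1(y.i, t) * psi2(y.i, y.r);

// Exhaustive search: exact, but exponential in the number of unknowns.
function mapExhaustive(t) {
  let best = null;
  let bestScore = -Infinity;
  for (const i of labels)
    for (const r of labels) {
      const s = score({ i, r }, t);
      if (s > bestScore) { best = { i, r }; bestScore = s; }
    }
  return best;
}

// Greedy coordinate ascent: change one unknown at a time to any label that
// raises the score, until a full pass makes no change. This may stop at a
// local optimum; it is a generic sketch, not JSNice's exact algorithm.
function mapGreedy(t, init) {
  const y = { ...init };
  let improved = true;
  while (improved) {
    improved = false;
    for (const v of ["i", "r"]) {
      for (const label of labels) {
        const candidate = { ...y, [v]: label };
        if (score(candidate, t) > score(y, t)) {
          y[v] = label;
          improved = true;
        }
      }
    }
  }
  return y;
}

console.log(JSON.stringify(mapExhaustive("a")));                 // {"i":"a","r":"a"}
console.log(JSON.stringify(mapGreedy("a", { i: "b", r: "c" }))); // {"i":"a","r":"a"}
```

Exhaustive search is exact but grows exponentially with the number of unknowns; the greedy variant scales to networks of realistic size at the price of possibly stopping at a local optimum.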
16 Learning CRFs from Data (via max-margin training, Ratliff et al., 2007)
A convenient representation for learning from data is a log-linear CRF: $P(y \mid x) = \frac{1}{Z(x)} \exp\left(w^\top f(y, x)\right)$, with the weights w learned from data.
Because exp is monotone and Z(x) does not depend on y, MAP inference reduces to $y_{best} = \arg\max_{y} w^\top f(y, x)$.
As we only require MAP inference from the CRF, we use the max-margin training due to Ratliff et al. (2007): it computes the subgradient via MAP inference and avoids computing Z(x)!
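A minimal sketch of max-margin training by subgradient descent on a tiny log-linear model, assuming Hamming loss, indicator features, L2 regularization and made-up data and hyperparameters (none of these come from the slides). It shows the property the slide highlights: each step needs only a (loss-augmented) MAP inference call, never Z(x).

```javascript
// Max-margin training by subgradient descent, in the spirit of Ratliff et al. (2007),
// for a tiny log-linear model scored by w^T f(y, x). Features, labels, data and
// hyperparameters are hypothetical. The known fact x is a single label t; the
// prediction y is a pair { i, r }.
const labels = ["a", "b"];

// f(y, x): two indicator features, one for the (i, t) pair and one for the (i, r) pair.
function features(y, x) {
  return [`it:${y.i}|${x}`, `ir:${y.i}|${y.r}`];
}

// w^T f(y, x): sum of the weights of the active features (missing weights count as 0).
function scoreOf(w, y, x) {
  return features(y, x).reduce((s, k) => s + (w.get(k) ?? 0), 0);
}

// Hamming loss: number of wrongly predicted unknowns.
function loss(y, gold) {
  return Number(y.i !== gold.i) + Number(y.r !== gold.r);
}

// MAP inference by enumeration; with `gold` it is loss-augmented, as the
// max-margin subgradient requires. Z(x) is never computed.
function argmax(w, x, gold) {
  let best = null;
  let bestVal = -Infinity;
  for (const i of labels)
    for (const r of labels) {
      const y = { i, r };
      const val = scoreOf(w, y, x) + (gold ? loss(y, gold) : 0);
      if (val > bestVal) { best = y; bestVal = val; }
    }
  return best;
}

function train(data, { epochs = 20, rate = 0.1, lambda = 0.01 } = {}) {
  const w = new Map();
  for (let epoch = 0; epoch < epochs; epoch++) {
    for (const { x, y: gold } of data) {
      const yHat = argmax(w, x, gold);
      // Step along -(lambda * w + f(yHat, x) - f(gold, x)).
      for (const [k, v] of w) w.set(k, v - rate * lambda * v);
      for (const k of features(yHat, x)) w.set(k, (w.get(k) ?? 0) - rate);
      for (const k of features(gold, x)) w.set(k, (w.get(k) ?? 0) + rate);
    }
  }
  return w;
}

// Prediction is plain (not loss-augmented) MAP inference.
const predict = (w, x) => argmax(w, x, null);

const data = [
  { x: "a", y: { i: "a", r: "a" } },
  { x: "b", y: { i: "b", r: "b" } },
];
const learned = train(data);
console.log(predict(learned, "a"), predict(learned, "b"));
```

When the loss-augmented prediction already equals the gold labels, the two feature updates cancel and only the regularization shrinks w; otherwise weight moves from the offending features to the gold ones.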
20 Recipe: From a Program to a CRF
Step 1: Define the elements and their properties of interest. Elements become nodes in a network; node content ranges over properties. Example: elements are variables, properties are their types.
Step 2: Define feature functions between elements. Feature functions become undirected edges in the network. Example: aliasing between variables, shared function caller, etc.
Step 3: Build the network via static program analysis. Automatically extract nodes and feature functions from the program. Example: alias analysis, call graph. Key point: the general problem is undecidable, so good approximations are needed!
More on program analysis: Program Analysis, ETH graduate course, M. Vechev, Spring 2015.
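A minimal sketch of the three steps on the minified `chunkdata` function, assuming the facts have already been produced by some analysis: here they are written by hand, and the fact format and relation names are hypothetical. Properties are names (the JSNice case) rather than types.

```javascript
// The three recipe steps on hand-written facts about the minified chunkdata
// function. A real system would extract such facts with static program analysis;
// the fact format and relation names below are hypothetical.

// Step 1: elements (identifiers) and whether their property (here, the name)
// is already known or must be predicted.
const elements = [
  { id: "e", known: false },
  { id: "t", known: false },
  { id: "n", known: false },
  { id: "r", known: false },
  { id: "i", known: false },
  { id: "length", known: true }, // property name: not renamed by the minifier
];

// Step 2: relations between elements that feature functions will be defined on.
const facts = [
  ["r", "e", "assign-from-property-of"], // var r = e.length
  ["r", "length", "assign-from-property"], // var r = e.length
  ["i", "r", "compare"], // i < r
  ["i", "t", "increment-by"], // i += t
  ["n", "e", "push-substring-of"], // n.push(e.substring(...))
];

// Step 3: build the network; each relation becomes an undirected, labelled edge.
function buildNetwork(elements, facts) {
  const nodes = new Map(elements.map((el) => [el.id, { ...el, edges: [] }]));
  for (const [a, b, rel] of facts) {
    nodes.get(a).edges.push({ to: b, rel });
    nodes.get(b).edges.push({ to: a, rel });
  }
  return nodes;
}

console.log(buildNetwork(elements, facts).get("r").edges.length); // 3
```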
27 MAP inference on the chunkdata example (input and output code as on slide 10)
Unknown properties: the names of i, t and r. Known property: length.
Pairwise weights w on the network edges (candidate name pairs):

    i-t edge:        (i, step) 0.5          (j, step) 0.4
    i-r edge:        (i, len) 0.6           (j, length) 0.3
    r-length edge:   (length, length) 0.5   (len, length) …

MAP assignment: i → i, t → step, r → len. The full output also renames e → str and n → colnames, giving function chunkdata(str, step).
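A minimal sketch that recomputes the MAP assignment from the two fully legible tables of slide 27. It assumes additive (log-linear) scoring and weight 0 for any pair a table does not list, and it leaves out the r-length table because one of its weights is missing from this transcript. Under those assumptions it still recovers the slide's assignment.

```javascript
// MAP inference over the two legible weight tables of slide 27 (i-t and i-r).
// The r-length table is left out because one of its weights is missing here.
// Assumptions: the scores of the edges add up (log-linear model), and a name
// pair not listed in a table has weight 0.
const weights = {
  it: { "i|step": 0.5, "j|step": 0.4 },
  ir: { "i|len": 0.6, "j|length": 0.3 },
};

// Candidate names for each unknown, read off the tables.
const candidates = { i: ["i", "j"], t: ["step"], r: ["len", "length"] };

function score(a) {
  return (weights.it[`${a.i}|${a.t}`] ?? 0) + (weights.ir[`${a.i}|${a.r}`] ?? 0);
}

// Exhaustive search over the (small) candidate space.
function mapAssignment() {
  let best = null;
  let bestScore = -Infinity;
  for (const i of candidates.i)
    for (const t of candidates.t)
      for (const r of candidates.r) {
        const a = { i, t, r };
        const s = score(a);
        if (s > bestScore) { best = a; bestScore = s; }
      }
  return { best, bestScore };
}

console.log(JSON.stringify(mapAssignment().best)); // {"i":"i","t":"step","r":"len"}
```

The joint choice matters: j has a good pairing with length (0.3), but i wins because its two edges together score 0.5 + 0.6 = 1.1.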
35 Structured Prediction for Programs
Learning phase: from training programs, program analysis (alias analysis, call graph) extracts feature functions, and max-margin training learns their weights. Learned weights and feature functions: ~7M feature functions for names, ~70K for types; ~150MB.
Prediction phase: a new program (e.g., the minified chunkdata function of slide 10) is transformed into a network (~30 nodes, ~400 edges); MAP inference with the learned weights outputs the program with predicted names. Time: milliseconds.
Results: names 63%, types 81% (helps typechecking).
36 Structured Prediction for Programs (summary, shown next to the chunkdata MAP inference example of slide 27): bridges program analysis and CRFs; first application of CRFs to programs; CRFs learned from data; fast and precise.
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
More informationHow does the Kinect work? John MacCormick
How does the Kinect work? John MacCormick Xbox demo Laptop demo The Kinect uses structured light and machine learning Inferring body position is a twostage process: first compute a depth map (using structured
More informationCost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:
CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm
More informationOptimizations. Optimization Safety. Optimization Safety. Control Flow Graphs. Code transformations to improve program
Optimizations Code transformations to improve program Mainly: improve execution time Also: reduce program size Control low Graphs Can be done at high level or low level E.g., constant folding Optimizations
More informationChange Impact Analysis
Change Impact Analysis Martin Ward Reader in Software Engineering martin@gkc.org.uk Software Technology Research Lab De Montfort University Change Impact Analysis Impact analysis is a process that predicts
More informationDynamical Clustering of Personalized Web Search Results
Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked
More informationPractical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
More informationCompact Representations and Approximations for Compuation in Games
Compact Representations and Approximations for Compuation in Games Kevin Swersky April 23, 2008 Abstract Compact representations have recently been developed as a way of both encoding the strategic interactions
More informationDetecting Anomalies in Network Traffic Using Maximum Entropy Estimation
Detecting Anomalies in Network Traffic Using Maximum Entropy Estimation Yu Gu, Andrew McCallum, Don Towsley Department of Computer Science, University of Massachusetts, Amherst, MA 01003 Abstract We develop
More informationMapReduce Approach to Collective Classification for Networks
MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty
More informationDoptimal plans in observational studies
Doptimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationCommon Patterns and Pitfalls for Implementing Algorithms in Spark. Hossein Falaki @mhfalaki hossein@databricks.com
Common Patterns and Pitfalls for Implementing Algorithms in Spark Hossein Falaki @mhfalaki hossein@databricks.com Challenges of numerical computation over big data When applying any algorithm to big data
More informationApproximating the Partition Function by Deleting and then Correcting for Model Edges
Approximating the Partition Function by Deleting and then Correcting for Model Edges Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles Los Angeles, CA 995
More informationAssignment 5: Visualization
Assignment 5: Visualization Arash Vahdat March 17, 2015 Readings Depending on how familiar you are with web programming, you are recommended to study concepts related to CSS, HTML, and JavaScript. The
More informationData Mining in Web Search Engine Optimization and User Assisted Rank Results
Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management
More informationResearch Statement Immanuel Trummer www.itrummer.org
Research Statement Immanuel Trummer www.itrummer.org We are collecting data at unprecedented rates. This data contains valuable insights, but we need complex analytics to extract them. My research focuses
More informationBig Graph Processing: Some Background
Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI580, Bo Wu Graphs
More informationTraffic Driven Analysis of Cellular Data Networks
Traffic Driven Analysis of Cellular Data Networks Samir R. Das Computer Science Department Stony Brook University Joint work with Utpal Paul, Luis Ortiz (Stony Brook U), Milind Buddhikot, Anand Prabhu
More informationStructured Models for FinetoCoarse Sentiment Analysis
Structured Models for FinetoCoarse Sentiment Analysis Ryan McDonald Kerry Hannan Tyler Neylon Mike Wells Jeff Reynar Google, Inc. 76 Ninth Avenue New York, NY 10011 Contact email: ryanmcd@google.com
More informationSome Research Challenges for Big Data Analytics of Intelligent Security
Some Research Challenges for Big Data Analytics of Intelligent Security YuhJong Hu hu at cs.nccu.edu.tw Emerging Network Technology (ENT) Lab. Department of Computer Science National Chengchi University,
More informationBayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University caizhua@gmail.com
Bayesian Machine Learning (ML): Modeling And Inference in Big Data Zhuhua Cai Google Rice University caizhua@gmail.com 1 Syllabus Bayesian ML Concepts (Today) Bayesian ML on MapReduce (Next morning) Bayesian
More informationSense Making in an IOT World: Sensor Data Analysis with Deep Learning
Sense Making in an IOT World: Sensor Data Analysis with Deep Learning Natalia Vassilieva, PhD Senior Research Manager GTC 2016 Deep learning proof points as of today Vision Speech Text Other Search & information
More informationDesign of Data Entry Systems Pam Kellogg, Research Informatics, Family Health International
Design of Data Entry Systems Pam Kellogg,, Family Health International 1 Data Entry Design Topics CRFs Annotated CRFs Record Layouts Data Dictionaries Code lists Data entry screens Testing QA Documentation
More information