Programming Tools based on Big Data and Conditional Random Fields
1 Programming Tools based on Big Data and Conditional Random Fields. Veselin Raychev, Martin Vechev, Andreas Krause. Department of Computer Science, ETH Zurich. Zurich Machine Learning and Data Science Meet-up, December 2014
2 Motivation: Unprecedented access to massive codebases.
3 Motivation: ~16M repositories, ~7M users. [Chart: number of repositories by year]
4 Vision: Statistical Programming Tools. Probabilistically likely solutions to problems that are difficult or impossible to solve with traditional rule-based techniques.
5 General Approach (programming languages + machine learning): find the right program representation for the task; find the right probabilistic model for the task; build a probabilistic model over the representation and existing code; use the probabilistic model to answer queries on new programs.
6 1,000+ Tweets [sample tweets shown on the slide]
7 JSNice: Popularity. One of the top-ranked tools for JavaScript in …; …,000 users in the first week of release; used in 180 countries.
8 JSNice Intuition: Image Denoising [Figure: original image, noisy image, denoised image]
9 Image Denoising [Figure: noisy image → ? → denoised image]
10 JSNice: minified input, and the same function after JSNice predicts new names for its variables.

Input:

    function chunkdata(e, t) {
      var n = [];
      var r = e.length;
      var i = 0;
      for (; i < r; i += t) {
        if (i + t < r) {
          n.push(e.substring(i, i + t));
        } else {
          n.push(e.substring(i, r));
        }
      }
      return n;
    }

Output:

    function chunkdata(str, step) {
      var colnames = [];
      var len = str.length;
      var i = 0;
      for (; i < len; i += step) {
        if (i + step < len) {
          colnames.push(str.substring(i, i + step));
        } else {
          colnames.push(str.substring(i, len));
        }
      }
      return colnames;
    }
11 Structured Prediction for Programs (V. Raychev, M. Vechev, A. Krause, ACM POPL '15, to appear): bridges program analysis and conditional random fields (CRFs); the first connection between programs and CRFs; JSNice is a special instance. CRFs are a key model in computer vision.
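12 Markov Random Fields: an undirected graphical model in which the graph and its factors define a joint probability distribution. For the graph t - i - r, with factor $\psi_1$ on the edge (i, t) and factor $\psi_2$ on the edge (i, r):
$P(i, r, t) = \frac{1}{Z}\,\psi_1(i, t)\,\psi_2(i, r)$, where $Z = \sum_{i, r, t} \psi_1(i, t)\,\psi_2(i, r)$.
MRFs capture dependence between the facts to be predicted. Undirected models are better suited to our problem than directed models, because direction is hard to capture. More on graphical models in: Probabilistic Graphical Models for Image Analysis, ETH graduate course, McWilliams and Lucchi.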
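A brute-force computation makes the normalization concrete. This JavaScript sketch assumes binary variables and hypothetical potential values: psi1 is 2 when i equals t, psi2 is 3 when i equals r, and both are 1 otherwise. It enumerates all eight joint assignments, which is only feasible for toy graphs.

```javascript
// Toy MRF over three binary variables i, r, t with two pairwise factors,
// mirroring the graph t - i - r. The potential values are hypothetical.
const VALUES = [0, 1];

// psi1 couples i and t; psi2 couples i and r (potentials must be positive).
const psi1 = (i, t) => (i === t ? 2 : 1);
const psi2 = (i, r) => (i === r ? 3 : 1);

// Unnormalized score: the product of all factors.
const unnormalized = (i, r, t) => psi1(i, t) * psi2(i, r);

// Partition function Z: sum of the unnormalized score over every joint assignment.
function partition() {
  let z = 0;
  for (const i of VALUES)
    for (const r of VALUES)
      for (const t of VALUES) z += unnormalized(i, r, t);
  return z;
}

const Z = partition();

// Joint probability P(i, r, t) = psi1(i, t) * psi2(i, r) / Z.
const joint = (i, r, t) => unnormalized(i, r, t) / Z;

console.log(Z, joint(0, 0, 0)); // 24 0.25
```

Z does not depend on any particular assignment; it is a single constant for the whole model.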
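13 Conditional Random Fields (Lafferty, McCallum, and Pereira, 2001): some facts are already known, denoted x; we would like to predict new facts y conditioned on the known facts x. With t known (K):
$P(i, r \mid t) = \frac{1}{Z(t)}\,\psi_1(i, t)\,\psi_2(i, r)$, where $Z(t) = \sum_{i, r} \psi_1(i, t)\,\psi_2(i, r)$.
Key advantage of CRFs over MRFs: no prior over the known facts x is required.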
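Conditioning changes only the normalizer. In this sketch t is held fixed, and Z(t) sums over the unknowns i and r alone, so no distribution over t is ever modeled. It uses hypothetical binary potentials chosen for illustration.

```javascript
// Toy CRF: t is observed (known); i and r are predicted.
// Hypothetical pairwise potentials over binary values.
const VALUES = [0, 1];
const psi1 = (i, t) => (i === t ? 2 : 1);
const psi2 = (i, r) => (i === r ? 3 : 1);

// Z(t) sums only over the unknown variables i and r, with t held fixed.
function partitionGiven(t) {
  let z = 0;
  for (const i of VALUES)
    for (const r of VALUES) z += psi1(i, t) * psi2(i, r);
  return z;
}

// Conditional probability P(i, r | t) = psi1(i, t) * psi2(i, r) / Z(t).
function conditional(i, r, t) {
  return (psi1(i, t) * psi2(i, r)) / partitionGiven(t);
}

console.log(partitionGiven(0), conditional(0, 0, 0)); // 12 0.5
```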
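14 MAP inference: joint prediction
$y_{best} = \arg\max_{y \in \Omega_x} P(y \mid x)$, where $\Omega_x$ is the set of candidate assignments to the unknown facts given x.
Key point: we use MAP inference rather than computing marginals; this is key for programs. For the example CRF, $P(i, r \mid t) = \frac{1}{Z(t)}\,\psi_1(i, t)\,\psi_2(i, r)$. Because $Z(t)$ does not depend on $(i, r)$, it drops out of the maximization:
$(i, r)_{best} = \arg\max_{(i, r) \in \Omega_x} \psi_1(i, t)\,\psi_2(i, r)$.
We use an iterative greedy algorithm.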
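The slide says only that the algorithm is iterative and greedy. Below is one such scheme, coordinate-wise improvement in the style of iterated conditional modes, not necessarily JSNice's exact procedure. It assumes additive scores, so weights act as log-potentials and maximizing their sum is equivalent to maximizing the product of potentials. The node names, candidate labels and weights are hypothetical.

```javascript
// Pairwise factors: each links two nodes and maps "labelA|labelB" to a weight.
// Scores are additive, i.e. weights play the role of log-potentials.
// All node names, labels and weights here are hypothetical.
const factors = [
  { a: "e", b: "m", table: { "str|substring": 0.7, "x|substring": 0.1 } },
  { a: "e", b: "t", table: { "str|step": 0.4, "x|n": 0.5 } },
];
const known = { m: "substring" }; // fixed; never changed by inference
const candidates = { e: ["x", "str"], t: ["n", "step"] };

// Total score of a complete assignment: sum of the weights of all factors.
function totalScore(assign) {
  let s = 0;
  for (const { a, b, table } of factors) s += table[`${assign[a]}|${assign[b]}`] || 0;
  return s;
}

// Greedy MAP: start from the first candidate of every unknown node, then
// change one node at a time whenever that strictly improves the total score,
// until a full pass makes no change (a local optimum).
function greedyMap(maxPasses = 10) {
  const assign = { ...known };
  for (const [v, labels] of Object.entries(candidates)) assign[v] = labels[0];
  for (let pass = 0; pass < maxPasses; pass++) {
    let changed = false;
    for (const [v, labels] of Object.entries(candidates)) {
      let bestLabel = assign[v];
      let bestScore = totalScore(assign);
      for (const label of labels) {
        const s = totalScore({ ...assign, [v]: label });
        if (s > bestScore) {
          bestLabel = label;
          bestScore = s;
        }
      }
      if (bestLabel !== assign[v]) {
        assign[v] = bestLabel;
        changed = true;
      }
    }
    if (!changed) break;
  }
  return assign;
}

console.log(greedyMap()); // { m: 'substring', e: 'str', t: 'step' }
```

Greedy search trades optimality for speed. It can stop at a local optimum, although on this toy graph it reaches the best assignment (total weight 1.1).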
16 Learning CRFs from Data (via max-margin training, Ratliff et al., 2007)
A convenient representation for learning from data is a log-linear CRF, whose weights w are learned from data:
$P(y \mid x) = \frac{1}{Z(x)} \exp\left(w^\top f(y, x)\right)$
MAP inference then becomes $y_{best} = \arg\max_{y \in \Omega_x} w^\top f(y, x)$, since exp is monotone and $Z(x)$ is constant in y.
As we require only CRFs and MAP inference, we use the max-margin training due to Ratliff et al. (2007). It computes the subgradient via MAP inference and avoids computing Z(x)!
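The sketch below trains a linear scorer on a hypothetical two-node problem, as a rough illustration of subgradient max-margin training in the spirit of Ratliff et al. (2007). Loss-augmented MAP inference supplies the subgradient and Z(x) is never computed. Here that inference is brute-force enumeration with Hamming loss. The labels, context strings, feature map and step-size settings are all assumptions for this example, not JSNice's actual features.

```javascript
// Hypothetical candidate names for each of the two unknown nodes.
const LABELS = ["i", "step", "len"];

// Joint feature map f(y, x): a unary indicator per (context, label) for each
// node, plus one pairwise indicator over the two labels.
function features(y, x) {
  const f = new Map();
  const add = (key) => f.set(key, (f.get(key) || 0) + 1);
  add(`u:${x[0]}:${y[0]}`);
  add(`u:${x[1]}:${y[1]}`);
  add(`p:${y[0]}:${y[1]}`);
  return f;
}

// Linear score w^T f(y, x) over sparse maps.
function score(w, f) {
  let s = 0;
  for (const [key, value] of f) s += (w.get(key) || 0) * value;
  return s;
}

// Hamming loss: number of nodes labeled differently from the gold labeling.
const hamming = (y, gold) => y.filter((label, k) => label !== gold[k]).length;

// MAP inference by exhaustive enumeration (feasible only for tiny problems).
// If `gold` is given, it performs loss-augmented inference: argmax w^T f + loss.
function mapInference(w, x, gold = null) {
  let best = null;
  let bestScore = -Infinity;
  for (const a of LABELS) {
    for (const b of LABELS) {
      const y = [a, b];
      const s = score(w, features(y, x)) + (gold ? hamming(y, gold) : 0);
      if (s > bestScore) {
        bestScore = s;
        best = y;
      }
    }
  }
  return best;
}

// Subgradient descent on the (optionally L2-regularized) structured hinge loss.
function train(data, epochs = 50, lr = 1, lambda = 0) {
  const w = new Map();
  for (let epoch = 0; epoch < epochs; epoch++) {
    for (const { x, y } of data) {
      const yHat = mapInference(w, x, y); // loss-augmented MAP
      // Gradient of the (lambda / 2) * ||w||^2 regularizer shrinks the weights.
      for (const [key, value] of w) w.set(key, value * (1 - lr * lambda));
      // Hinge subgradient step: raise gold features, lower predicted ones
      // (the two updates cancel when yHat equals y).
      for (const [key, value] of features(y, x)) w.set(key, (w.get(key) || 0) + lr * value);
      for (const [key, value] of features(yHat, x)) w.set(key, (w.get(key) || 0) - lr * value);
    }
  }
  return w;
}

// Hypothetical training data: context strings paired with gold names.
const data = [
  { x: ["loopVar", "loopStep"], y: ["i", "step"] },
  { x: ["bound", "loopVar"], y: ["len", "i"] },
];
const w = train(data);
console.log(data.map(({ x }) => mapInference(w, x).join(",")));
```

With lambda = 0 each step is the plain structured-hinge subgradient. A positive lambda adds the shrinkage from the L2 regularizer in Ratliff et al.'s objective.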
17 Recipe: From a Program to a CRF
20 Recipe: From a Program to a CRF
Step 1: Define the elements and their properties of interest. Elements become nodes in a network; a node's content ranges over properties. Example: elements are variables, properties are their types.
Step 2: Define feature functions between elements. Feature functions become undirected edges in the network. Example: aliasing between variables, a shared function caller, etc.
Step 3: Build the network via static program analysis. Automatically extract nodes and feature functions from the program. Example: alias analysis, call graph. Key point: the general problem is undecidable, so good approximations are needed! More on program analysis: Program Analysis, ETH graduate course, M. Vechev, Spring 2015.
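Step 3 relies on analyses such as alias analysis and call graphs. As a much cruder stand-in, the hypothetical `buildNetwork` function below turns identifiers into nodes. It links identifiers that co-occur in one statement and keeps only edges that touch an unknown node. It is a regex-based toy for illustrating Steps 1 to 3, not a sound program analysis.

```javascript
// Identifiers that are JavaScript keywords rather than program elements.
const KEYWORDS = new Set(["var", "for", "if", "else", "return", "function"]);

// Toy network builder: every identifier in a statement becomes a node;
// identifiers co-occurring in the same statement get an undirected edge,
// provided at least one endpoint is an unknown (to-be-predicted) node.
function buildNetwork(statements, unknownNames) {
  const nodes = new Map(); // identifier -> "unknown" | "known"
  const edges = new Set(); // "a--b" with endpoints sorted, so edges are undirected
  for (const stmt of statements) {
    const ids = [...new Set(stmt.match(/[A-Za-z_$][\w$]*/g) || [])].filter(
      (id) => !KEYWORDS.has(id)
    );
    for (const id of ids) nodes.set(id, unknownNames.has(id) ? "unknown" : "known");
    for (let p = 0; p < ids.length; p++) {
      for (let q = p + 1; q < ids.length; q++) {
        if (!unknownNames.has(ids[p]) && !unknownNames.has(ids[q])) continue;
        const [a, b] = [ids[p], ids[q]].sort();
        edges.add(`${a}--${b}`);
      }
    }
  }
  return { nodes, edges };
}

// Two statements from the minified chunkdata function; the local names are unknown.
const { nodes, edges } = buildNetwork(
  ["var r = e.length;", "for (; i < r; i += t) {"],
  new Set(["e", "r", "i", "t"])
);
console.log([...nodes], [...edges]);
```

The property name `length` ends up as a known node attached to `e` and `r`. This mirrors the known/unknown split used in the MAP inference slides.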
27 MAP inference (on the minified chunkdata function from slide 10)
Unknown properties: the names of i, t and r. Known property: length.
Factor weights w for pairs of candidate names: between i and t, (i, step) 0.5 and (j, step) 0.4; between i and r, (i, len) 0.6 and (j, length) 0.3; between r and length, (length, length) 0.5 and (len, length) ….
MAP result: i → i, t → step, r → len, giving the renamed function chunkdata(str, step) from slide 10.
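The slide's MAP result can be checked by exhaustive search over the three unknown nodes. The weight for the (len, length) row is missing from the transcription. This sketch assumes 0.4 purely as a placeholder; the same result holds for any value of that weight above 0.1.

```javascript
// Factor tables from the slide: each maps "label1|label2" to a weight.
// ASSUMPTION: the (len, length) weight is missing in the transcription;
// 0.4 is a placeholder for illustration only.
const factors = [
  { a: "i", b: "t", table: { "i|step": 0.5, "j|step": 0.4 } },
  { a: "i", b: "r", table: { "i|len": 0.6, "j|length": 0.3 } },
  { a: "r", b: "length", table: { "length|length": 0.5, "len|length": 0.4 } },
];

// Candidate names for each unknown node, taken from the table rows.
const candidates = { i: ["i", "j"], t: ["step"], r: ["len", "length"] };
const known = { length: "length" };

// Score of a full assignment: sum of the weights of all factors
// (pairs missing from a table contribute 0).
function score(assign) {
  return factors.reduce((s, { a, b, table }) => s + (table[`${assign[a]}|${assign[b]}`] || 0), 0);
}

// Exhaustive MAP: enumerate every combination of candidate names.
function bruteForceMap() {
  const vars = Object.keys(candidates);
  let best = null;
  let bestScore = -Infinity;
  const search = (k, assign) => {
    if (k === vars.length) {
      const s = score(assign);
      if (s > bestScore) {
        bestScore = s;
        best = { ...assign };
      }
      return;
    }
    for (const label of candidates[vars[k]]) search(k + 1, { ...assign, [vars[k]]: label });
  };
  search(0, { ...known });
  return { best, bestScore };
}

const { best, bestScore } = bruteForceMap();
console.log(best, bestScore);
```

The winning assignment (i, step, len) scores 1.5. Its closest rival (j, step, length) scores 1.2. The joint choice matters here: picking r = length locally (weight 0.5) would lose the stronger i-r factor.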
35 Structured Prediction for Programs
Learning phase: from existing programs, max-margin training learns weights and feature functions built from alias and call relations: ~7M feature functions for names and ~70K for types. Learned weights and feature functions: ~150MB.
Prediction phase: a new (minified) program is transformed into a network of ~30 nodes and ~400 edges; inference takes milliseconds and outputs the renamed program.
Accuracy: names 63%, types 81% (helps typechecking).
36 Structured Prediction for Programs (summary): bridges program analysis and CRFs; first application of CRFs to programs; CRFs learned from data; fast and precise.
