Automatically Tracking Events in the News. Kathleen McKeown, Department of Computer Science, Columbia University


1 Automatically Tracking Events in the News. Kathleen McKeown, Department of Computer Science, Columbia University

2 Vision: generating presentations that connect events, opinions, personal accounts, and their impact on the world.

3 Two Tasks: (1) monitoring events over time; (2) predicting their impact on financial markets. Joint work with Tony Jebara and David Yao.

4 Machine learning framework: data (often labeled); extraction of features from text data; prediction of output.

5 Machine learning framework: data (often labeled); extraction of features from text data; prediction of output. What data is available for learning?

6 Machine learning framework: data (often labeled); extraction of features from text data; prediction of output. What features yield good predictions?

7 Two Tasks: (1) monitoring events over time; (2) predicting their impact on financial markets. Joint work with Tony Jebara and David Yao.

8 Monitor events over time. Input: streaming data (news, social media, web pages). At every hour: what's new?


11 Text Compression

12 Data: NIST evaluation on temporal summarization; hourly web crawl, October ... February, ... TB. Different categories of disaster: climate, man-made, social unrest.


26 Temporal Summarization Approach. At time t:
1. Predict salience for input sentences (disaster-specific features for predicting salience)
2. Remove redundant sentences
3. Cluster and select exemplar sentences for t (incorporate salience prediction as a prior)
Kedzie et al., Bloomberg Social Good Workshop, KDD 2014; Kedzie et al., ACL


28 Predicting Salience: Model Features. Basic sentence-level features: sentence length; punctuation count; number of capitalized words; number of event-type synonyms, hypernyms, and hyponyms.

29 Predicting Salience: Model Features. Basic sentence-level features: sentence length; punctuation count; number of capitalized words; number of event-type synonyms, hypernyms, and hyponyms. Why is the number of capitalized words important?

30 Predicting Salience: Model Features. Basic sentence-level features: sentence length; punctuation count; number of capitalized words; number of event-type synonyms, hypernyms, and hyponyms. Example sentences:
High salience: "Nicaragua's disaster management said it had issued a local tsunami alert."
Medium salience: "People streamed out of homes, schools and office buildings as far north as Mexico City."
Low salience: "Add to Digg Add to del.icio.us Add to Facebook Add to"
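The basic sentence-level features can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: whitespace tokenization is an assumption, and `EVENT_TERMS` is a hypothetical hand-written stand-in for the synonyms, hypernyms, and hyponyms that would normally come from a lexical resource.

```python
import string

# Hypothetical stand-in for the event type's synonyms, hypernyms, and hyponyms.
EVENT_TERMS = {"earthquake", "quake", "tremor", "tsunami", "disaster", "aftershock"}

def basic_features(sentence, event_terms=EVENT_TERMS):
    """Compute the four basic sentence-level features named on the slide."""
    tokens = sentence.split()  # assumption: simple whitespace tokenization
    words = [t.strip(string.punctuation).lower() for t in tokens]
    return {
        "length": len(tokens),
        "punctuation": sum(ch in string.punctuation for ch in sentence),
        "capitalized": sum(t[:1].isupper() for t in tokens),
        "event_terms": sum(w in event_terms for w in words),
    }

HIGH = "Nicaragua's disaster management said it had issued a local tsunami alert."
LOW = "Add to Digg Add to del.icio.us Add to Facebook Add to"

high_features = basic_features(HIGH)
low_features = basic_features(LOW)
```

On the slide's examples, the low-salience boilerplate line has many capitalized tokens and no event terms, while the high-salience sentence has the opposite profile, which is one plausible reason these counts are informative.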


32 Predicting Salience: Model Features. Basic sentence-level features: sentence length; punctuation count; number of capitalized words; number of event-type synonyms, hypernyms, and hyponyms. Why are synonyms, hypernyms, and hyponyms important?


34 Predicting Salience: Model Features. Basic sentence-level features. Language models (5-gram Kneser-Ney model): generic news corpus (10 years of AP and NY Times articles); domain-specific corpus (disaster-related Wikipedia articles). What does a generic language model capture? What does a domain-specific language model capture?
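To make the two language-model features concrete, the sketch below scores a sentence under a "generic" and a "domain" model and uses the average per-word log-probability as the feature. It deliberately swaps in an add-one smoothed unigram model for the 5-gram Kneser-Ney model named on the slide, and the two tiny corpora are invented for illustration only.

```python
import math
from collections import Counter

class UnigramLM:
    """Add-one smoothed unigram model; a simple stand-in for 5-gram Kneser-Ney."""

    def __init__(self, corpus_sentences):
        self.counts = Counter(w for s in corpus_sentences for w in s.lower().split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot reserves mass for unseen words

    def avg_log_prob(self, sentence):
        """Average per-word log-probability of the sentence under this model."""
        words = sentence.lower().split()
        if not words:
            return float("-inf")
        log_prob = sum(
            math.log((self.counts[w] + 1) / (self.total + self.vocab)) for w in words
        )
        return log_prob / len(words)

# Toy corpora (assumptions), standing in for the news and Wikipedia corpora.
generic_lm = UnigramLM(["the president said the economy grew",
                        "officials said the talks would continue"])
domain_lm = UnigramLM(["the earthquake triggered a tsunami alert",
                       "the hurricane caused flooding and evacuation"])

sentence = "tsunami alert issued"
generic_score = generic_lm.avg_log_prob(sentence)
domain_score = domain_lm.avg_log_prob(sentence)
```

A sentence that scores well under the domain model uses disaster vocabulary; a sentence that scores poorly under the generic model is unlikely to be well-formed news prose.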

35 Predicting Salience: Model Features. Basic sentence-level features. Language models (5-gram Kneser-Ney model): generic news corpus (10 years of AP and NY Times articles); domain-specific corpus (disaster-related Wikipedia articles). Example sentences:
High salience: "Nicaragua's disaster management said it had issued a local tsunami alert."
Medium salience: "People streamed out of homes, schools and office buildings as far north as Mexico City."
Low salience: "Add to Digg Add to del.icio.us Add to Facebook Add to"


38 Predicting Salience: Model Features. Basic sentence-level features. Language models (5-gram Kneser-Ney model). Geographic features: tag input with a named-entity tagger; get coordinates for locations and mean distance to event. Example sentences:
High salience: "Nicaragua's disaster management said it had issued a local tsunami alert."
Medium salience: "People streamed out of homes, schools and office buildings as far north as Mexico City."
Low salience: "Add to Digg Add to del.icio.us Add to Facebook Add to"
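The geographic feature reduces to a distance computation once locations have coordinates. The sketch below assumes the named-entity tagging and geocoding steps have already produced (latitude, longitude) pairs, and uses the haversine great-circle distance; the slide does not say which distance measure was used, so that choice is an assumption.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def mean_distance_to_event(locations, event):
    """Mean distance from the locations mentioned in a sentence to the event location."""
    if not locations:
        return None  # sentence mentions no geocodable location
    return sum(haversine_km(loc, event) for loc in locations) / len(locations)

# Points on the equator, chosen so the expected distances are easy to verify.
quarter_turn = mean_distance_to_event([(0.0, 90.0)], (0.0, 0.0))
same_place = mean_distance_to_event([(12.0, -86.0)], (12.0, -86.0))
no_location = mean_distance_to_event([], (0.0, 0.0))
```

Sentences with no location return `None` here; a real feature extractor would need some convention for that case, such as a separate indicator feature.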


40 Predicting Salience: Model Features. Basic sentence-level features. Language models (5-gram Kneser-Ney model). Geographic features. Temporal features: measuring burstiness of words. How might we measure the burstiness of words?
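One possible answer to the question on this slide, offered as a sketch rather than as the measure the authors used: compare a word's smoothed relative frequency in the current hour with its smoothed relative frequency over earlier hours. The function name, the smoothing constant, and the toy token lists are all assumptions.

```python
from collections import Counter

def burstiness(word, current_tokens, past_hours, smoothing=1.0):
    """Ratio of the word's smoothed rate this hour to its smoothed rate in past hours.

    Values well above 1 suggest the word is bursting; values below 1 suggest it is fading.
    """
    current_count = Counter(current_tokens)[word]
    current_rate = (current_count + smoothing) / (len(current_tokens) + smoothing)
    past_count = sum(Counter(hour)[word] for hour in past_hours)
    past_total = sum(len(hour) for hour in past_hours)
    past_rate = (past_count + smoothing) / (past_total + smoothing)
    return current_rate / past_rate

past = [["rain", "in", "city"], ["city", "council", "meets"]]
now = ["tsunami", "alert", "tsunami", "warning"]

tsunami_burst = burstiness("tsunami", now, past)
city_burst = burstiness("city", now, past)
```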

41 Determining Redundancy. Use a semantic similarity metric. Discard sentences with high similarity to previous sentences.
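The redundancy filter can be sketched as a threshold test against everything already emitted. Bag-of-words cosine similarity is used here only as a simple stand-in for the semantic similarity metric the slide refers to, and the threshold of 0.7 is a hypothetical value.

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity between two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def remove_redundant(candidates, previous, threshold=0.7):
    """Keep candidates that are not too similar to earlier output or to each other."""
    kept = []
    for sentence in candidates:
        if all(cosine(sentence, seen) < threshold for seen in list(previous) + kept):
            kept.append(sentence)
    return kept

previous_updates = ["a tsunami alert was issued"]
new_sentences = ["a tsunami alert was issued today", "people streamed out of homes"]
novel = remove_redundant(new_sentences, previous_updates)
```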


47 Sub-Event Identification. Decompose articles on a main event into related sub-events. Example: Hurricane Sandy: Manhattan blackout, Breezy Point fire, public transit.


49 Two Tasks: (1) monitoring events over time; (2) predicting their impact on financial markets. Joint work with Tony Jebara and David Yao.

50 COLUMBIA DATA SCIENCE INSTITUTE. Can we predict the effect that a particular event, such as extreme weather or political activity, would have on financial markets? Take market financial data and traditional news feeds. Use Natural Language Processing to transform raw data into structured event streams. Apply Machine Learning tools to uncover previously hidden relationships between events and market behavior. Diagram labels: Machine Learning, Data Science, Natural Language Processing, Financial and Economic Indices.

51 Predicting Market Impact. Diagram labels: Financial Data, News Feeds, Event Streams, Bayesian Network Structure Discovery, Inference.

52 NLP Event Features. Binary features calculated from news. Event type: 11 possible categories. Event location: US or World. Sampled daily and hourly.

53 Financials, Macroeconomics: index/indicator selection
Financial market indices (65): stock market (15 broad + 27 sector), bond market (8), volatility index (6), commodities (9)
Macroeconomic indicators (8): real GDP (2), CPI (1), PPI (1), income (1), consumption (1), employment/unemployment situation (2)

54 Transform and Binarize. Compute relative change in indices.
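A minimal sketch of this step, assuming the relative change is the usual period-over-period ratio and that binarization marks whether the change exceeds a threshold. The slide's own formula did not survive transcription, so both the formula and the zero threshold are assumptions.

```python
def relative_changes(values):
    """Period-over-period relative change: (current - previous) / previous."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]

def binarize(changes, threshold=0.0):
    """Map each change to 1 if it exceeds the threshold, else 0."""
    return [1 if change > threshold else 0 for change in changes]

index_levels = [100.0, 110.0, 99.0]  # made-up index values
changes = relative_changes(index_levels)
bits = binarize(changes)
```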

55 Equal-frequency discretization (S&P 500)
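Equal-frequency discretization assigns values to bins so that each bin holds roughly the same number of observations; with two bins the cut point is the median. The sketch below is a rank-based version with made-up input values; the function name and tie handling (ties broken by position) are assumptions.

```python
def equal_frequency_bins(values, n_bins=2):
    """Assign each value a bin label in 0..n_bins-1 so bins have near-equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * n_bins // len(values)
    return labels

daily_changes = [0.3, -0.1, 0.05, -0.4]  # made-up relative changes
binary_labels = equal_frequency_bins(daily_changes, n_bins=2)
```

Unlike a fixed threshold at zero, this keeps the two classes balanced even when an index drifts upward over the sample period.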

56 ML Structure Learning. To learn structure, we assume samples are drawn i.i.d. from an unknown pairwise binary graphical model (no hidden variables). Training data: Newsblaster data and financial indicators. Use the method of Ravikumar, Wainwright, Lafferty [2010]. Asymptotically optimal given mild assumptions and a regularizer.
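For reference, the method of Ravikumar, Wainwright, and Lafferty estimates the graph one node at a time by $\ell_1$-regularized logistic regression. The formulation below follows that paper's Ising-model setting with variables coded as $x_s \in \{-1, +1\}$; how the talk's 0/1 features map onto this coding, and the symbols $n$, $p$, and $\lambda_n$, are assumptions of this sketch.

```latex
% Pairwise binary (Ising) model over p variables, x in {-1,+1}^p
P_\theta(x) \propto \exp\Big( \sum_{(s,t) \in E} \theta_{st}\, x_s x_t \Big)

% Conditional distribution of node r given all other nodes
P_\theta\big(x_r \mid x_{\setminus r}\big)
  = \frac{\exp\big( 2 x_r \sum_{t \neq r} \theta_{rt}\, x_t \big)}
         {\exp\big( 2 x_r \sum_{t \neq r} \theta_{rt}\, x_t \big) + 1}

% Per-node l1-regularized logistic regression over n i.i.d. samples
\hat{\theta}_{\setminus r}
  = \arg\min_{\theta_{\setminus r}}
    \Big\{ -\frac{1}{n} \sum_{i=1}^{n}
           \log P_\theta\big(x_r^{(i)} \mid x_{\setminus r}^{(i)}\big)
           + \lambda_n \lVert \theta_{\setminus r} \rVert_1 \Big\}

% Estimated neighborhood of node r: the nonzero coefficients
\hat{N}(r) = \{\, t \neq r : \hat{\theta}_{rt} \neq 0 \,\}
```

The $\ell_1$ penalty drives most coefficients to exactly zero, so the surviving nonzero entries give each node's estimated neighbors, and combining the per-node neighborhoods yields the graph structure.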

57 Figure labels: SCI TECH, ECONOMIC, DISASTER, RVX, VXD, VXO. Legend: RVX = Russell 2000 Volatility; VXD = Dow Volatility; VXO = S&P 100 Volatility.

58 Results. We can predict the relation between event and market impact with significantly higher average test likelihood for held-out test days. Models compared: Naïve, Tree, D-Tree, Ours. Daily observations are orders of magnitude more likely under our model.

59 Impact. Learn from past events to predict the impact of new events on financial markets. Identify new events as they occur in news feeds. Graph modeling enables identification of structural relationships between detected events and financial events.

60 Thank You! The research presented here has been supported in part by DARPA GRAPH, and NSF.


More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

More information

Web Content Mining and NLP. Bing Liu Department of Computer Science University of Illinois at Chicago liub@cs.uic.edu http://www.cs.uic.

Web Content Mining and NLP. Bing Liu Department of Computer Science University of Illinois at Chicago liub@cs.uic.edu http://www.cs.uic. Web Content Mining and NLP Bing Liu Department of Computer Science University of Illinois at Chicago liub@cs.uic.edu http://www.cs.uic.edu/~liub Introduction The Web is perhaps the single largest and distributed

More information

Concept Term Expansion Approach for Monitoring Reputation of Companies on Twitter

Concept Term Expansion Approach for Monitoring Reputation of Companies on Twitter Concept Term Expansion Approach for Monitoring Reputation of Companies on Twitter M. Atif Qureshi 1,2, Colm O Riordan 1, and Gabriella Pasi 2 1 Computational Intelligence Research Group, National University

More information

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

More information

Learning from Data: Naive Bayes

Learning from Data: Naive Bayes Semester 1 http://www.anc.ed.ac.uk/ amos/lfd/ Naive Bayes Typical example: Bayesian Spam Filter. Naive means naive. Bayesian methods can be much more sophisticated. Basic assumption: conditional independence.

More information

Software Architecture Document

Software Architecture Document Software Architecture Document Natural Language Processing Cell Version 1.0 Natural Language Processing Cell Software Architecture Document Version 1.0 1 1. Table of Contents 1. Table of Contents... 2

More information

Sentiment analysis on tweets in a financial domain

Sentiment analysis on tweets in a financial domain Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International

More information

Data Mining System, Functionalities and Applications: A Radical Review

Data Mining System, Functionalities and Applications: A Radical Review Data Mining System, Functionalities and Applications: A Radical Review Dr. Poonam Chaudhary System Programmer, Kurukshetra University, Kurukshetra Abstract: Data Mining is the process of locating potentially

More information

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing Introduction to Data Mining and Machine Learning Techniques Iza Moise, Evangelos Pournaras, Dirk Helbing Iza Moise, Evangelos Pournaras, Dirk Helbing 1 Overview Main principles of data mining Definition

More information

Word Completion and Prediction in Hebrew

Word Completion and Prediction in Hebrew Experiments with Language Models for בס"ד Word Completion and Prediction in Hebrew 1 Yaakov HaCohen-Kerner, Asaf Applebaum, Jacob Bitterman Department of Computer Science Jerusalem College of Technology

More information

Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University

Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University Collective Behavior Prediction in Social Media Lei Tang Data Mining & Machine Learning Group Arizona State University Social Media Landscape Social Network Content Sharing Social Media Blogs Wiki Forum

More information

DATA PREPARATION FOR DATA MINING

DATA PREPARATION FOR DATA MINING Applied Artificial Intelligence, 17:375 381, 2003 Copyright # 2003 Taylor & Francis 0883-9514/03 $12.00 +.00 DOI: 10.1080/08839510390219264 u DATA PREPARATION FOR DATA MINING SHICHAO ZHANG and CHENGQI

More information

Tech Presentation 2016

Tech Presentation 2016 Tech Presentation 2016 Our Management Team Marvin Igelman CEO Alex Zivkovic CTO David Berman CFO Matt Burns PM and Growth BreakingSports is the world s first fully automated real-time alerts platform for

More information

ISSUES IN RULE BASED KNOWLEDGE DISCOVERING PROCESS

ISSUES IN RULE BASED KNOWLEDGE DISCOVERING PROCESS Advances and Applications in Statistical Sciences Proceedings of The IV Meeting on Dynamics of Social and Economic Systems Volume 2, Issue 2, 2010, Pages 303-314 2010 Mili Publications ISSUES IN RULE BASED

More information

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin Network Big Data: Facing and Tackling the Complexities Xiaolong Jin CAS Key Laboratory of Network Data Science & Technology Institute of Computing Technology Chinese Academy of Sciences (CAS) 2015-08-10

More information

Information Visualization WS 2013/14 11 Visual Analytics

Information Visualization WS 2013/14 11 Visual Analytics 1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,

More information

Sentiment analysis: towards a tool for analysing real-time students feedback

Sentiment analysis: towards a tool for analysing real-time students feedback Sentiment analysis: towards a tool for analysing real-time students feedback Nabeela Altrabsheh Email: nabeela.altrabsheh@port.ac.uk Mihaela Cocea Email: mihaela.cocea@port.ac.uk Sanaz Fallahkhair Email:

More information

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

More information

Unsupervised Data Mining (Clustering)

Unsupervised Data Mining (Clustering) Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in

More information

Applying Data Analysis to Big Data Benchmarks. Jazmine Olinger

Applying Data Analysis to Big Data Benchmarks. Jazmine Olinger Applying Data Analysis to Big Data Benchmarks Jazmine Olinger Abstract This paper describes finding accurate and fast ways to simulate Big Data benchmarks. Specifically, using the currently existing simulation

More information

Exploration and Visualization of Post-Market Data

Exploration and Visualization of Post-Market Data Exploration and Visualization of Post-Market Data Jianying Hu, PhD Joint work with David Gotz, Shahram Ebadollahi, Jimeng Sun, Fei Wang, Marianthi Markatou Healthcare Analytics Research IBM T.J. Watson

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Turker-Assisted Paraphrasing for English-Arabic Machine Translation

Turker-Assisted Paraphrasing for English-Arabic Machine Translation Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University

More information

Data Mining Techniques

Data Mining Techniques 15.564 Information Technology I Business Intelligence Outline Operational vs. Decision Support Systems What is Data Mining? Overview of Data Mining Techniques Overview of Data Mining Process Data Warehouses

More information

SEMANTICS ENABLED PROACTIVE AND TARGETED DISSEMINATION OF NEW MEDICAL KNOWLEDGE

SEMANTICS ENABLED PROACTIVE AND TARGETED DISSEMINATION OF NEW MEDICAL KNOWLEDGE SEMANTICS ENABLED PROACTIVE AND TARGETED DISSEMINATION OF NEW MEDICAL KNOWLEDGE Lakshmish Ramaswamy & I. Budak Arpinar Dept. of Computer Science, University of Georgia laks@cs.uga.edu, budak@cs.uga.edu

More information

Florida International University - University of Miami TRECVID 2014

Florida International University - University of Miami TRECVID 2014 Florida International University - University of Miami TRECVID 2014 Miguel Gavidia 3, Tarek Sayed 1, Yilin Yan 1, Quisha Zhu 1, Mei-Ling Shyu 1, Shu-Ching Chen 2, Hsin-Yu Ha 2, Ming Ma 1, Winnie Chen 4,

More information

Big Data from a Database Theory Perspective

Big Data from a Database Theory Perspective Big Data from a Database Theory Perspective Martin Grohe Lehrstuhl Informatik 7 - Logic and the Theory of Discrete Systems A CS View on Data Science Applications Data System Users 2 Us Data HUGE heterogeneous

More information