Automatically Tracking Events in the News. Kathleen McKeown Department of Computer Science Columbia University
2 Vision
Generating presentations that connect:
- Events
- Opinions
- Personal accounts
- Their impact on the world
3 Two Tasks
- Monitoring events over time
- Predicting their impact on financial markets
Joint work with Tony Jebara and David Yao
4-6 Machine learning framework
- Data (often labeled)
- Extraction of features from text data
- Prediction of output
Questions: What data is available for learning? What features yield good predictions?
8 Monitor events over time
- Input: streaming data (news, social media, web pages)
- At every hour: what's new?
11 Text Compression
12 Data
- NIST evaluation on temporal summarization
- Hourly web crawl, October to February, TB scale
- Different categories of disaster: climate, man-made, social unrest
26 Temporal Summarization Approach
At time t:
1. Predict salience for input sentences (disaster-specific features for predicting salience)
2. Remove redundant sentences
3. Cluster and select exemplar sentences for t (incorporate salience prediction as a prior)
References: Kedzie et al., Bloomberg Social Good Workshop, KDD 2014; Kedzie et al., ACL
28-33 Predicting Salience: Model Features
Basic sentence-level features:
- Sentence length
- Punctuation count
- Number of capitalized words
- Number of event type synonyms, hypernyms, and hyponyms
Questions: Why is the number of capitalized words important? Why are synonyms, hypernyms and hyponyms important?
Example sentences:
- High salience: Nicaragua's disaster management said it had issued a local tsunami alert.
- Medium salience: People streamed out of homes, schools and office buildings as far north as Mexico City.
- Low salience: Add to Digg Add to del.icio.us Add to Facebook Add to
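The basic sentence-level features listed on these slides can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name, the whitespace tokenization, and the small hand-written term set (standing in for a real list of event type synonyms, hypernyms, and hyponyms) are all assumptions.

```python
import string

def basic_sentence_features(sentence, event_terms):
    """Compute simple surface features for one sentence.

    event_terms is a hypothetical set of lowercase words related to the
    event type (synonyms, hypernyms, hyponyms); the slides do not say how
    this list is built.
    """
    tokens = sentence.split()  # assumption: plain whitespace tokenization
    stripped = [t.strip(string.punctuation) for t in tokens]
    return {
        "length": len(tokens),
        "punctuation_count": sum(ch in string.punctuation for ch in sentence),
        "capitalized_words": sum(1 for t in stripped if t[:1].isupper()),
        "event_term_count": sum(1 for t in stripped if t.lower() in event_terms),
    }

# Hypothetical term list for an earthquake/tsunami event.
EVENT_TERMS = {"earthquake", "quake", "tremor", "tsunami", "disaster"}

high = basic_sentence_features(
    "Nicaragua's disaster management said it had issued a local tsunami alert.",
    EVENT_TERMS,
)
low = basic_sentence_features(
    "Add to Digg Add to del.icio.us Add to Facebook Add to",
    EVENT_TERMS,
)
```

On the two example sentences, the low-salience page residue has many capitalized words but no event terms, while the high-salience sentence has two event terms, which hints at why these features are used together rather than one at a time.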
34-37 Predicting Salience: Model Features
Language models (5-gram Kneser-Ney model):
- Generic news corpus (10 years of AP and NY Times articles)
- Domain-specific corpus (disaster-related Wikipedia articles)
Questions: What does a generic language model capture? What does a domain-specific language model capture?
38-39 Predicting Salience: Model Features
Geographic features:
- Tag input with a named-entity tagger
- Get coordinates for locations and the mean distance to the event
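One way to compute the "mean distance to event" feature is with great-circle (haversine) distances. The sketch below assumes the named-entity tagging and coordinate lookup have already been done and that each location is a (latitude, longitude) pair in degrees; the function names and the choice of haversine distance are assumptions, since the slides do not specify the distance measure.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def mean_distance_to_event(location_coords, event_coord):
    """Mean distance from the locations mentioned in a sentence to the event.

    Returns None when the sentence mentions no geocoded locations.
    """
    if not location_coords:
        return None
    total = sum(haversine_km(c, event_coord) for c in location_coords)
    return total / len(location_coords)
```

A sentence that mentions only places far from the event location would get a large value, which a salience model could learn to treat as a sign of lower relevance.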
40 Predicting Salience: Model Features
- Basic sentence-level features
- Language models (5-gram Kneser-Ney model)
- Geographic features
- Temporal features: measuring burstiness of words
Question: How might we measure the burstiness of words?
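One possible answer to the burstiness question, offered only as a sketch: compare a word's smoothed relative frequency in the current hour with its average smoothed relative frequency over earlier hours. The function name, the smoothing scheme, and the ratio itself are assumptions for illustration; the slides leave the measure open.

```python
from collections import Counter

def burstiness(word, current_tokens, past_hours_tokens, smoothing=1.0):
    """Ratio of a word's current-hour rate to its mean rate in past hours.

    current_tokens: list of tokens seen in the current hour.
    past_hours_tokens: list of token lists, one per earlier hour.
    Values well above 1 suggest the word is bursting right now.
    """
    if not past_hours_tokens:
        raise ValueError("need at least one past hour to compare against")

    def rate(tokens):
        # Smoothed relative frequency, so unseen words do not give zero.
        return (Counter(tokens)[word] + smoothing) / (len(tokens) + smoothing)

    past_rate = sum(rate(t) for t in past_hours_tokens) / len(past_hours_tokens)
    return rate(current_tokens) / past_rate

# Toy hourly token streams (made up for illustration).
current = ["tsunami", "alert", "tsunami", "warning"]
past = [["weather", "report", "alert", "today"],
        ["alert", "news", "today", "sports"]]
tsunami_burst = burstiness("tsunami", current, past)
alert_burst = burstiness("alert", current, past)
```

In this toy data "tsunami" is new in the current hour and scores about 3, while "alert" appears at its usual rate and scores about 1.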
41 Determining Redundancy
- Use a semantic similarity metric
- Discard sentences that are too similar to previous sentences
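The redundancy filter can be sketched as a greedy pass over incoming sentences. The slide calls for a semantic similarity metric without naming one, so the bag-of-words cosine similarity below is a simple stand-in, and the 0.7 threshold is an arbitrary illustrative value. The sketch also compares only against sentences that were kept, which is one reading of "previous sentences".

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def remove_redundant(sentences, threshold=0.7):
    """Keep a sentence only if it is not too similar to any kept sentence."""
    kept = []
    for s in sentences:
        if all(cosine_similarity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept

updates = remove_redundant([
    "a tsunami alert was issued",
    "a tsunami alert was issued today",
    "people streamed out of homes",
])
```

Here the second sentence is dropped as a near-duplicate of the first, and the third is kept because it shares no words with it.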
47 Sub-Event Identification
Decompose articles on a main event into related sub-events. Example: Hurricane Sandy
- Manhattan blackout
- Breezy Point fire
- Public transit
50 Columbia Data Science Institute
Can we predict the effect that a particular event, such as extreme weather or political activity, would have on financial markets?
- Take market financial data and traditional news feeds.
- Use Natural Language Processing to transform raw data into structured event streams.
- Apply Machine Learning tools to uncover previously hidden relationships between events and market behavior.
Diagram labels: Machine Learning, Data Science, Natural Language Processing, Financial and Economic Indices
51 Predicting Market Impact
Diagram components: Financial Data, News Feeds, Event Streams, Bayesian Network Structure Discovery, Inference
52 NLP Event Features
- Binary features calculated from news
- Event type: 11 possible categories
- Event location: US or World
- Sampled daily and hourly
53 Financials, Macroeconomics: index/indicator selection
Financial market indices (65):
- Stock market (15 broad + 27 sector)
- Bond market (8)
- Volatility index (6)
- Commodities (9)
Macroeconomic indicators (8):
- Real GDP (2)
- CPI (1)
- PPI (1)
- Income (1)
- Consumption (1)
- Employment/unemployment situation (2)
54 Transform and Binarize
Compute relative change in indices
55 Equal frequency discretization (S&P 500)
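The two preprocessing steps named on these slides, relative change and equal-frequency discretization, can be sketched as follows. The function names are hypothetical, and using two bins is an assumption suggested by the word "Binarize"; the slides do not give the exact procedure.

```python
def relative_changes(values):
    """Relative change between consecutive observations of an index."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]

def equal_frequency_bins(values, n_bins=2):
    """Assign each value a bin so that bins hold (nearly) equal counts.

    With n_bins=2 this binarizes around the median: the lower half of the
    values gets 0 and the upper half gets 1.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

# Made-up closing levels for illustration only.
levels = [100.0, 102.0, 101.0, 105.0, 104.0]
changes = relative_changes(levels)
binary = equal_frequency_bins(changes)
```

Unlike a fixed threshold at zero, equal-frequency binning guarantees that each class has roughly the same number of observations, even when an index mostly moves in one direction.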
56 ML Structure Learning
- To learn structure, we assume samples are drawn i.i.d. from an unknown pairwise binary graphical model (no hidden variables)
- Training data: Newsblaster data and financial indicators
- Use the method of Ravikumar, Wainwright, Lafferty [2010]
- Asymptotically optimal given mild assumptions and a suitably chosen regularizer
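For orientation, the general shape of neighborhood-based structure learning for a pairwise binary model can be written out as below. This is a sketch of the standard l1-regularized logistic regression formulation, with variables coded as -1/+1; it is not taken from the slides, and the symbols (theta, lambda, n, p) are notation assumed here.

```latex
% Pairwise binary (Ising-type) model over x \in \{-1,+1\}^p
p_\theta(x) \;\propto\; \exp\Big( \sum_{s<t} \theta_{st}\, x_s x_t \Big)

% For each node r, regress x_r on all other nodes with an l1 penalty
\hat{\theta}_r \;=\; \arg\min_{\theta_r}\;
  \frac{1}{n} \sum_{i=1}^{n}
  \log\Big( 1 + \exp\big( -2\, x_r^{(i)} \sum_{t \neq r} \theta_{rt}\, x_t^{(i)} \big) \Big)
  \;+\; \lambda\, \lVert \theta_r \rVert_1

% Estimated neighborhood of r: the nodes with nonzero coefficients
\hat{N}(r) \;=\; \{\, t \neq r : \hat{\theta}_{rt} \neq 0 \,\}
```

Each node's neighborhood is estimated separately, and the graph is assembled from the nonzero coefficients, so binary event features and binarized index movements can sit in the same model as nodes.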
57 Figure nodes: SCI TECH, ECONOMIC, DISASTER, RVX, VXD, VXO
Legend: RVX = Russell 2000 Volatility; VXD = Dow Volatility; VXO = S&P 100 Volatility
58 Results
- We can predict the relation between event and market impact with significantly higher average test likelihood for held-out test days
- Models compared: Naïve, Tree, D-Tree, Ours
- Daily observations are orders of magnitude more likely under our model
59 Impact
- Learn from past events to predict the impact of new events on financial markets
- Identify new events as they occur in news feeds
- Graph modeling enables identification of structural relationships between detected events and financial events
60 Thank You!
The research presented here has been supported in part by DARPA GRAPH, and NSF.
1 11.1 Definitions and Motivation Lot of research and papers in this emerging field: Visual Analytics: Scope and Challenges of Keim et al. Illuminating the path of Thomas and Cook 2 11.1 Definitions and
More informationSPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
More informationVCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
More informationSentiment analysis: towards a tool for analysing real-time students feedback
Sentiment analysis: towards a tool for analysing real-time students feedback Nabeela Altrabsheh Email: nabeela.altrabsheh@port.ac.uk Mihaela Cocea Email: mihaela.cocea@port.ac.uk Sanaz Fallahkhair Email:
More informationINDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its
More informationUnsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
More informationApplying Data Analysis to Big Data Benchmarks. Jazmine Olinger
Applying Data Analysis to Big Data Benchmarks Jazmine Olinger Abstract This paper describes finding accurate and fast ways to simulate Big Data benchmarks. Specifically, using the currently existing simulation
More informationExploration and Visualization of Post-Market Data
Exploration and Visualization of Post-Market Data Jianying Hu, PhD Joint work with David Gotz, Shahram Ebadollahi, Jimeng Sun, Fei Wang, Marianthi Markatou Healthcare Analytics Research IBM T.J. Watson
More informationPredicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
More informationTurker-Assisted Paraphrasing for English-Arabic Machine Translation
Turker-Assisted Paraphrasing for English-Arabic Machine Translation Michael Denkowski and Hassan Al-Haj and Alon Lavie Language Technologies Institute School of Computer Science Carnegie Mellon University
More informationData Mining Techniques
15.564 Information Technology I Business Intelligence Outline Operational vs. Decision Support Systems What is Data Mining? Overview of Data Mining Techniques Overview of Data Mining Process Data Warehouses
More informationSEMANTICS ENABLED PROACTIVE AND TARGETED DISSEMINATION OF NEW MEDICAL KNOWLEDGE
SEMANTICS ENABLED PROACTIVE AND TARGETED DISSEMINATION OF NEW MEDICAL KNOWLEDGE Lakshmish Ramaswamy & I. Budak Arpinar Dept. of Computer Science, University of Georgia laks@cs.uga.edu, budak@cs.uga.edu
More informationFlorida International University - University of Miami TRECVID 2014
Florida International University - University of Miami TRECVID 2014 Miguel Gavidia 3, Tarek Sayed 1, Yilin Yan 1, Quisha Zhu 1, Mei-Ling Shyu 1, Shu-Ching Chen 2, Hsin-Yu Ha 2, Ming Ma 1, Winnie Chen 4,
More informationBig Data from a Database Theory Perspective
Big Data from a Database Theory Perspective Martin Grohe Lehrstuhl Informatik 7 - Logic and the Theory of Discrete Systems A CS View on Data Science Applications Data System Users 2 Us Data HUGE heterogeneous
More information