An Introduction to Machine Learning and Natural Language Processing Tools

An Introduction to Machine Learning and Natural Language Processing Tools Presented by: Mark Sammons, Vivek Srikumar (Many slides courtesy of Nick Rizzolo) 8/24/2010-8/26/2010

Some reasonably reliable facts*
- 1.7 ZB (1 ZB = 10^21 bytes) of new digital information added worldwide in 2010
- 95% of this is unstructured (i.e. not database entries)
- 25% is images
- 6 EB is email (1,000 EB = 1 ZB)
How to manage/access the tiny relevant fraction of this data?
*Source: Dr. Joseph Kielman, DHS; projected figures taken from a presentation in Summer 2010

Keyword search is NOT the answer to every problem
- Abstract/aggregative queries:
  - E.g. find news reports about visits by heads of state to other countries
  - E.g. find reviews for movies that some person might like, given a couple of examples
- Enterprise search:
  - Private collections of documents, e.g. all the documents published by a corporation
  - Much lower redundancy than web documents
- Need to search for concepts, not words
  - E.g. looking for proposals with similar research goals: the wording may be very different (different scientific disciplines, different emphasis, different methodology)

Even when keyword search is a good start
- Way too many documents match the keywords
- Solution: filter out data that is irrelevant (for the task in hand)
- Some examples:
  - Different languages
  - Spam vs. non-spam
  - Forum/blog post topic
- How to solve the spam/non-spam problem? Suggestions?

Machine Learning could help
- Instead of writing/coding rules ("expert systems"), use statistical methods to learn rules that perform a classification task, e.g.
  - Given a blog post, which of N different topics is it most relevant to?
  - Given an email, is it spam or not?
  - Given a document, which of K different languages is it in?
- ...and now, a demonstration
- The demonstration shows what a well-designed classifier can achieve. Here's a very high-level view of how classifiers work in the context of NLP.

Motivating example: blog topics
[Figure: blog crawl]

What we need: a function f that maps a blog post to its topic
- f(<blog post>) = politics
- f(<blog post>) = sports
- f(<blog post>) = business

Where to get it: Machine Learning
[Diagram: Data, Feature Functions and a Learning Algorithm together produce f, which assigns the labels politics, sports, business]

So, what are feature functions?
- Take the same input as f
- Indicate some property of the input, a.k.a. a feature
Typical NLP feature functions
- Binary:
  - Appearance of a given word
  - Appearance of two words consecutively, a.k.a. a bigram
  - Appearance of a word with a given part of speech
  - Appearance of a named entity (e.g. "Barack Obama")
- Real:
  - Counts of binary features
  - TF-IDF (a statistical measure of how important a word is to a document within a collection)
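
The feature functions listed above can be sketched in a few lines of Python. This is only an illustration, not the tutorial's own toolkit: the function names, the whitespace tokenizer and the feature-name format are assumptions, and the TF-IDF formula is one common variant (term frequency times log inverse document frequency) among several.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split is enough for this sketch.
    return text.lower().split()


def word_features(text):
    """Binary features: one feature per word that appears."""
    return {f"word-{w}-is-present" for w in tokenize(text)}


def bigram_features(text):
    """Binary features: one feature per pair of consecutive words."""
    tokens = tokenize(text)
    return {f"bigram-{a}_{b}-is-present" for a, b in zip(tokens, tokens[1:])}


def count_features(text):
    """Real-valued features: how often each word appears."""
    return Counter(tokenize(text))


def tfidf(term, document, corpus):
    """Real-valued feature: tf * log(N / df), one common TF-IDF variant."""
    tokens = tokenize(document)
    docs_with_term = sum(1 for d in corpus if term in tokenize(d))
    if not tokens or docs_with_term == 0:
        return 0.0
    tf = tokens.count(term) / len(tokens)
    idf = math.log(len(corpus) / docs_with_term)
    return tf * idf
```

Part-of-speech and named-entity features follow the same pattern, but need a tagger, so they are left out of this sketch.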

What does the Learning Algorithm do?
- Training input: a feature-based representation of examples, together with the labels we want to be able to predict
  - E.g. 1,000 email feature sets labeled either spam or non-spam
  - Each feature set is extracted from an email using the feature functions
  - Labels are typically assigned by a human annotator
- Computes statistics over features, relating features to labels
- Generates a classifier (statistical model) that predicts a label based on the features that are active
  - E.g. if the feature word-"VIAGRA"-is-present is active, predict SPAM
- Training output: the classifier. It will now take feature representations of new emails (no label!) and predict a label. Sometimes, it will be wrong!
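
To make the train-then-predict flow concrete, here is a deliberately naive sketch. It is not the learning algorithm used in the tutorial: it simply counts how often each feature co-occurs with each label and lets the active features vote. The function names and the four training emails are made up for illustration.

```python
from collections import Counter, defaultdict


def extract_features(email):
    # Bag-of-words feature function (assumed tokenizer: lowercase + split).
    return {f"word-{w}-is-present" for w in email.lower().split()}


def train(labeled_emails):
    """Compute statistics relating features to labels (co-occurrence counts)."""
    stats = defaultdict(Counter)
    for text, label in labeled_emails:
        for feature in extract_features(text):
            stats[feature][label] += 1
    return stats


def predict(stats, email, default="non-spam"):
    """Let every active feature vote with its label counts."""
    votes = Counter()
    for feature in extract_features(email):
        votes.update(stats.get(feature, {}))
    if not votes:
        return default  # no known feature is active
    return votes.most_common(1)[0][0]


# Hypothetical training data; labels would normally come from annotators.
TRAINING = [
    ("buy VIAGRA now", "spam"),
    ("cheap VIAGRA pills", "spam"),
    ("meeting notes attached", "non-spam"),
    ("lunch meeting at noon", "non-spam"),
]
```

With this toy data, `predict(train(TRAINING), "buy lunch now")` comes out as spam, because two of its three words were only seen in spam: a small reminder that the classifier will sometimes be wrong.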

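Update Rules
Decision rule: Linear Threshold Function. Target t is active when its activation exceeds its threshold:
\[ \sum_{i \in A_t} w_{t,i} \, s_i > \theta_t \]
Here A_t is the set of features that are active in the example and linked to target t, w_{t,i} is the weight on the edge from feature i to target t, s_i is the strength of feature i (1 for a binary feature), and \theta_t is the threshold.
Winnow mistake-driven update rules, with promotion factor \alpha > 1 and demotion factor 0 < \beta < 1:
- Promotion: if the example is positive for t but \sum_{i \in A_t} w_{t,i} s_i \le \theta_t, then w_{t,i} \leftarrow \alpha \, w_{t,i} for all i \in A_t
- Demotion: if the example is negative for t but \sum_{i \in A_t} w_{t,i} s_i > \theta_t, then w_{t,i} \leftarrow \beta \, w_{t,i} for all i \in A_t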
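
A minimal sketch of a single Winnow target with binary features may help here. The class name and the choice to start every weight at 1.0 are assumptions made for illustration; the defaults α = 2, β = ½, θ = 3.5 are the values used in the training examples later in the slides.

```python
class WinnowTarget:
    """One linear threshold unit trained with mistake-driven Winnow updates."""

    def __init__(self, alpha=2.0, beta=0.5, theta=3.5, initial_weight=1.0):
        self.alpha = alpha                    # promotion factor (> 1)
        self.beta = beta                      # demotion factor (between 0 and 1)
        self.theta = theta                    # threshold
        self.initial_weight = initial_weight  # assumption: unseen weights are 1.0
        self.weights = {}

    def weight(self, feature):
        return self.weights.get(feature, self.initial_weight)

    def activation(self, active_features):
        # Binary features, so each active feature contributes its weight once.
        return sum(self.weight(f) for f in active_features)

    def predict(self, active_features):
        return self.activation(active_features) > self.theta

    def update(self, active_features, is_positive):
        """Apply promotion or demotion on a mistake; return True if updated."""
        predicted = self.predict(active_features)
        if is_positive and not predicted:
            factor = self.alpha   # promotion
        elif predicted and not is_positive:
            factor = self.beta    # demotion
        else:
            return False          # correct prediction: weights are untouched
        for f in active_features:
            self.weights[f] = self.weight(f) * factor
        return True
```

Only the weights of features active in the misclassified example change, which is what makes the update cheap when the feature space is large and sparse.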

One Basic System: One vs. All
- Linear Threshold Function
- Targets (concepts)
- Weighted edges, instead of weight vectors
- Features
- Prediction is one vs. all: every target scores the example, and the target with the highest activation wins
(Cognitive Computations Software Tutorial, 8/25/05)

A Training Example: 3 Newsgroups
Targets: Health, Computers, Movies
Features: disease, process, crime, blood, exceed, review, terminal, linux, gun
Training examples (label, then active features):
- Health, disease, review
- Computers, process, terminal, linux
- Health, blood, terminal
- Movies, blood, review
- Movies, blood, exceed, gun
Example to classify:
- ?, disease, exceed, terminal
Update rule: Winnow, α = 2, β = ½, θ = 3.5
[Figure: network of targets and features whose edge weights are updated after each training example]
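
The newsgroup example can be replayed with a short one-vs-all Winnow sketch. Two things here are assumptions rather than facts from the slides: every weight starts at 1.0, and the examples are seen once, in the order listed. The function names are illustrative.

```python
from collections import defaultdict

ALPHA, BETA, THETA = 2.0, 0.5, 3.5  # values given on the slide

EXAMPLES = [
    ("Health", ["disease", "review"]),
    ("Computers", ["process", "terminal", "linux"]),
    ("Health", ["blood", "terminal"]),
    ("Movies", ["blood", "review"]),
    ("Movies", ["blood", "exceed", "gun"]),
]


def activation(weights, features):
    return sum(weights[f] for f in features)


def train(examples, targets):
    # One weight table per target; assumption: weights default to 1.0.
    net = {t: defaultdict(lambda: 1.0) for t in targets}
    for label, features in examples:
        for target, weights in net.items():
            predicted = activation(weights, features) > THETA
            positive = target == label
            if positive and not predicted:      # promotion
                for f in features:
                    weights[f] *= ALPHA
            elif predicted and not positive:    # demotion
                for f in features:
                    weights[f] *= BETA
    return net


def predict(net, features):
    # One vs. all: the target with the highest activation wins.
    scores = {t: activation(w, features) for t, w in net.items()}
    return max(scores, key=scores.get), scores


net = train(EXAMPLES, ["Health", "Computers", "Movies"])
label, scores = predict(net, ["disease", "exceed", "terminal"])
```

Under these assumptions, Health ends up with the highest activation for the unlabeled example (5.0, against 4.0 for Computers and 3.0 for Movies). A different initial weight or example order could change the numbers.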

A Training Example, abstracted
Targets: 1, 2, 3
Features: 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009
Training examples (label, then active features):
- 1, 1001, 1006
- 2, 1002, 1007, 1008
- 1, 1004, 1007
- 3, 1006, 1004
- 3, 1004, 1005, 1009
Example to classify:
- 1001, 1005, 1007
Update rule: Winnow, α = 2, β = ½, θ = 3.5
[Figure: the same network with targets and features replaced by integer IDs]
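
The abstraction step is just a lexicon that maps target and feature names to integer IDs. The sketch below reproduces the IDs used in the abstracted example; the names `TARGET_IDS`, `FEATURE_IDS` and `encode` are illustrative, and sorting the feature IDs is a choice made here (the slide lists one example unsorted).

```python
TARGET_IDS = {"Health": 1, "Computers": 2, "Movies": 3}

FEATURE_NAMES = ["disease", "process", "crime", "blood", "exceed",
                 "review", "terminal", "linux", "gun"]
# disease -> 1001, process -> 1002, ..., gun -> 1009
FEATURE_IDS = {name: 1001 + i for i, name in enumerate(FEATURE_NAMES)}


def encode(label, features):
    """Turn a named example into a list of integer IDs (label first, if any)."""
    ids = sorted(FEATURE_IDS[f] for f in features)
    if label is None:  # unlabeled example to classify
        return ids
    return [TARGET_IDS[label]] + ids
```

For instance, `encode("Health", ["disease", "review"])` gives `[1, 1001, 1006]`, the first abstracted training example.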

Adapting to New Task: Spam Filtering
- What if we want to learn Spam vs. Non-Spam?
- The demonstration showed that the same black box can be adapted to multiple problems, so what happens internally?
- Feature functions are generic pattern extractors: for a new set of documents, new features will be extracted
  - E.g. for Spam, we'd expect to see some features like word-"VIAGRA"-is-present
- New documents come with their own set of labels (again, assigned by human annotators)
- So we reuse the same code, but generate a new classifier

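A Training Example: Spam Filtering
Targets: Spam, Non-Spam
Features: V1AGRA, balding, benefit, buy, time, medicine, huge, panic, avoid
Training examples (label, then active features):
- Spam, V1AGRA, medicine
- Spam, buy, huge
- Non-Spam, buy, medicine
- Non-Spam, buy, time, avoid
Example to classify:
- ?, V1AGRA, time, huge
Update rule: Winnow, α = 2, β = ½, θ = 3.5
[Figure: network of targets and features whose edge weights are updated after each training example]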
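
The point that the same code yields a new classifier when given new labeled data can be sketched as a generic trainer that infers its targets from the labels it sees. As before, starting all weights at 1.0 and making a single pass in the listed order are assumptions, and the function names are illustrative.

```python
from collections import defaultdict


def train_one_vs_all(examples, alpha=2.0, beta=0.5, theta=3.5):
    """Generic Winnow trainer: targets are whatever labels appear in the data."""
    targets = sorted({label for label, _ in examples})
    net = {t: defaultdict(lambda: 1.0) for t in targets}  # assumed initial weight
    for label, features in examples:
        for target, weights in net.items():
            fired = sum(weights[f] for f in features) > theta
            if target == label and not fired:
                factor = alpha   # promotion
            elif target != label and fired:
                factor = beta    # demotion
            else:
                continue         # no mistake, no update
            for f in features:
                weights[f] *= factor
    return net


def classify(net, features):
    # One vs. all: pick the target with the highest activation.
    return max(net, key=lambda t: sum(net[t][f] for f in features))


SPAM_EXAMPLES = [
    ("Spam", ["V1AGRA", "medicine"]),
    ("Spam", ["buy", "huge"]),
    ("Non-Spam", ["buy", "medicine"]),
    ("Non-Spam", ["buy", "time", "avoid"]),
]

spam_net = train_one_vs_all(SPAM_EXAMPLES)
guess = classify(spam_net, ["V1AGRA", "time", "huge"])
```

Nothing in `train_one_vs_all` mentions spam; passing it the newsgroup examples instead would produce a newsgroup classifier from the same code.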

Some Analysis
- We defined a very generic feature set: bag-of-words
- We did reasonably well on three different tasks
- Can we do better on each task? Of course. If we add good feature functions, the learning algorithm will find more useful patterns.
  - Suggestions for patterns for Spam filtering? Newsgroup classification?
  - Are the features we add for Spam filtering good for Newsgroups?
- When we add specialized features, are we cheating?
  - In fact, a lot of time is usually spent engineering good features for individual tasks. It's one way to add domain knowledge.
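
As one possible answer to the question about spam-specific patterns, the sketch below adds a specialized feature on top of bag-of-words: it flags tokens that mix letters and digits (such as V1AGRA) and emits a normalized form. The substitution table and feature names are assumptions for illustration, not features from the tutorial.

```python
# Assumed digit-for-letter substitutions: 0->o, 1->i, 3->e, 4->a, 5->s, 7->t
DIGIT_TO_LETTER = str.maketrans("013457", "oieast")


def normalize(token):
    return token.lower().translate(DIGIT_TO_LETTER)


def spam_features(text):
    """Bag-of-words plus a domain-specific obfuscation feature."""
    features = set()
    for token in text.split():
        features.add(f"word-{token.lower()}-is-present")
        has_digit = any(c.isdigit() for c in token)
        has_letter = any(c.isalpha() for c in token)
        if has_digit and has_letter:
            features.add("has-mixed-letters-and-digits")
            features.add(f"normalized-word-{normalize(token)}-is-present")
    return features
```

A feature like this encodes knowledge about how spammers disguise words. It would probably add little for newsgroup classification, which is the trade-off the questions above are pointing at.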

A Caveat
- It's often a lot of work to learn to use a new tool set
- It can be tempting to think it would be easier to just implement what you need yourself
  - Sometimes, you'll be right
  - But probably not this time
- Learning a tool set is an investment: payoff comes later
  - It's easy to add new functionality: it may already be a method in some class in a library; if not, there's infrastructure to support it
  - You will avoid certain errors: someone already made them and coded against them
  - Probably, it's a lot more work than you think to DIY

Homework (!)
To prepare for tomorrow's tutorial, you should:
- Log in to the DSSI server via SSH
- Check that you can transfer a file to your home directory from your laptop
If you have any questions, ask Tim or Yuancheng:
- Tim: weninge1@illinois.edu
- Yuancheng: ytu@illinois.edu
Bring your laptop to the tutorial!!!