Information Retrieval (INFO 4300 / CS 4300): Search Taxonomy, Web Search, and Search Engine Optimization




Information Retrieval INFO 4300 / CS 4300

- Retrieval models
  - Older models
    - Boolean retrieval
    - Vector Space model
  - Probabilistic models
    - BM25
    - Language models
  - Web search
    - Learning to Rank

Search Taxonomy

- Informational: finding information about some topic, which may be on one or more web pages (topical search)
- Navigational: finding a particular web page that the user has either seen before or is assumed to exist
- Transactional: finding a site where a task such as shopping or downloading music can be performed [Broder, 2002]

Web Search

- For effective navigational and transactional search, need to combine features that reflect user relevance
- Commercial web search engines combine evidence from hundreds of features to generate a ranking score for a web page (see the sketch below)
  - page content, page metadata, anchor text, links (e.g., PageRank), and user behavior (click logs)
  - page metadata: e.g., age, how often it is updated, the URL of the page, the domain name of its site, and the amount of text content

Search Engine Optimization

- SEO: understanding the relative importance of features used in search and how they can be manipulated to obtain better search rankings for a web page
  - e.g., improve the text used in the title tag, improve the text in heading tags, make sure that the domain name and URL contain important keywords, and try to improve the anchor text and link structure
- Some of these techniques are regarded as not appropriate by search engine companies
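To make the "hundreds of features" idea from the Web Search slide concrete, here is a minimal sketch of a linear scoring function over a handful of page features. The feature names and weights are invented for illustration; they are not the features or weights of any real engine, and real rankers learn the weights from data (see Learning to Rank below).

# Hedged sketch: combine per-page evidence into a single ranking score.
# Feature names and weights are illustrative assumptions only.
def ranking_score(page_features, weights):
    """Weighted sum of the feature values for one page."""
    return sum(weights[f] * value for f, value in page_features.items())

weights = {
    "bm25_body": 1.0,          # query match against body text
    "bm25_title": 2.5,         # title matches weighted more heavily
    "anchor_text_match": 1.8,  # query terms in incoming anchor text
    "pagerank": 0.7,           # link-based evidence
    "url_keyword_match": 0.4,  # query terms in the URL / domain name
    "click_rate": 1.2,         # user-behavior evidence from query logs
}

page = {"bm25_body": 3.1, "bm25_title": 1.0, "anchor_text_match": 0.5,
        "pagerank": 0.2, "url_keyword_match": 1.0, "click_rate": 0.05}

print(ranking_score(page, weights))  # higher score => ranked higher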

Web Search

- In TREC evaluations, the most effective features for navigational search are:
  - text in the title, body, and heading (h1, h2, h3, and h4) parts of the document, the anchor text of all links pointing to the document, the PageRank number, and the inlink count
- Given the size of the Web, many pages will contain all query terms
  - the ranking algorithm focuses on discriminating between these pages
  - word proximity is important, e.g., n-gram models

Machine Learning and IR

- Considerable interaction between these fields
  - the Rocchio algorithm (1960s) is a simple learning approach
  - 1980s-90s: learning ranking algorithms based on user feedback
  - 2000s: text categorization
- Limited by the amount of training data
- Web query logs have generated a new wave of research, e.g., Learning to Rank

Information Retrieval INFO 4300 / CS 4300

- Retrieval models
  - Older models
    - Boolean retrieval
    - Vector Space model
  - Probabilistic models
    - BM25
    - Language models
  - Web search
    - Learning to Rank

Generative vs. Discriminative

- All of the probabilistic retrieval models presented so far fall into the category of generative models
  - a generative model assumes that documents were generated from some underlying model (in this case, usually a multinomial distribution) and uses training data to estimate the parameters of the model
  - the probability of belonging to a class (i.e., the relevant documents for a query) is then estimated using Bayes' Rule and the document model
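Spelled out, the generative recipe on this slide is the following standard formulation (notation chosen here for illustration): with a multinomial document model, the class-conditional likelihood factors over terms, and Bayes' Rule turns it into a class posterior.

$$ P(c \mid d) \;=\; \frac{P(d \mid c)\,P(c)}{P(d)} \;\propto\; P(c) \prod_{t \in d} P(t \mid c)^{f_{t,d}} $$

Here $c$ is a class (e.g., the relevant documents for a query), $t$ ranges over the terms of document $d$, $f_{t,d}$ is the term frequency, and the multinomial parameters $P(t \mid c)$ are estimated from the training data.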

Generative vs. Discriminative

- A discriminative model estimates the probability of belonging to a class directly from the observed features of the document, based on the training data
- Generative models perform well with low numbers of training examples
- Discriminative models usually have the advantage given enough training data
  - can also easily incorporate many features

Discriminative Models for IR

- Discriminative models can be trained using explicit relevance judgments or click data in query logs
  - click data is much cheaper, more noisy
- e.g., the Ranking Support Vector Machine (SVM) takes as input partial rank information for queries
  - partial information about which documents should be ranked higher than others (a sketch of extracting such pairs from clicks follows)
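As a minimal sketch of turning click data into partial rank information, the snippet below implements the heuristic used in the example that follows: a clicked document is taken to be preferred to every document ranked above it that was skipped. The function name and data layout are assumptions for illustration.

# Hedged sketch: derive preference pairs from a click log.
# Assumption: a clicked document is preferred to every *skipped*
# (unclicked) document that was ranked above it.
def preference_pairs(ranked_docs, clicked):
    """ranked_docs: doc ids in rank order; clicked: set of clicked doc ids."""
    pairs = []
    for i, doc in enumerate(ranked_docs):
        if doc in clicked:
            # doc should outrank every skipped document above it
            pairs.extend((doc, above) for above in ranked_docs[:i]
                         if above not in clicked)
    return pairs

# d1, d2, d3 are ranked first, second, third; only d3 is clicked.
print(preference_pairs(["d1", "d2", "d3"], {"d3"}))
# -> [('d3', 'd1'), ('d3', 'd2')]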

Example

- Training data is $(q_1, r_1), (q_2, r_2), \ldots, (q_n, r_n)$, where $r_i$ is the partial rank information for query $q_i$
  - if document $d_a$ should be ranked higher than $d_b$, then $(d_a, d_b) \in r_i$
  - partial rank information comes from relevance judgments (allows multiple levels of relevance) or click data
  - e.g., if $d_1$, $d_2$, and $d_3$ are the documents in the first, second, and third rank of the search output and only $d_3$ is clicked on, then $(d_3, d_1)$ and $(d_3, d_2)$ will be in the desired ranking for this query
- Learning a linear ranking function $w \cdot d_a$
  - $w$ is a weight vector that is adjusted by learning
  - $d_a$ is the vector representation of the features of document $d_a$
  - non-linear functions are also possible
- Weights represent the importance of features, learned using the training data
- Learn a $w$ that satisfies as many of the following conditions as possible:
  - $w \cdot d_a > w \cdot d_b$ for all $(d_a, d_b) \in r_i$ and all queries $q_i$
  - for the click example above, this means $w \cdot d_3 > w \cdot d_1$ and $w \cdot d_3 > w \cdot d_2$
- Can be formulated as an optimization problem:
  - minimize $\frac{1}{2}\, w \cdot w + C \sum_{a,b} \xi_{a,b}$
  - subject to $w \cdot d_a \ge w \cdot d_b + 1 - \xi_{a,b}$ and $\xi_{a,b} \ge 0$ for all $(d_a, d_b) \in r_i$
  - $\xi$, known as a slack variable, allows for misclassification of difficult or noisy training examples, and $C$ is a parameter that is used to prevent overfitting

- Software is available to do the optimization (see SVM Tools at the end)
- Each pair of documents in our training data can be represented by the difference vector $(d_a - d_b)$
- The score for this pair is $w \cdot (d_a - d_b)$
- The SVM classifier will find a $w$ that makes the smallest such score as large as possible
  - i.e., it makes the differences in scores as large as possible for the pairs of documents that are hardest to rank
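Here is a minimal sketch of this reduction on the click example above (toy feature vectors invented for illustration; scikit-learn's LinearSVC stands in for the dedicated ranking packages): each preference pair $(d_a, d_b)$ becomes a training example $d_a - d_b$ with label +1, plus its mirror with label -1, and a linear classifier with no intercept learns $w$.

# Hedged sketch: Ranking SVM as classification over difference vectors.
import numpy as np
from sklearn.svm import LinearSVC

# Toy feature vectors (invented) for the documents returned for one query.
docs = {"d1": np.array([0.2, 1.0]),
        "d2": np.array([0.5, 0.4]),
        "d3": np.array([0.9, 0.8])}
pairs = [("d3", "d1"), ("d3", "d2")]   # d3 was clicked, so it is preferred

# Pair (da, db) with da preferred yields (da - db, +1); the mirrored
# pair (db - da, -1) keeps the two classes balanced.
X = [docs[a] - docs[b] for a, b in pairs] + [docs[b] - docs[a] for a, b in pairs]
y = [1] * len(pairs) + [-1] * len(pairs)

svm = LinearSVC(C=1.0, fit_intercept=False)   # hyperplane through the origin
svm.fit(X, y)
w = svm.coef_[0]

# Rank documents by the learned linear scoring function w . d
ranking = sorted(docs, key=lambda d: w @ docs[d], reverse=True)
print("w =", w, "ranking:", ranking)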

Support Vector Machines

- Based on geometric principles
- Given a set of inputs labeled + and -, find the best hyperplane that separates the +'s and -'s
- Questions:
  - How is "best" defined?
  - What if no hyperplane exists such that the +'s and -'s can be perfectly separated?

Separable vs. Non-Separable Data

[Figure: two panels, "Separable" and "Non-Separable", showing candidate separating hyperplanes labeled $w \cdot x$]

Best Hyperplane?

- First, what is a hyperplane?
  - a generalization of a line to higher dimensions
  - defined by a vector $w$ (the hyperplane is the set of points with $w \cdot x = 0$)
- With SVMs, the best hyperplane is the one with the maximum margin
- If $x_+$ and $x_-$ are the closest + and - inputs to the hyperplane, then the margin is the distance from the hyperplane to each of them:
  - $\text{Margin}(w) = \frac{|w \cdot x_+|}{\|w\|} + \frac{|w \cdot x_-|}{\|w\|}$

Linearly Separable Case

- In math:
  - maximize $\text{Margin}(w) = \frac{2}{\|w\|}$ (equivalently, minimize $\frac{1}{2}\|w\|^2$)
  - subject to $w \cdot x_i \ge 1$ for all positive inputs $x_i$ and $w \cdot x_i \le -1$ for all negative inputs $x_i$
- In English: find the largest-margin hyperplane that separates the +'s and -'s

Linearly Non-Separable Case

- In math:
  - minimize $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
  - subject to $w \cdot x_i \ge 1 - \xi_i$ for positive inputs, $w \cdot x_i \le -1 + \xi_i$ for negative inputs, and $\xi_i \ge 0$
  - $\xi_i$ denotes how misclassified instance $i$ is
- In English: find a hyperplane that has a large margin and the lowest misclassification cost
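To ground the geometry, here is a minimal scikit-learn sketch on invented 2-D data. Note one small difference from the slides: scikit-learn's hyperplane includes a bias term $b$ (i.e., $w \cdot x + b = 0$) rather than passing through the origin; with a very large $C$ on separable data it approximates the hard-margin case, and the margin comes out as $2/\|w\|$.

# Hedged sketch: fit a linear SVM on toy separable data and recover
# the hyperplane and the geometric margin. Data is invented.
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],    # positive inputs
              [0.0, 0.0], [1.0, 0.5], [0.5, 1.5]])   # negative inputs
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # large C ~ hard margin on separable data
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)    # width between w.x + b = +1 and w.x + b = -1
print("w =", w, "b =", b, "margin =", margin)
print("support vectors:", clf.support_vectors_)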

Nearest Neighbor Classification

[Figure: nearest neighbor classification example]

The Kernel Trick

- Linearly non-separable data may become linearly separable if transformed, or mapped, to a higher-dimensional space
- Computing vector math (i.e., dot products) in a very high-dimensional space is costly
- The kernel trick allows very high-dimensional dot products to be computed efficiently
- It allows inputs to be implicitly mapped to a high- (possibly infinite-) dimensional space with little computational overhead

Non-Binary Classification with SVMs

- One versus all
  - train a "class c vs. not class c" SVM for every class
  - if there are K classes, must train K classifiers
  - classify items according to $\text{Class}(x) = \arg\max_c\, w_c \cdot x$
- One versus one
  - train a binary classifier for every pair of classes
  - must train K(K-1)/2 classifiers
  - computationally expensive for large values of K

SVM Tools

- Solving the SVM optimization problem is not straightforward
- Many good software packages exist:
  - SVM-Light
  - LIBSVM
  - R library
  - Matlab SVM Toolbox
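Tying the last two slides together: scikit-learn's SVC is built on LIBSVM, one of the packages above, and makes the kernel trick easy to see. In this minimal sketch (dataset and parameters chosen for illustration), concentric circles are not linearly separable in the plane, but an RBF kernel separates them without ever materializing the high-dimensional mapping.

# Hedged sketch: the kernel trick on data that is not linearly separable.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points: no line in the plane separates them.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)   # implicit high-dimensional mapping

# Training accuracy is enough to show the contrast here.
print("linear kernel accuracy:", linear.score(X, y))  # near chance (~0.5)
print("RBF kernel accuracy:", rbf.score(X, y))        # near 1.0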