This lecture. Introduction to information retrieval. Making money with information retrieval. Some technical basics. Link analysis.


2 This lecture
Introduction to information retrieval. Making money with information retrieval. Some technical basics. Link analysis.
CSC401/2511 Spring

3 Information retrieval systems
Information retrieval (IR), n.: searching for documents, or for information within documents.
Question answering: respond with a specific answer to a question (e.g., Wolfram Alpha).
Document retrieval: find documents relevant to a query, ranked by relevance (e.g., Bing or Google).
Text analytics/data mining: general organization of large textual databases (e.g., Lexis-Nexis, OpenText, MedSearch, ...).

4 Terminology
Information retrieval uses slightly different terminology from the tasks we've seen previously:
Document: a book, article, web page, or paragraph (depending on the task and data).
Collection: a corpus of documents.
Term: a word type.
Stop word: a functional (non-content) word (e.g., "the").

5 Query types
Different kinds of questions can be asked.
Factoid questions, e.g., "How often were the peace talks in Ireland delayed or disrupted as a result of acts of violence?"
Narrative (open-ended) questions, e.g., "Can you tell me about contemporary interest in the Greek philosophy of stoicism?"
Complex/hybrid questions, e.g., "Who was involved in the Schengen agreement to eliminate border controls in Western Europe and what did they hope to accomplish?"

6 Question answering (QA)
"Which woman has won more than one Nobel prize?" (Marie Curie)
Question answering (QA) usually involves giving a specific answer to a question.

7 Document retrieval vs. IR
One strategy is to turn question answering into information retrieval (IR) and let the human complete the task.

8 Question answering (QA)

9 Knowledge-based QA
1. Build a structured semantic representation of the query: extract times, dates, locations, and entities using regular expressions, and fit them to well-known templates.
2. Query databases with these semantics, e.g., ontologies (Wikipedia infoboxes), restaurant review databases, calendars, movie schedules.
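Step 1 can be sketched with a single regular-expression template that turns a question into a structured query. The template, the list of fields, and the output keys (`relation`, `field`, `year`) are hypothetical choices made for illustration. A real system would keep many such templates and would pass the resulting structure to a database lookup, which corresponds to step 2 and is not shown.

```python
import re
from typing import Optional

# Hypothetical template: "Who won the Nobel prize in <field> in <year>?"
NOBEL_TEMPLATE = re.compile(
    r"who won the nobel prize in "
    r"(?P<field>physics|chemistry|medicine|literature|peace|economics)"
    r" in (?P<year>\d{4})\??$",
    re.IGNORECASE,
)

def parse_query(question: str) -> Optional[dict]:
    """Map a question onto a structured query, or None if no template fits."""
    match = NOBEL_TEMPLATE.search(question.strip())
    if match is None:
        return None
    return {
        "relation": "nobel_laureate",           # hypothetical relation name
        "field": match.group("field").lower(),  # normalized entity slot
        "year": int(match.group("year")),       # date slot extracted by regex
    }

print(parse_query("Who won the Nobel prize in Physics in 1903?"))
```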

10 IR-based QA

11 IR-based QA
Figure: pipeline from information retrieval to question answering.

12 IBM's Watson
Figure: Watson system overview. A game controller (clue grid, decisions to buzz and bet, strategy, text-to-speech) exchanges clues, scores, and other game data with two human players and sends clues and categories to Watson's QA engine, which returns answers and confidences. The QA engine runs on 2,880 IBM Power750 compute cores with 15 TB of memory and holds content equivalent to ~1,000,000 books. Source: A Brief Overview and Thoughts for Healthcare Education and Performance Improvement, by the IBM Watson team.

13 IBM's Watson: search
Example clue: "This man became the 44th President of the United States in 2008."

14 IBM's Watson: search
Title-oriented search: in some cases, the solution is in the title of highly ranked documents. E.g., "This pizza delivery boy celebrated New Year's at Applied Cryogenics."

15 IBM's Watson: selection
Once candidates have been gathered from various sources and methods, rank them according to various scores (IBM Watson uses more than 50 scoring metrics). E.g., "In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus."

16 IBM's Watson: selection
One aspect of Jeopardy! is that answers are often posed with puns that have to be disambiguated. E.g., "Bilbo shouldn't have played riddles in the dark with this shady character."
Figure: WordNet's synonym sets (synsets).

17 How to make money out of this?

18 Making money before search
Advertisers used to pay for banner ads that did not depend on user queries.
CPM (cost per mille): pay for each ad display (priced per thousand displays).
CPC (cost per click): pay when the user clicks an ad.
CTR (click-through rate): fraction of ad displays that result in click-throughs.
CPA (cost per action): pay only when the user makes an online purchase after a click-through.
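The sketch below works through the definitions above for a single campaign. All counts and prices are hypothetical and serve only to show how each pricing model scales: CPM with displays, CPC with clicks, and CPA with purchases.

```python
# Hypothetical campaign numbers, chosen only to illustrate the definitions.
impressions = 20_000  # ad displays
clicks = 300          # click-throughs
purchases = 12        # online purchases made after a click-through

ctr = clicks / impressions  # click-through rate

def cost_cpm(price_per_mille: float) -> float:
    """CPM: the price is quoted per thousand ad displays."""
    return impressions / 1000 * price_per_mille

def cost_cpc(price_per_click: float) -> float:
    """CPC: pay once per click."""
    return clicks * price_per_click

def cost_cpa(price_per_action: float) -> float:
    """CPA: pay only for purchases that follow a click-through."""
    return purchases * price_per_action

print(f"CTR = {ctr:.3f}")
print(cost_cpm(2.0), cost_cpc(0.5), cost_cpa(10.0))
```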

19 Making money with search
Advertisers now bid for keywords. Ads are displayed for the highest bidders when a query contains those keywords.
PPC (pay per click): CPC for ads served based on a ranking of bid keywords and user interest (e.g., Google AdWords). (It's a bit more complicated.)

20 How are ads ranked?
Today, a two-step process is typical. First, organizations bid on keywords; by itself, this can lead to abuse, monopolization, and irrelevant content. Second, ads are re-ranked by relevance, estimated from click-through.

21 How are ads ranked?
| Advertiser | Bid | CTR | Ad rank | Rank | Paid |
|---|---|---|---|---|---|
| A | $4.00 | 0.01 | 0.04 | 4 | (minimum) |
| B | $3.00 | 0.03 | 0.09 | 2 | $2.68 |
| C | $2.00 | 0.06 | 0.12 | 1 | $1.51 |
| D | $1.00 | 0.08 | 0.08 | 3 | $0.51 |
Bid: amount the advertiser offers for the keyword.
CTR: click-through rate, an approximation of relevance.
Ad rank: Bid × CTR, which trades off advertiser and user interests.
Rank: actual rank.
Paid: minimum amount necessary to keep the rank, plus $0.01.

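22 How are ads ranked?
Paid: minimum amount necessary to keep the rank, plus $0.01. At break-even, the ad at rank r has the same ad rank as the ad at rank r+1, so

$$\mathrm{Paid}_r \cdot \mathrm{CTR}_r = \mathrm{Bid}_{r+1} \cdot \mathrm{CTR}_{r+1} \quad\Rightarrow\quad \mathrm{Paid}_r = \frac{\mathrm{Bid}_{r+1} \cdot \mathrm{CTR}_{r+1}}{\mathrm{CTR}_r} + \$0.01$$

E.g., $\mathrm{Paid}_1 = \frac{\$3.00 \times 0.03}{0.06} + \$0.01 = \$1.51$.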
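A minimal sketch of the pricing rule above. It sorts ads by Bid × CTR and charges each ad just enough to keep its position, plus $0.01. The bids and CTRs are assumed to be those of the standard textbook version of this example, and they reproduce the Paid column ($1.51, $2.68, $0.51). Rounding to whole cents and the `None` returned for the last ad, which would pay a reserve price, are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

INCREMENT = 0.01  # pay $0.01 above the break-even price

@dataclass
class Ad:
    advertiser: str
    bid: float  # dollars offered for the keyword
    ctr: float  # estimated click-through rate

def rank_and_price(ads: List[Ad]) -> List[Tuple[str, int, Optional[float]]]:
    """Return (advertiser, rank, paid) sorted by ad rank = bid x CTR."""
    ranked = sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)
    results = []
    for position, ad in enumerate(ranked):
        if position + 1 < len(ranked):
            nxt = ranked[position + 1]
            # Paid_r = Bid_{r+1} * CTR_{r+1} / CTR_r + $0.01, rounded to cents
            paid: Optional[float] = round(nxt.bid * nxt.ctr / ad.ctr + INCREMENT, 2)
        else:
            paid = None  # last slot: nobody below, so a minimum (reserve) price applies
        results.append((ad.advertiser, position + 1, paid))
    return results

# Assumed values from the textbook version of the example.
ADS = [Ad("A", 4.00, 0.01), Ad("B", 3.00, 0.03), Ad("C", 2.00, 0.06), Ad("D", 1.00, 0.08)]
for row in rank_and_price(ADS):
    print(row)
```

The second-price design means an advertiser's own bid affects only its position, not the price it pays.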

23 Aside: highest-paying search terms (according to …)
$69.10 mesothelioma treatment options
$66.46 mesothelioma risk
$65.85 personal injury lawyer michigan
$65.74 michigan personal injury attorney
$62.59 student loans consolidation
$61.44 car accident attorney los angeles
$61.26 mesothelioma survival rate
$60.96 treatment of mesothelioma
$59.44 online car insurance quotes
$59.39 arizona dui lawyer

24 Back to basics.
How do we find the right documents for a query?

25 Queries
A query is a textual key that selects and orders a specific subset of documents (or answers) in a collection. Historically, queries were highly structured expressions in a logical language; in modern search engines they are more often streams of syntactically disconnected keywords.
A Boolean query is a logical combination of Boolean membership predicates, e.g., Brutus AND Caesar AND NOT Calpurnia.

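26 Term-document incidence
Figure: term-document incidence matrix with one row per term (ANTHONY, BRUTUS, CAESAR, CALPURNIA, CLEOPATRA, MERCY, WORSER) and one column per play (Anthony and Cleopatra, Julius Caesar, The Tempest, Hamlet, Othello, Macbeth). An entry is 1 if the play contains the term and 0 otherwise.
For the query Brutus AND Caesar AND NOT Calpurnia, take the row vector for Brutus, the row vector for Caesar, and the complement of the row vector for Calpurnia, then combine them with a bitwise AND.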
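The bitwise evaluation described above fits in a few lines if each row of the matrix is stored as a six-bit integer. The leftmost bit corresponds to the first play. The bit values for the three query terms are assumed from the standard textbook version of this example, because the transcript does not preserve the matrix entries.

```python
# Columns of the incidence matrix, leftmost bit first.
PLAYS = ["Anthony and Cleopatra", "Julius Caesar", "The Tempest",
         "Hamlet", "Othello", "Macbeth"]

# Assumed row vectors (textbook version of the example).
INCIDENCE = {
    "brutus":    0b110100,
    "caesar":    0b110111,
    "calpurnia": 0b010000,
}
MASK = (1 << len(PLAYS)) - 1  # keeps NOT within six bits

def matching_plays(bits: int) -> list[str]:
    """Decode a result vector back into play titles."""
    n = len(PLAYS)
    return [play for i, play in enumerate(PLAYS) if (bits >> (n - 1 - i)) & 1]

# Brutus AND Caesar AND NOT Calpurnia
result = INCIDENCE["brutus"] & INCIDENCE["caesar"] & (~INCIDENCE["calpurnia"] & MASK)
print(format(result, "06b"), matching_plays(result))
```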

27 Boolean queries and big collections
If we have 1 million documents, each with 1,000 tokens, that is 1 billion tokens, so there are at most 1 billion 1s in the matrix (in practice far fewer, since words repeat within a document).
If we have 500,000 distinct terms, the term-document incidence matrix will have 500,000 × 1,000,000 = 500,000,000,000 elements. The matrix is very sparse, and storing it wastes space.
Can there be a better way?
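The same arithmetic, expressed in code, gives an upper bound on how dense the matrix can be. The bound assumes each token sets at most one new 1, which is the slide's own reasoning.

```python
num_docs = 1_000_000
tokens_per_doc = 1_000
num_terms = 500_000

total_tokens = num_docs * tokens_per_doc  # 1 billion tokens
matrix_cells = num_terms * num_docs       # 500 billion cells
max_ones = total_tokens                   # each token sets at most one new 1
max_density = max_ones / matrix_cells     # upper bound on the fraction of 1s

print(f"{matrix_cells:,} cells, at most {max_ones:,} ones, density <= {max_density:.1%}")
```

At most 0.2% of the cells are nonzero, so storing only the nonzero entries is the natural alternative.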

28 Inverted index
Given a query word, the inverted index for that word gives all documents that contain the word in the title, the abstract (summary), some hidden metadata, or the entire text. More sophisticated versions also store the frequency and positions of the word in each document.
Figure: a query word (e.g., "Matlab") is looked up in the inverted index, which returns the list of matching documents (D1, ...).
How does one construct such indices?

29 Inverted index construction
1. Collect the documents to be indexed: "Friends, Romans, countrymen", "So let it be with Caesar", ...
2. Tokenize the text: Friends | Romans | countrymen | So | ...
3. Do preprocessing and normalization, resulting in the indexing terms: friend | roman | countryman | so | ...
4. Create a dictionary (hash) that maps each term to the documents containing it.
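The four steps above can be sketched as follows. The document IDs (1 and 2) are assumed. The small `LEMMAS` table is a hypothetical stand-in for a real stemmer or lemmatizer and only covers the words on the slide. Postings are sorted so that the linear-time intersection on the next slide applies.

```python
import re
from collections import defaultdict

# Step 1: the collection (document IDs are assumed).
DOCS = {
    1: "Friends, Romans, countrymen",
    2: "So let it be with Caesar",
}

# Hypothetical normalization table standing in for a real lemmatizer.
LEMMAS = {"friends": "friend", "romans": "roman", "countrymen": "countryman"}

def tokenize(text: str) -> list[str]:
    """Step 2: split on anything that is not a letter."""
    return re.findall(r"[A-Za-z]+", text)

def normalize(token: str) -> str:
    """Step 3: lowercase, then map to an indexing term."""
    lowered = token.lower()
    return LEMMAS.get(lowered, lowered)

def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Step 4: map each term to the sorted list of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[normalize(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

index = build_index(DOCS)
print(index["countryman"], index["caesar"])
```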

30 Simple conjunctive query
Given the query Brutus AND Calpurnia:
1. Locate Brutus in the dictionary and retrieve its document list.
2. Locate Calpurnia in the dictionary and retrieve its document list.
3. Intersect the two document lists and return the result to the user.
The intersection takes time linear in the lengths of the document lists (if the lists are sorted).
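Step 3 is the classic merge of two sorted postings lists. Two pointers advance through the lists, so each list is scanned once. The postings below are hypothetical document IDs.

```python
def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Intersect two sorted postings lists in O(len(p1) + len(p2)) time."""
    i = j = 0
    answer = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])  # document contains both terms
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer at the smaller document ID
        else:
            j += 1
    return answer

# Hypothetical postings lists for the two query terms.
BRUTUS = [1, 2, 4, 11, 31, 45, 173, 174]
CALPURNIA = [2, 31, 54, 101]
print(intersect(BRUTUS, CALPURNIA))
```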

31 Constructing indices
Spiders (a.k.a. robots, bots, crawlers) start with root (seed) URLs and follow all links on these pages recursively. Novel pages are processed and indexed.
Breadth-first search is quite popular, even though its memory use grows exponentially with depth. Depth-first search needs memory only linear in depth, but it can get lost.
Trivia: if you click on the first contentful link in almost any Wikipedia page, you will eventually be led to the Philosophy article.
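A breadth-first crawl can be sketched with a FIFO frontier and a set of already-discovered URLs. To keep the example self-contained, a hypothetical in-memory link graph (`WEB`) replaces real HTTP fetching and HTML parsing. The `max_pages` cap is also an assumption; it illustrates the fact that a real crawler must bound its work.

```python
from collections import deque

# Hypothetical link graph standing in for fetched pages: URL -> out-links.
WEB = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["d.com"],
    "c.com": ["d.com", "a.com"],
    "d.com": [],
}

def crawl(seeds: list[str], max_pages: int = 100) -> list[str]:
    """Return pages in the order a breadth-first crawler would index them."""
    frontier = deque(seeds)  # FIFO queue gives breadth-first order
    seen = set(seeds)        # avoid re-queueing discovered pages
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)  # a real crawler would fetch, parse, and index here
        for link in WEB.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

print(crawl(["a.com"]))
```

Replacing `popleft()` with `pop()` turns the frontier into a stack and gives a depth-first crawl.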

32 Increasing entropy?
Boolean retrieval is precise and was very popular for decades (it is still used for structured data, e.g., desktop file search). However, the amount and value of unstructured data (i.e., text) on the web has grown faster than those of structured data.
Figure: data volume and market cap of unstructured vs. structured data (data from Chris Manning).

33 Zipf's law on the web
These variables have Zipfian distributions: number of links to and from a page; length of web pages; number of web page hits.
Figure: (graph from Ray Mooney).

34 New challenges for IR on the web
Distributed data: documents spread over millions of web servers.
Volatile data: documents change or disappear frequently and rapidly.
Large volume: petabytes of data.
Poor quality: no editorial control, false information, poor writing, typographic errours.
Heterogeneity: various media, languages, encodings.
Unstructured: no uniform structure, HTML errors, duplicate documents.

35 Detecting duplicates
The user will become annoyed when many top-ranking hits are identical or similar. Nearly identical pages can be detected by hashing, e.g., don't index en.m.wikipedia.org/wiki/ if you've already indexed en.wikipedia.org/wiki/.
Zero marginal relevance occurs when a highly relevant document becomes irrelevant because it is ranked below a (near-)duplicate.
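A minimal sketch of hash-based duplicate detection. Each page is fingerprinted after normalizing case and whitespace, and any page whose fingerprint has already been seen is skipped. The normalization rule and the example URLs are assumptions. A plain hash catches only pages that are identical after normalization; near-duplicates need a similarity measure such as the one on the next slide.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Hash a canonical form so trivial formatting changes hash identically."""
    canonical = " ".join(text.lower().split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def dedupe(pages: dict[str, str]) -> list[str]:
    """Return URLs to index, skipping pages whose fingerprint was already seen."""
    seen: set[str] = set()
    keep = []
    for url, text in pages.items():
        fp = fingerprint(text)
        if fp not in seen:
            seen.add(fp)
            keep.append(url)
    return keep

# Hypothetical desktop and mobile copies of the same page.
PAGES = {
    "https://example.org/toronto": "Toronto is a city.",
    "https://m.example.org/toronto": "toronto  is a   city.",
    "https://example.org/ottawa": "Ottawa is a city.",
}
print(dedupe(PAGES))
```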

36 Detecting duplicates
Compute similarity with some edit-distance measure. Syntactic similarity (e.g., overlap of bigrams) is easier to measure than semantic similarity. If this measure is above some threshold θ for a pair of documents, we consider them duplicates.
The Jaccard coefficient

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

is a measure of similarity on [0, 1]: $J(A, A) = 1$, and $J(A, B) = 0$ iff $A \cap B = \emptyset$.

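37 Jaccard coefficient on 2-grams
Documents:
d1: Jack London went to Toronto
d2: Jack London went to the city of Toronto
d3: Jack went from Toronto to London
d1 has 4 bigrams and d2 has 7. They share 3 (Jack London, London went, went to), so the union has 4 + 7 - 3 = 8 bigrams and $J(d_1, d_2) = 3/8 = 0.375$. d1 and d3 share no bigrams, so $J(d_1, d_3) = 0$.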
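The worked example can be checked by representing each document as a set of word bigrams. Splitting on whitespace is a simplifying assumption; a real system would tokenize and normalize first. The convention that two empty sets have similarity 1 is also a choice of this sketch.

```python
def bigrams(text: str) -> set[tuple[str, str]]:
    """Set of adjacent word pairs (2-grams) in a whitespace-tokenized text."""
    tokens = text.split()
    return set(zip(tokens, tokens[1:]))

def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(a & b) / len(a | b)

d1 = "Jack London went to Toronto"
d2 = "Jack London went to the city of Toronto"
d3 = "Jack went from Toronto to London"
print(jaccard(bigrams(d1), bigrams(d2)))  # 0.375
print(jaccard(bigrams(d1), bigrams(d3)))  # 0.0
```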

38 Link analysis
When we're crawling and indexing the web, we want to retain some record of the similarity between (non-duplicate) documents in terms of their link structure. This will help in searching.

39 Bibliometrics: citation analysis
Impact factor: developed in 1972 to measure the quality and influence of scientific journals; it measures how often articles are cited.
Bibliographic coupling: a measure of similarity between documents A and B according to the intersection of their citations, i.e., the works that both of them cite (Kessler, 1963).

40 Bibliometrics: citation analysis
Co-citation: a measure of similarity between documents A and B according to the intersection of the documents that cite them (Small, 1973).
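The two measures differ only in the direction in which citations are followed. Bibliographic coupling counts shared outgoing references, and co-citation counts shared incoming citers. The citation graph below is hypothetical.

```python
# Hypothetical citation graph: paper -> set of papers it cites.
CITES: dict[str, set[str]] = {
    "A": {"X", "Y", "Z"},
    "B": {"Y", "Z"},
    "P": {"A", "B"},
    "Q": {"A", "B", "X"},
    "R": {"A"},
}

def bibliographic_coupling(a: str, b: str) -> int:
    """Number of references that papers a and b have in common."""
    return len(CITES.get(a, set()) & CITES.get(b, set()))

def cocitation(a: str, b: str) -> int:
    """Number of papers that cite both a and b."""
    return sum(1 for refs in CITES.values() if a in refs and b in refs)

print(bibliographic_coupling("A", "B"), cocitation("A", "B"))  # 2 2
```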

41 Links are not citations
Many links are navigational within a website. Many pages with high in-degree are portals without much content. Some links are not necessarily endorsements. In scientific settings, the relevance of citations is (theoretically) enforced by peer review.
Can we mimic the enforcement of relevance usually done by human experts in scientific articles?

42 Authorities and hubs
Authorities are pages recognized as significant, trustworthy, and useful for a topic. In-degree (number of incoming links) is an estimate of authority. Should incoming links from authoritative pages count more than others?
Hubs are index pages that provide lots of links to relevant content pages, e.g., reddit.com is a hub page for recycled memes.

43 HITS
The HITS algorithm (Kleinberg, 1998) attempts to learn the hubs and authorities on a given topic from a relevant subgraph of the web. Hubs and authorities tend to form bipartite graphs.
Figure: bipartite graph in which hubs on one side link to authorities on the other.

44 HITS
First, find the (top N) most relevant pages for a query; this is the root set R (we'll see how to do this next lecture).
Next, look at the link structure relative to R: the base set S consists of R plus all pages that link to, or are linked from, pages in R (so S ⊇ R).
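Expanding the root set into the base set is a union of three sets. The link graph is a hypothetical in-memory dictionary. On the real web, finding the pages that link *into* R requires a search engine's link index rather than a scan over all pages.

```python
# Hypothetical link graph: page -> set of pages it links to.
LINKS: dict[str, set[str]] = {
    "r1": {"x", "r2"},
    "r2": {"y"},
    "u": {"r1"},
    "v": {"z"},
}

def base_set(root: set[str]) -> set[str]:
    """S = R, plus pages linked from R, plus pages linking into R."""
    out_links = {t for p in root for t in LINKS.get(p, set())}
    in_links = {p for p, targets in LINKS.items() if targets & root}
    return root | out_links | in_links

print(sorted(base_set({"r1", "r2"})))  # ['r1', 'r2', 'u', 'x', 'y']
```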

45 HITS: authorities and in-degree
Even within S, nodes with high in-degree may not be authorities; they may just be generically popular pages. Authority should be determined by strong hubs, so we iteratively (slowly) converge on a mutually reinforcing set of hubs and authorities.
For every page p ∈ S, maintain an authority score $a_p$ (initialized to $1/|S|$) and a hub score $h_p$ (initialized to $1/|S|$), subject to

$$\sum_{p \in S} a_p^2 = 1 = \sum_{p \in S} h_p^2$$

46 HITS update rules
Authorities p are pointed to (→) by lots of good hubs q:

$$a_p = \sum_{q:\, q \to p} h_q \qquad \text{e.g., } a_4 = h_1 + h_2 + h_3$$

Hubs point to lots of good authorities:

$$h_q = \sum_{p:\, q \to p} a_p \qquad \text{e.g., } h_4 = a_1 + a_2 + a_3$$
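The update rules above, combined with the initialization and normalization constraint from the previous slide, give the following power-iteration sketch. The fixed iteration count and the tiny bipartite example graph are assumptions. A production version would stop when the scores change by less than a tolerance.

```python
import math

def hits(links: dict[str, set[str]], iterations: int = 50):
    """Hub/authority scores; links maps each page q to the pages p it points to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    auth = {p: 1.0 / len(pages) for p in pages}  # initialized to 1/|S|
    hub = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # a_p = sum of h_q over pages q that point to p
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ())) for p in pages}
        # h_q = sum of a_p over pages p that q points to
        hub = {q: sum(auth[p] for p in links.get(q, ())) for q in pages}
        # Normalize so that squared scores sum to 1
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return auth, hub

# Hypothetical graph: three hubs pointing at two authorities.
GRAPH = {"h1": {"A1", "A2"}, "h2": {"A1", "A2"}, "h3": {"A1"}}
auth, hub = hits(GRAPH)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```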

47 Page similarity using HITS
Given honda.com, we also get: toyota.com, ford.com, bmwusa.com, saturn.com, nissanmotors.com.
This method can have trouble with ambiguous queries, however.

48 PageRank
PageRank (Brin & Page, 1998) is an alternative to HITS that does not distinguish between hub and authority.

49 PageRank: initial idea
Raw in-degree does not account for the authority of the source of a link, so each incoming link is weighted by the rank of the page it comes from. For page p, the PageRank is

$$R(p) = c \sum_{q:\, q \to p} \frac{R(q)}{N_q}$$

where $N_q$ is the number of out-links of page q and c is a normalizing constant. A page's rank flows out equally among its outgoing links.

50 PageRank: flow of authority
PageRank iteratively adjusts all R(p) until the overall page ranking converges.
Figure: steady-state flow of rank along links.

51 PageRank problem
Groups of purely self-referential pages (linked to from the outside, but with no links leaving the group) are sinks that absorb all the rank in the system during the iterative rank assignment process.
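The rank-sink problem shows up after a few steps of the initial update rule $R(p) = c \sum_{q \to p} R(q)/N_q$. In the hypothetical graph below, page A links into a pair B ↔ C that only links to itself. A receives no rank, so all of the rank ends up inside the pair. (In this particular graph the pair's ranks also keep alternating rather than settling.) Rescaling the total to 1 is how this sketch implements the normalizing constant c.

```python
def naive_pagerank_step(links: dict[str, list[str]], rank: dict[str, float]) -> dict[str, float]:
    """One step of R(p) = c * sum over q->p of R(q)/N_q, rescaled to sum to 1."""
    new = {p: 0.0 for p in rank}
    for q, targets in links.items():
        for p in targets:
            new[p] += rank[q] / len(targets)  # q's rank flows out equally
    total = sum(new.values())
    c = 1.0 / total if total else 0.0
    return {p: c * r for p, r in new.items()}

# Hypothetical graph: A links into the self-referential pair B <-> C.
LINKS = {"A": ["B"], "B": ["C"], "C": ["B"]}
rank = {p: 1 / 3 for p in LINKS}
for _ in range(10):
    rank = naive_pagerank_step(LINKS, rank)
print(rank)  # A has lost all its rank; B and C hold everything
```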

52 PageRank: rank source
An ethereal rank source E continually replenishes the rank of each page p by a fixed amount E(p):

$$R(p) = c \left( \sum_{q:\, q \to p} \frac{R(q)}{N_q} + E(p) \right)$$

53 Complete ranking
A complete ranking involves combining: PageRank; preferences using HTML tags (e.g., the title or abstract is often highly informative); and the similarity of query words and documents.
How do we relate query words and documents in the first place?

54 Next lecture
How to relate query terms and documents. Vector space model. How to generalize query terms. Latent semantic indexing. How to rank documents. Singular value decomposition. How to evaluate different search engines.

55 Misc
Some slides and material are based on those of Ray J. Mooney (UTexas, CS371R), Hinrich Schütze, Christina Lioma, and Chris Manning (Stanford, CS276), and Dan Jurafsky (Stanford, CS124).

56 Aside: PageRank algorithm
Given the total set of pages S:
Let ∀p ∈ S: E(p) = α / |S|, for some 0 < α < 1
Initialize ∀p ∈ S: R(p) = 1 / |S|
Until convergence:
    For each p ∈ S: R'(p) = (1 − α) · Σ_{q: q→p} R(q) / N_q + E(p)
    c = 1 / Σ_{p ∈ S} R'(p)
    For each p ∈ S: R(p) = c · R'(p)    // normalize
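The algorithm above translates directly into Python. The default α = 0.15 and the L1 convergence tolerance are assumptions, since the slide leaves both open. The example reuses a hypothetical graph in which A links into the self-referential pair B ↔ C. With the uniform rank source, A keeps rank α/|S| and the pair no longer absorbs everything.

```python
def pagerank(links: dict[str, list[str]], alpha: float = 0.15,
             tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """PageRank with a uniform rank source E(p) = alpha / |S|."""
    pages = sorted(set(links) | {p for ts in links.values() for p in ts})
    n = len(pages)
    E = {p: alpha / n for p in pages}  # rank source
    R = {p: 1.0 / n for p in pages}    # initial ranks
    for _ in range(max_iter):
        # R'(p) = (1 - alpha) * sum over q->p of R(q)/N_q + E(p)
        new = {p: E[p] for p in pages}
        for q, targets in links.items():
            if targets:
                share = (1 - alpha) * R[q] / len(targets)
                for p in targets:
                    new[p] += share
        c = 1.0 / sum(new.values())            # normalize
        new = {p: c * r for p, r in new.items()}
        if sum(abs(new[p] - R[p]) for p in pages) < tol:
            return new
        R = new
    return R

# Hypothetical graph: A links into the self-referential pair B <-> C.
ranks = pagerank({"A": ["B"], "B": ["C"], "C": ["B"]})
print(ranks)
```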

Lecture 1: Introduction and the Boolean Model

Lecture 1: Introduction and the Boolean Model Lecture 1: Introduction and the Boolean Model Information Retrieval Computer Science Tripos Part II Simone Teufel Natural Language and Information Processing (NLIP) Group Simone.Teufel@cl.cam.ac.uk 1 Overview

More information

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ IR 17/25: Web Search Basics Paul Ginsparg Cornell University, Ithaca, NY 2 Nov

More information

Web Search Engines. Search Engine Characteristics. Web Search Queries. Chapter 27, Part C Based on Larson and Hearst s slides at UC-Berkeley

Web Search Engines. Search Engine Characteristics. Web Search Queries. Chapter 27, Part C Based on Larson and Hearst s slides at UC-Berkeley Web Search Engines Chapter 27, Part C Based on Larson and Hearst s slides at UC-Berkeley http://www.sims.berkeley.edu/courses/is202/f00/ Database Management Systems, R. Ramakrishnan 1 Search Engine Characteristics

More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval Lecture 8 Web Search 1 Overview ❶ Big picture ❷ Ads 2 Web search overview 3 Search is the top activity on the web 4 Without search engines, the web wouldn t work Without

More information

Practical Graph Mining with R. 5. Link Analysis

Practical Graph Mining with R. 5. Link Analysis Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities

More information

Web Search Engines: Solutions

Web Search Engines: Solutions Web Search Engines: Solutions Problem 1: A. How can the owner of a web site design a spider trap? Answer: He can set up his web server so that, whenever a client requests a URL in a particular directory

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Introduction to Information Retrieval http://informationretrieval.org

Introduction to Information Retrieval http://informationretrieval.org Introduction to Information Retrieval http://informationretrieval.org IIR 6&7: Vector Space Model Hinrich Schütze Institute for Natural Language Processing, University of Stuttgart 2011-08-29 Schütze:

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Computational Advertising Andrei Broder Yahoo! Research. SCECR, May 30, 2009

Computational Advertising Andrei Broder Yahoo! Research. SCECR, May 30, 2009 Computational Advertising Andrei Broder Yahoo! Research SCECR, May 30, 2009 Disclaimers This talk presents the opinions of the author. It does not necessarily reflect the views of Yahoo! Inc or any other

More information

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Recommender Systems Seminar Topic : Application Tung Do 28. Januar 2014 TU Darmstadt Thanh Tung Do 1 Agenda Google news personalization : Scalable Online Collaborative Filtering Algorithm, System Components

More information

Web Advertising 1 2/26/2013 CS190: Web Science and Technology, 2010

Web Advertising 1 2/26/2013 CS190: Web Science and Technology, 2010 Web Advertising 12/26/2013 CS190: Web Science and Technology, 2010 Today's Plan Logistics Understanding searchers (Commercial Perspective) Search Advertising Next project: Google advertising challenge

More information

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION

Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND II. PROBLEM AND SOLUTION Brian Lao - bjlao Karthik Jagadeesh - kjag Legal Informatics Final Paper Submission Creating a Legal-Focused Search Engine I. BACKGROUND There is a large need for improved access to legal help. For example,

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

DIGITAL MARKETING BASICS: SEO

DIGITAL MARKETING BASICS: SEO DIGITAL MARKETING BASICS: SEO Search engine optimization (SEO) refers to the process of increasing website visibility or ranking visibility in a search engine's "organic" or unpaid search results. As an

More information

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction Chapter-1 : Introduction 1 CHAPTER - 1 Introduction This thesis presents design of a new Model of the Meta-Search Engine for getting optimized search results. The focus is on new dimension of internet

More information

Technical challenges in web advertising

Technical challenges in web advertising Technical challenges in web advertising Andrei Broder Yahoo! Research 1 Disclaimer This talk presents the opinions of the author. It does not necessarily reflect the views of Yahoo! Inc. 2 Advertising

More information

How to Use Google AdWords

How to Use Google AdWords Web News Apps Videos Images More Search Tools How to Use Google AdWords A Beginner s Guide to PPC Advertising How to Use Google AdWords offers.hubspot.com/google-adwords-ppc Learn how to use Google AdWords

More information

Corso di Biblioteche Digitali

Corso di Biblioteche Digitali Corso di Biblioteche Digitali Vittore Casarosa casarosa@isti.cnr.it tel. 050-315 3115 cell. 348-397 2168 Ricevimento dopo la lezione o per appuntamento Valutazione finale 70-75% esame orale 25-30% progetto

More information

1 o Semestre 2007/2008

1 o Semestre 2007/2008 Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction

More information

Chapter 6. Attracting Buyers with Search, Semantic, and Recommendation Technology

Chapter 6. Attracting Buyers with Search, Semantic, and Recommendation Technology Attracting Buyers with Search, Semantic, and Recommendation Technology Learning Objectives Using Search Technology for Business Success Organic Search and Search Engine Optimization Recommendation Engines

More information

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles are freely available online:http://www.ijoer.

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles are freely available online:http://www.ijoer. RESEARCH ARTICLE SURVEY ON PAGERANK ALGORITHMS USING WEB-LINK STRUCTURE SOWMYA.M 1, V.S.SREELAXMI 2, MUNESHWARA M.S 3, ANIL G.N 4 Department of CSE, BMS Institute of Technology, Avalahalli, Yelahanka,

More information

GOOGLE ANALYTICS TERMS

GOOGLE ANALYTICS TERMS GOOGLE ANALYTICS TERMS BOUNCE RATE The average percentage of people who visited your website and only viewed one page. In Google Analytics, you are able to see a site-wide bounce rate and bounce rates

More information

So what is this session all about?

So what is this session all about? 1 So what is this session all about? In this session we will be looking to understand the key aspects of the digital marketing mix with specific emphasis on digital communications techniques. This session

More information

Online Marketing Optimization Essentials

Online Marketing Optimization Essentials Online Marketing Optimization Essentials Bilal Saleh Principal Partner E-Nor Inc. May 20, 2014 Agenda 2 E-Nor Overview Search Engine Optimization (SEO) Paid search Web Analytics Q&A Graphics by: http://www.iconarchive.com/show/seo-icons-by-designbolts.html

More information

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014 Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)

More information

Tamil Search Engine. Abstract

Tamil Search Engine. Abstract Tamil Search Engine Baskaran Sankaran AU-KBC Research Centre, MIT campus of Anna University, Chromepet, Chennai - 600 044. India. E-mail: baskaran@au-kbc.org Abstract The Internet marks the era of Information

More information

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,

More information

DIGITAL MARKETING BASICS: PPC

DIGITAL MARKETING BASICS: PPC DIGITAL MARKETING BASICS: PPC Search Engine Marketing (SEM) is an umbrella term referring to all activities that generate visibility in search engine result pages (SERPs) through the use of paid placement,

More information

The ABCs of AdWords. The 49 PPC Terms You Need to Know to Be Successful. A publication of WordStream & Hanapin Marketing

The ABCs of AdWords. The 49 PPC Terms You Need to Know to Be Successful. A publication of WordStream & Hanapin Marketing The ABCs of AdWords The 49 PPC Terms You Need to Know to Be Successful A publication of WordStream & Hanapin Marketing The ABCs of AdWords The 49 PPC Terms You Need to Know to Be Successful Many individuals

More information

Search Engine Optimization. Software Engineering October 5, 2011 Frank Takes (ftakes@liacs.nl) LIACS, Leiden University

Search Engine Optimization. Software Engineering October 5, 2011 Frank Takes (ftakes@liacs.nl) LIACS, Leiden University Search Engine Optimization Software Engineering October 5, 2011 Frank Takes (ftakes@liacs.nl) LIACS, Leiden University Overview Search Engines Search Engine Optimization Google PageRank Social Media Search

More information

Search Engine Optimisation (SEO) Factsheet

Search Engine Optimisation (SEO) Factsheet Search Engine Optimisation (SEO) Factsheet SEO is a complex element of our industry and many clients do not fully understand what is involved in getting their site ranked on common search engines such

More information

Semantic Search in Portals using Ontologies

Semantic Search in Portals using Ontologies Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

More information

2015 SEO AND Beyond. Enter the Search Engines for Business. www.thinkbigengine.com

2015 SEO AND Beyond. Enter the Search Engines for Business. www.thinkbigengine.com 2015 SEO AND Beyond Enter the Search Engines for Business www.thinkbigengine.com Including SEO Into Your 2015 Marketing Campaign SEO in 2015 is tremendously different than it was just a few years ago.

More information

The PageRank Citation Ranking: Bring Order to the Web

The PageRank Citation Ranking: Bring Order to the Web The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized

More information

Search Engine Optimization - From Automatic Repetitive Steps To Subtle Site Development

Search Engine Optimization - From Automatic Repetitive Steps To Subtle Site Development Narkevičius. Search engine optimization. 3 Search Engine Optimization - From Automatic Repetitive Steps To Subtle Site Development Robertas Narkevičius a Vilnius Business College, Kalvariju street 125,

More information

Top Online Activities (Jupiter Communications, 2000) CS276A Text Information Retrieval, Mining, and Exploitation

Top Online Activities (Jupiter Communications, 2000) CS276A Text Information Retrieval, Mining, and Exploitation Top Online Activities (Jupiter Communications, 2000) CS276A Text Information Retrieval, Mining, and Exploitation Lecture 11 12 November, 2002 Email Web Search 88% 96% Special thanks to Andrei Broder, IBM

More information

Introduction to Information Retrieval http://informationretrieval.org

Introduction to Information Retrieval http://informationretrieval.org Introduction to Information Retrieval http://informationretrieval.org IIR 7: Scores in a Complete Search System Hinrich Schütze Center for Information and Language Processing, University of Munich 2014-05-07

More information

Part 1: Link Analysis & Page Rank

Part 1: Link Analysis & Page Rank Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Exam on the 5th of February, 216, 14. to 16. If you wish to attend, please

More information

A COMPREHENSIVE REVIEW ON SEARCH ENGINE OPTIMIZATION

A COMPREHENSIVE REVIEW ON SEARCH ENGINE OPTIMIZATION Volume 4, No. 1, January 2013 Journal of Global Research in Computer Science REVIEW ARTICLE Available Online at www.jgrcs.info A COMPREHENSIVE REVIEW ON SEARCH ENGINE OPTIMIZATION 1 Er.Tanveer Singh, 2

More information

Search Engines. Stephen Shaw 18th of February, 2014. Netsoc

Search Engines. Stephen Shaw <stesh@netsoc.tcd.ie> 18th of February, 2014. Netsoc Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,

More information
