Web Usage in Client-Server Design

Web Search

Web Usage in Client-Server Design
- A client (e.g., a browser) communicates with a server via HTTP
  - Hypertext Transfer Protocol: a lightweight and simple protocol asynchronously carrying a variety of payloads (text, images, audio, and video)
- A client sends an HTTP request to a web server by specifying a URL (uniform resource locator)
- Web pages are encoded in HTML (Hypertext Markup Language)
- A browser can ignore what it does not understand
  - A browser gets as much as it can, and does not crash due to incompatible features
  - Publishing becomes unprecedentedly easy

J. Pei: Information Retrieval and Web Search -- Web Search Basics
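To make the request step concrete, here is a minimal sketch of how a client might turn a URL into the text of an HTTP GET request. The function name `build_get_request`, the example URL, and the choice of headers are assumptions for illustration and are not from the slides. Nothing is sent over the network.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Compose the text of a minimal HTTP/1.1 GET request for a URL.

    Hypothetical helper: real browsers send many more headers.
    """
    parts = urlsplit(url)
    # An empty path means the root document of the server.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, target, version
        f"Host: {parts.netloc}",  # the server named in the URL
        "Connection: close",
        "",                       # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_get_request("http://www.example.com/index.html"))
```

The URL alone supplies both the server to contact (the `Host` header) and the resource to fetch (the request target), which is why specifying a URL is all the client needs to do.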

Making Web Info Discoverable
- Full-text index search engines
  - AltaVista, Excite, and Infoseek
  - Use keyword search interfaces supported by inverted indexes and ranking mechanisms
- Taxonomies populated with web pages in categories
  - Yahoo!
  - Allow users to browse through a hierarchical tree of category labels
  - A convenient and intuitive way at the beginning
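The keyword search interface mentioned above rests on an inverted index, which maps each term to the set of pages containing it. The following is a minimal sketch under simplifying assumptions: whitespace tokenization, conjunctive (AND) matching, and no ranking. The function names and the toy pages are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased term to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents that contain every query term (AND query)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    pages = {
        "p1": "web search engines",
        "p2": "web crawlers",
        "p3": "search ranking",
    }
    idx = build_inverted_index(pages)
    print(search(idx, "web search"))
```

A real engine would add a ranking step on top of the matching set; that step is deliberately left out here.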

Drawbacks of Taxonomy Methods
- Accurately classifying web pages into taxonomy tree nodes is very costly and cannot scale up to the size of the web
  - Low-quality web pages are of no interest to most users
- No standard taxonomies: the taxonomy in a user's mind and the one in an editor's mind may differ
  - Almost certainly so when the taxonomy trees are big (1,000+ distinct nodes)
- The popularity of taxonomies declined over time
- Taxonomies are good for building a knowledge base, though!

Problems in Purely Full-text Indexes
- Differences between books and web pages
  - Number: a relatively small number of books on a specific topic versus a huge number of web pages on a query
  - The average quality of books is much higher
  - Books are considerably longer than web pages, so full-text based relevance is more reliable on books
- Most web pages are of low quality and uninteresting
  - Finding all books relevant to a topic and letting the user select: feasible, since the number of highly related books is not big
  - Finding all web pages relevant to a query and letting the user select: infeasible, since too many pages are related to a query
- Find the high-quality web pages
  - Ideas: asking an expert; finding authoritative web pages, which requires not just full text but also links

Static versus Dynamic Web Pages
- Static web pages: the content does not vary from one request to the next
  - The content can still be updated from time to time, possibly frequently
  - There are a finite number of static web pages
- Dynamic web pages: pages mechanically generated by an application server in response to a query to a database
  - There are an infinite number of dynamic web pages

The Web Graph
- The static web: a directed graph consisting of static HTML pages together with the hyperlinks between them
  - Each web page is a node, each hyperlink is a directed edge
  - Anchor text: the text surrounding the origin of a hyperlink
- The web graph is not strongly connected
- The in-degrees of web pages follow a power law distribution
  - $\mathrm{Freq}(i) \propto 1 / i^{\alpha}$, where $\alpha \approx 2.1$
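The graph view above can be sketched with a plain adjacency dictionary. The example below computes in-degrees, tests strong connectivity with two reachability passes (forward and on the reversed graph), and evaluates the relative power-law frequency $1 / i^{\alpha}$. The toy graph and all function names are assumptions for illustration; the value 2.1 is the exponent quoted on the slide.

```python
from collections import Counter


def in_degrees(graph: dict[str, list[str]]) -> dict[str, int]:
    """Count incoming hyperlinks for every page in the graph."""
    deg = {node: 0 for node in graph}
    for targets in graph.values():
        for dst in targets:
            deg[dst] = deg.get(dst, 0) + 1
    return deg


def in_degree_histogram(graph: dict[str, list[str]]) -> Counter:
    """Map each in-degree value to the number of pages having it."""
    return Counter(in_degrees(graph).values())


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Return all nodes reachable from start by following edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_strongly_connected(graph: dict[str, list[str]]) -> bool:
    """True if every page can reach every other page along hyperlinks."""
    nodes = set(graph) | {d for targets in graph.values() for d in targets}
    if not nodes:
        return True
    start = next(iter(nodes))
    # Reverse every edge: reaching all nodes in the reversed graph means
    # all nodes can reach `start` in the original graph.
    reverse: dict[str, list[str]] = {n: [] for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].append(src)
    return reachable(graph, start) == nodes and reachable(reverse, start) == nodes


def relative_frequency(i: int, alpha: float = 2.1) -> float:
    """Unnormalized power-law frequency 1 / i**alpha of in-degree i."""
    return i ** -alpha


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
    print(in_degrees(web))
    print(is_strongly_connected(web))  # page "d" has no incoming link
    print(relative_frequency(1) / relative_frequency(10))
```

In the toy graph, page `d` links into the cycle but nothing links back to it, which is the kind of asymmetry that keeps the real web graph from being strongly connected.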

The Bowtie Shape of the Web Graph
[Figure: bowtie structure of the web graph. SCC: strongly connected component]

Advertising: Branding
- A company uses graphical banner advertisements on popular websites to convey to viewers a positive feeling about the company's brand
- Advertisements are shown on algorithmic search results
- Cost per mille (CPM): the cost to the company of having its banner advertisement displayed 1,000 times
- Cost per click (CPC): priced by the number of times an advertisement is clicked
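The two pricing models differ only in what is counted: impressions for CPM, clicks for CPC. A short worked sketch follows; the rates and volumes are made-up numbers, and the function names are hypothetical.

```python
def cpm_cost(impressions: int, cpm_rate: float) -> float:
    """Total cost when paying cpm_rate per 1,000 displays of the banner."""
    return impressions / 1000 * cpm_rate


def cpc_cost(clicks: int, cpc_rate: float) -> float:
    """Total cost when paying cpc_rate for each click on the advertisement."""
    return clicks * cpc_rate


if __name__ == "__main__":
    # Illustrative numbers only: 50,000 displays at a CPM of 2.00,
    # versus 400 clicks at a CPC of 0.25.
    print(cpm_cost(50_000, 2.00))
    print(cpc_cost(400, 0.25))
```

With these assumed numbers both campaigns cost the same, so the advertiser's preference would depend on whether exposure or clicks is the goal.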

Advertising: Sponsored Search
- Advertisers pay for users' clicks
- Goto: for each query term q, it accepts bids from companies that want their web pages shown for the query q, and returns the pages of all advertisers who bid for q, ordered by their bids
  - If the user clicks a result, the corresponding advertiser pays Goto
- A popular advertising approach in search engines
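The bid-ordered result list described above can be sketched in a few lines. The advertiser names and bid amounts are invented, and the function names are hypothetical. The slide does not say how much a click costs; charging the advertiser its own bid is an assumption made here for illustration.

```python
def rank_ads(bids: dict[str, dict[str, float]], query_term: str) -> list[tuple[str, float]]:
    """Return (advertiser, bid) pairs for a query term, highest bid first."""
    term_bids = bids.get(query_term, {})
    return sorted(term_bids.items(), key=lambda item: item[1], reverse=True)


def charge_for_click(bids: dict[str, dict[str, float]], query_term: str, advertiser: str) -> float:
    """Amount owed for one click.

    Assumption (not stated in the slides): the advertiser pays its own bid.
    """
    return bids[query_term][advertiser]


if __name__ == "__main__":
    bids = {"flights": {"airA": 0.50, "airB": 0.75, "airC": 0.20}}
    print(rank_ads(bids, "flights"))
    print(charge_for_click(bids, "flights", "airB"))
```

Note that nothing is charged when results are merely displayed; payment is triggered only by a click, which is what distinguishes this model from CPM-style branding.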

Spamming, SEO and SEM
- If a page is ranked high by search engines, it has a good opportunity to earn branding advertising payments
- Paid inclusion: an owner pays to have her/his web page included in the search engine's index
- Spamming: deliberately manipulating content and links to make a page rank high in search engines
- Search engine optimization (SEO) and search engine marketing (SEM): understanding how search engines rank, and how to allocate marketing campaign budgets to different keywords and to different sponsored search engines
- Click spam: clicks on sponsored search results that do not come from bona fide search users
  - Exhausting the advertising budget of a competitor

Spamming Tricks
- Cloaking: returning different pages depending on whether the HTTP request comes from a search engine or a human user's browser
- Doorway pages: pages containing text and metadata carefully chosen to rank highly on selected search keywords

Categories of Search Queries
- Informational queries: seeking general information on a broad topic
  - "What is panda?"
  - Multiple web pages are needed to answer
- Navigational queries: seeking the website or home page of a single entity that the user has in mind
  - "Air Canada": seeking the home page of Air Canada rather than any agents selling Air Canada airfare
  - Precision at 1 is what is wanted
- Transactional queries: leading to transactions on the web, e.g., purchasing a product, downloading a file, or joining a social website

Index Size Estimation
- What percentage of the web is indexed by a search engine?
  - There are an infinite number of dynamic web pages
- Given two search engines, what are the relative sizes of their indexes?
  - A search engine can return a page that has not been fully or even partially indexed
  - Search engines organize indexes in various tiers and partitions; not all indexed pages are examined on every search
- Rough estimation under an (unrealistic) assumption: the web has a finite size, and each search engine chooses an independent, uniform subset of it to index

Capture-Recapture Method
- Let $x$ be the probability that a random page in $E_1$ is indexed by $E_2$
- Symmetrically, let $y$ be the probability that a random page in $E_2$ is indexed by $E_1$
- Since $x|E_1| \approx y|E_2|$, we get $|E_1| / |E_2| \approx y / x$
- Overlap: $|E_1 \cap E_2| \approx x|E_1| \approx y|E_2|$
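A small numeric sketch may help check the ratio. The two "indexes" below are synthetic sets of integers standing in for page ids, and for simplicity the whole index is used as the sample, so the fractions are exact rather than estimated. The function names are hypothetical.

```python
def overlap_fraction(sample, other_index) -> float:
    """Fraction of the sampled pages that are also present in other_index."""
    sample = list(sample)
    hits = sum(1 for page in sample if page in other_index)
    return hits / len(sample)


def relative_index_size(x: float, y: float) -> float:
    """Estimate |E1| / |E2| as y / x, following x|E1| ~ y|E2|."""
    if x <= 0:
        raise ValueError("x must be positive: no sampled page of E1 was found in E2")
    return y / x


if __name__ == "__main__":
    e1 = set(range(0, 1000))     # synthetic index of 1,000 pages
    e2 = set(range(500, 2500))   # synthetic index of 2,000 pages, 500 shared
    x = overlap_fraction(e1, e2)  # share of E1 pages found in E2
    y = overlap_fraction(e2, e1)  # share of E2 pages found in E1
    print(x, y, relative_index_size(x, y))
```

Here half of the first index appears in the second but only a quarter of the second appears in the first, so the second index must be about twice as large.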

Sampling Techniques (1)
- How to conduct unbiased sampling from outside the search engine?
  - Conceptually, we need to generate a random page from the entire web and test it for presence in each search engine
  - Picking a web page uniformly at random is difficult
- Random searches: begin with a log of web searches, send a random search from the log, and pick a random page from the results
  - The log may be biased; a random result from a search may not be a uniformly random page indexed by the search engine

Sampling Techniques (2)
- Random IP addresses: generate random IP addresses and send a request to the corresponding server, collecting all pages at that server
  - Many hosts may share one IP address; a server may not accept HTTP requests from the host of the sampling program; the sample is biased toward the many sites that have few web pages
- Random walks: run a random walk starting at an arbitrary page until it converges to a steady-state distribution, from which we can pick a web page with a fixed probability
  - The web is not strongly connected, so some pages are not in the sampling space; a random walk may take a long time to converge

Random Queries
- Idea: pick a page (almost) uniformly at random from a search engine's index by posing a random query to it
  - Picking a random word from a dictionary? Not good: word frequencies vary a lot
- Implementation
  - Crawl a limited portion of the web or a representative subset of the web (e.g., Yahoo!) to obtain a vocabulary of query terms
  - Use a random conjunctive query on $E_1$ and pick from the top 100 returned results a page p at random
  - Test p for presence in $E_2$ by choosing 6-8 low-frequency terms in p and using them in a conjunctive query for $E_2$
  - Iterate a large number of times
- Classroom discussion: why do we use conjunctive queries of many words?
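The presence test in the implementation steps can be sketched as follows: given the text of page p and a table of document frequencies gathered from the crawled sample, pick the rarest terms of p to form the conjunctive query. The function name `presence_query`, the whitespace tokenization, and the toy frequency table are assumptions for illustration.

```python
def presence_query(page_text: str, doc_freq: dict[str, int], k: int = 8) -> list[str]:
    """Choose up to k lowest-frequency terms of a page for a conjunctive query.

    doc_freq maps a term to the number of crawled documents containing it.
    Terms missing from the table are skipped because their rarity is unknown.
    """
    terms = set(page_text.lower().split())
    known = [t for t in terms if t in doc_freq]
    # Sort by rarity; break ties alphabetically so the result is deterministic.
    known.sort(key=lambda t: (doc_freq[t], t))
    return known[:k]


if __name__ == "__main__":
    doc_freq = {"the": 1000, "web": 300, "graph": 50, "crawler": 20, "bowtie": 3}
    print(presence_query("The bowtie graph of the web", doc_freq, k=2))
```

The returned terms would then be sent to the second engine as one AND query, and the page is counted as present if it appears among the results.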

Problems in Random Queries
- The sample is biased toward longer documents
- Picking from the top 100 results of $E_1$ induces a bias from the ranking algorithm of $E_1$
- Either $E_1$ or $E_2$ may not respond to the queries
  - $E_2$ may not handle conjunctive queries of many words
  - $E_1$ or $E_2$ may reject robotic spam queries
- Improvements
  - Use phrases
  - Estimate the bias and remove it using statistical methods

Random Walk Sampling
- A random walk on a virtual graph derived from documents
  - Two documents (nodes) are linked by an edge if they share two or more words
  - Never instantiate the graph
- Move from a document d to another by picking a pair of keywords in d, running a query on a search engine, and picking a random document from the results
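The walk can be simulated without ever building the graph, as in this minimal sketch. A tiny in-memory corpus and a naive `search` function stand in for a real search engine. Both are assumptions for illustration, as are the function names and the fixed random seed.

```python
import random


def search(corpus: dict[str, str], w1: str, w2: str) -> list[str]:
    """Stand-in for a search engine: ids of documents containing both words."""
    return [
        doc_id
        for doc_id, text in corpus.items()
        if w1 in text.split() and w2 in text.split()
    ]


def random_walk(corpus: dict[str, str], start: str, steps: int, rng: random.Random) -> list[str]:
    """Walk the virtual graph by querying with random keyword pairs."""
    current = start
    visited = [current]
    for _ in range(steps):
        words = sorted(set(corpus[current].split()))
        if len(words) < 2:
            break  # no keyword pair can be formed from this document
        w1, w2 = rng.sample(words, 2)
        # The current document always matches, so results is never empty.
        results = search(corpus, w1, w2)
        current = rng.choice(results)
        visited.append(current)
    return visited


if __name__ == "__main__":
    corpus = {
        "d1": "web search engine index",
        "d2": "web search ranking",
        "d3": "search engine ranking index",
        "d4": "random walk sampling",
    }
    print(random_walk(corpus, "d1", 10, random.Random(0)))
```

In this toy corpus `d4` shares no word pair with the other documents, so a walk started at `d1` never reaches it, which mirrors the connectivity problem noted for random walks on the web itself.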

Summary
- The client-server usage of the web
- Two types of search engines: full-text versus taxonomies
- The web graph
- Advertising and spamming
- Categories of web search queries
- Estimation of index sizes of search engines

To-Do List
- According to the latest research results, which search engine may have the largest coverage/index?
  - Search the web for the answer!