Web Search. 2nd Semester 2012/2013
- Melinda Henderson
- 10 years ago
Transcription
1 Dados na Departamento de Engenharia Informática, Instituto Superior Técnico. 2nd Semester 2012/2013
2 Bibliography
Bing Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, 2nd edition. Chapter 6, Sections 6.8, 6.9, 6.10
3 Outline 1 2 3
4
1 Crawling (later)
2 Index construction
3 Searching and Ranking
5 Index construction
- we just saw it: parallel sort-merge of hit-lists
- trick: document order follows document importance, independently of the query
6 Searching and Ranking
- preprocessing query terms (stemming, stop-word removal, etc.)
- finding pages containing the query terms
- ranking pages and returning them to the user
Because many pages match each possible query and their quality is very heterogeneous, authoritative information must be used to assess the quality of web pages. Ranking functions evaluate both the reputation and the content of each page. PageRank will be discussed later.
7 Content-based Evaluation
Considers, for each term, its:
- Occurrence type
- Count
- Position
Occurrence types may be:
- Title
- Anchor text
- URL
- Body
Similarity: compute the dot product of the type weight vector and the count weight vector of each page, yielding the IR score of the page.
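The dot product on this slide can be sketched in a few lines of Python. This is only an illustration: the slide gives no weight values, so `TYPE_WEIGHTS`, `COUNT_CAP` and the linear-then-capped `count_weight` function are all assumptions, and term position is ignored.

```python
# Hypothetical type weights; the slides do not give actual values.
TYPE_WEIGHTS = {"title": 5.0, "anchor": 4.0, "url": 3.0, "body": 1.0}

# Assumed cap, so that repeating a term many times stops adding weight.
COUNT_CAP = 10


def count_weight(count):
    """Map a raw occurrence count to a count weight (assumed: linear, then capped)."""
    return float(min(count, COUNT_CAP))


def ir_score(counts_by_type):
    """Dot product of the type-weight vector and the count-weight vector of a page."""
    return sum(TYPE_WEIGHTS[t] * count_weight(c) for t, c in counts_by_type.items())


# Occurrences of one query term on a made-up page, grouped by occurrence type.
page = {"title": 1, "anchor": 2, "url": 0, "body": 30}
print(ir_score(page))  # 5*1 + 4*2 + 3*0 + 1*10 = 23.0
```

Capping the count weight is one simple way to limit the effect of term repetition in the body; other saturation functions would serve the same purpose.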
8 Combining Ranks
Many features are included in a web similarity formula:
- IR similarity
- (query-independent) reputation score
- many others, driven by business-oriented criteria
9 Outline 1 2 3
10 Principle
11 Combining Similarity Scores
1 eliminate duplicates
2 apply a fusion algorithm, using the similarity scores provided by the underlying search engines or not
These techniques can also be used to combine ranking functions within a search engine.
12 Combination Using Similarities
Let $s_{ij}$ be the similarity score that system $i$ (of $k$ systems) assigns to document $d_j$.
$\mathrm{CombMIN}(d_j) = \min(s_{1j}, s_{2j}, \ldots, s_{kj})$ (use the minimum score)
$\mathrm{CombMAX}(d_j) = \max(s_{1j}, s_{2j}, \ldots, s_{kj})$
$\mathrm{CombSUM}(d_j) = \sum_{i=1}^{k} s_{ij}$ (add the similarity scores)
$\mathrm{CombMNZ}(d_j) = \mathrm{CombSUM}(d_j) \times r_j$, where $r_j$ is the number of systems that retrieved $d_j$
CombSUM and CombMNZ perform better. CombMNZ slightly outperforms CombSUM in most cases.
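A minimal Python sketch of the four combination functions follows. It assumes that each system's scores are already normalized to a comparable range and that a system contributes nothing for a document it did not retrieve (so CombMIN and CombMAX are taken over the retrieving systems only). The function name `comb_scores` and the sample scores are illustrative, not part of the slides.

```python
def comb_scores(runs):
    """Fuse similarity scores from several systems.

    runs: list of dicts, one per system, mapping document id -> similarity score.
    Returns a dict mapping document id -> dict with the four fused scores.
    """
    docs = set().union(*runs)  # every document retrieved by at least one system
    fused = {}
    for d in docs:
        scores = [run[d] for run in runs if d in run]
        total = sum(scores)
        fused[d] = {
            "CombMIN": min(scores),
            "CombMAX": max(scores),
            "CombSUM": total,
            "CombMNZ": total * len(scores),  # len(scores) is r_j
        }
    return fused


# Made-up scores from three systems.
runs = [
    {"a": 0.5, "b": 0.25},
    {"a": 0.75, "c": 0.5},
    {"a": 0.25, "b": 0.5},
]
fused = comb_scores(runs)
print(fused["a"])  # {'CombMIN': 0.25, 'CombMAX': 0.75, 'CombSUM': 1.5, 'CombMNZ': 4.5}
```

CombMNZ rewards documents retrieved by several systems: here "a" is returned by all three, so its summed score is multiplied by 3.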
13 Combination using ranking positions
- Borda (1770) ranking: each voter assigns a linear preference order to the n candidates; n points go to the first, n - 1 to the second, etc. Unranked candidates divide the remaining points evenly. The winner gets the most points.
- Condorcet (1785) ranking: do pairwise comparisons to count how many times a doc wins, loses or ties against other documents (as in a soccer tournament). The doc with most wins gets the highest score. Ties are broken on the number of losses.
- Reciprocal ranking: each system assigns a score 1/pos to each doc it ranks. Rank based on the sum of scores.
14 Borda rankings example
5 underlying search engines, which have ranked four candidate pages a, b, c, d.
System 1: a, b, c, d
System 2: b, a, d, c
System 3: c, b, a, d
System 4: c, b, d
System 5: c, b
Scores:
Score(a) = 4 + 3 + 2 + 1 + 1.5 = 11.5
Score(b) = 3 + 4 + 3 + 3 + 3 = 16
Score(c) = 2 + 1 + 4 + 4 + 4 = 15
Score(d) = 1 + 2 + 1 + 2 + 1.5 = 7.5
The final ranking is: b, c, a, d
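The Borda scores of this example can be reproduced with a short Python sketch. The function name `borda` is illustrative; the only rule assumed beyond the slide text is that candidates a system leaves unranked share the leftover points equally.

```python
def borda(rankings, candidates):
    """Borda count: n points for first place, n - 1 for second, and so on.

    Candidates that a system did not rank share the leftover points equally.
    """
    n = len(candidates)
    scores = {c: 0.0 for c in candidates}
    for ranking in rankings:
        for pos, c in enumerate(ranking):
            scores[c] += n - pos
        unranked = [c for c in candidates if c not in ranking]
        if unranked:
            # Points 1 .. n - len(ranking) were not handed out by this system.
            leftover = sum(range(1, n - len(ranking) + 1))
            for c in unranked:
                scores[c] += leftover / len(unranked)
    return scores


# The five systems from the example.
rankings = [list("abcd"), list("badc"), list("cbad"), list("cbd"), list("cb")]
scores = borda(rankings, "abcd")
final = sorted(scores, key=scores.get, reverse=True)
print(scores)  # {'a': 11.5, 'b': 16.0, 'c': 15.0, 'd': 7.5}
print(final)   # ['b', 'c', 'a', 'd']
```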
15 Condorcet rankings example
System 1: a, b, c, d
System 2: b, a, d, c
System 3: c, b, a, d
System 4: c, b, d
System 5: c, b
Pairwise comparisons, row against column (win:lose:tie):
pair   a       b       c       d
a      -       1:4:0   2:3:0   3:1:1
b      4:1:0   -       2:3:0   5:0:0
c      3:2:0   3:2:0   -       4:1:0
d      1:3:1   0:5:0   1:4:0   -
Pairwise contests won, lost and tied by each doc:
       win   lose   tie
a      1     2      0
b      2     1      0
c      3     0      0
d      0     3      0
The final ranking is: c, b, a, d
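The pairwise tournament can be sketched in Python as follows. The function name `condorcet` is illustrative, and the sketch assumes that a ranked document beats an unranked one and that two documents both missing from a system's list count as a tie for that system, which is what the win:lose:tie counts of the example imply.

```python
from itertools import combinations


def condorcet(rankings, candidates):
    """Rank candidates by pairwise contests won, breaking ties on contests lost."""

    def prefers(ranking, x, y):
        # True if this system places x ahead of y; ranked beats unranked.
        if x in ranking and y in ranking:
            return ranking.index(x) < ranking.index(y)
        return x in ranking and y not in ranking

    record = {c: {"win": 0, "lose": 0, "tie": 0} for c in candidates}
    for x, y in combinations(candidates, 2):
        x_votes = sum(prefers(r, x, y) for r in rankings)
        y_votes = sum(prefers(r, y, x) for r in rankings)
        if x_votes > y_votes:
            record[x]["win"] += 1
            record[y]["lose"] += 1
        elif y_votes > x_votes:
            record[y]["win"] += 1
            record[x]["lose"] += 1
        else:
            record[x]["tie"] += 1
            record[y]["tie"] += 1
    order = sorted(candidates, key=lambda c: (-record[c]["win"], record[c]["lose"]))
    return order, record


# The five systems from the example.
rankings = [list("abcd"), list("badc"), list("cbad"), list("cbd"), list("cb")]
order, record = condorcet(rankings, "abcd")
print(order)        # ['c', 'b', 'a', 'd']
print(record["a"])  # {'win': 1, 'lose': 2, 'tie': 0}
```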
16 Reciprocal ranking (MRR) example
5 underlying search engines, which have ranked four candidate pages a, b, c, d.
System 1: a, b, c, d
System 2: b, a, d, c
System 3: c, b, a, d
System 4: c, b, d
System 5: c, b
Scores:
Score(a) = 1 + 1/2 + 1/3 + 0 + 0 = 1.83
Score(b) = 1/2 + 1 + 1/2 + 1/2 + 1/2 = 3
Score(c) = 1/3 + 1/4 + 1 + 1 + 1 = 3.58
Score(d) = 1/4 + 1/3 + 1/4 + 1/3 + 0 = 1.17
The final ranking is: c, b, a, d
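The reciprocal scores can be checked with a small Python sketch. The function name `reciprocal_scores` is illustrative; a document that a system does not rank simply contributes 0 for that system. Summing the terms for c gives approximately 3.58.

```python
from collections import defaultdict


def reciprocal_scores(rankings):
    """Sum 1/position over all systems; unranked documents add nothing."""
    scores = defaultdict(float)
    for ranking in rankings:
        for pos, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / pos
    return dict(scores)


# The five systems from the example.
rankings = [list("abcd"), list("badc"), list("cbad"), list("cbd"), list("cb")]
scores = reciprocal_scores(rankings)
final = sorted(scores, key=scores.get, reverse=True)
for doc in final:
    print(doc, round(scores[doc], 2))
# c 3.58
# b 3.0
# a 1.83
# d 1.17
```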
17 Outline 1 2 3
18
Spam is the activity, carried out by a website owner, of deliberately misleading a search engine. Deceivers try to understand how a ranking function is computed, in order to change the ranking of a page without changing its user-perceived value.
SEO (Search Engine Optimization): a business activity that is sometimes legitimate, but is often not perceived as ethical.
19 Content Spamming
Attempt to affect the TF.IDF-based ranking features.
Places where spam terms can be added:
- Title
- meta-tags
- body
- anchor text
- URL
Techniques:
- repeat some important terms, e.g. "the picture mining quality of the camera mining is amazing"
- dump many unrelated terms, e.g. "Tom Cruise"
20 Link Spamming
out-link spamming
- easy: pick popular websites from directories
in-link spamming
- Create a honey pot
- Add links to web directories
- Post links to user-generated content sites
- Participate in a link exchange
- Create a spam farm
21 Hiding Techniques
- Content hiding: make the font color the same as the background color (e.g. white text on a white background)
- Cloaking: serve one page to normal clients and another to search engines
- Redirection: redirect the browser to another page (the user sees one page, the search engine crawls both)
22 Combating Spam
- Give higher weight to anchor text
- PageRank: assign authority to pages based on the number and importance of links
- TrustRank: the good guys and the bad guys each cluster together
- Learn from language features common in spam (longer titles, longer words, ...)
- Partition pages into blocks and compute PageRank on a block basis (instead of assigning a single PR value to each page), to defeat honey pots and link exchanges
... an on-going process ...
23 Questions?
