Improving Search by using Query Logs and a bit of Semantics




Ph.D. Workshop, January 30th, 2012. Improving Search by using Query Logs and a bit of Semantics. Diego Ceccarelli, University of Pisa, Department of Computer Science; High Performance Computing Laboratory, ISTI-CNR. Supervisor: Dr. Raffaele Perego

Summary - Efficiency. Supersnippets: mining query logs for improving SE performance. D. Ceccarelli, C. Lucchese, S. Orlando, R. Perego, F. Silvestri. Caching query-biased snippets for efficient retrieval. In Proceedings of «EDBT 11: International Conference on Extending Database Technology», Uppsala, Sweden, March 2011.

Summary - Web of Data. Understanding the Web of Data: first study on a real crawl of the Web of Data. S. Campinas, D. Ceccarelli, T. E. Perry, R. Delbru, K. Balog, and G. Tummarello. The Sindice-2011 Dataset for Entity-Oriented Search in the Web of Data. Proceedings of «EOS 2011: SIGIR 2011 Workshop on Entity-Oriented Search», July 28, 2011, Beijing, China. Ivan Herman (W3C), Vocabulary Search on the Semantic Web for RDFa Default Profiles.

Summary - Effectiveness. Mining the Web of Data for improving SE effectiveness. D. Ceccarelli, S. Gordea, C. Lucchese, F. M. Nardini and G. Tolomei. Improving Europeana Search Experience Using Query Logs. In Proceedings of «TPDL 11: International Conference on Theory and Practice of Digital Libraries», Berlin, Germany, September 2011. D. Ceccarelli, S. Gordea, C. Lucchese, F. M. Nardini, R. Perego, G. Tolomei. Discovering Europeana users' search behavior. In «ERCIM News», No. 86, March 2011.

Query Biased Snippets. Top-k results relevant for the query, each shown with TITLE, SNIPPET and URL. The snippet is a short piece of text summarizing the content of the page; it is very important for estimating relevance to the user's query, so it should be high-quality and query-biased.
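To make the idea of a query-biased snippet concrete, here is a minimal sketch. It assumes a naive sentence splitter and a term-overlap score; the function names `split_sentences` and `query_biased_snippet` and the scoring rule are illustrative assumptions, not the method used in the talk.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (assumption): break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def query_biased_snippet(document, query, max_sentences=2):
    """Return the sentences of `document` that share the most terms with `query`.

    The overlap-count score is a hypothetical stand-in for a real snippet scorer.
    """
    terms = set(query.lower().split())
    scored = []
    for position, sentence in enumerate(split_sentences(document)):
        words = set(re.findall(r"\w+", sentence.lower()))
        overlap = len(terms & words)
        if overlap > 0:
            # Negative overlap so that sorting puts the best sentences first.
            scored.append((-overlap, position, sentence))
    best = sorted(scored)[:max_sentences]
    # Present the chosen sentences in their original document order.
    return [sentence for _, _, sentence in sorted(best, key=lambda t: t[1])]
```

With this sketch, the same page can yield different snippets for different queries, which is the situation a cache keyed only on the query cannot exploit.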

Motivations. Different queries can return: a) same title, same URL, same snippet; b) same title, same URL, but different snippets.

WSE common practices. [Diagram: WSE front end, WSE back end with several query processors, and document repository, connected by steps numbered 1 to 6.]

WSE Front End caching. [Diagram: the WSE architecture with an FE cache added to the front end.] SERP caching does not help in cases a) and b) above, when the same or a similar snippet is generated for the same document but different queries.

Supersnippets. Relevant documents are characterized by a few snippets. Snippets are made of a few sentences. Different snippets from the same document share common sentences. DEFINITION: Given the set Q_d of all the past queries for which document d was returned, we define the supersnippet ss_n of d as the set of the n most frequent sentences occurring in the snippets S_{d,q} generated for answering the queries in Q_d.
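The definition can be sketched in a few lines. This is a minimal illustration, assuming that the past snippets of one document are available as a mapping from query to list of sentences, that a sentence counts once per snippet, and that ties are broken alphabetically; none of these details are stated on the slide, and the name `supersnippet` for the function is hypothetical.

```python
from collections import Counter


def supersnippet(past_snippets, n):
    """Compute a supersnippet for one document d.

    past_snippets: dict mapping each past query q in Q_d to the list of
                   sentences forming the snippet S_{d,q}.
    n:             number of sentences to keep.
    """
    counts = Counter()
    for sentences in past_snippets.values():
        # Assumption: a sentence is counted once per snippet it appears in.
        counts.update(set(sentences))
    # Most frequent first; alphabetical tie-break keeps the result deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [sentence for sentence, _ in ranked[:n]]
```

A cache holding such a supersnippet per document could then try to build the snippet for a new query from the stored sentences before going back to the document repository.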

Doc Repository Caching strategies. [Diagram: WSE front end with FE cache, WSE back end with several query processors, and document repository with DR cache, connected by steps numbered 1 to 6.]

Efficiency. SERPs cache filtering out most recently submitted queries. Total Hit Ratio: ~62%

The Web of Data. The Web of Data is the web composed of pages which have semantic markup. The semantic markup can be about any topic. Formats: RDF, RDFa, Microformats or Microdata.

RDF in a nutshell. General method for conceptual description or modeling of information that is implemented in web resources. Example statement: Alice loves Bob, with it.wikipedia.org/wiki/alice_in_wonderland, en.wikipedia.org/wiki/love and it.wikipedia.org/wiki/bob_marley. URI (Uniform Resource Identifier). Example: http://www.di.unipi.it/~ceccarel/foaf.rdf. Literal. Example: http://www.di.unipi.it/~ceccarel/foaf.rdf foaf:givenname DIEGO. Blank Node (anonymous resource). Example: http://www.di.unipi.it/~ceccarel/foaf.rdf foaf:knows _:x; _:x foaf:givenname BOB.
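The three kinds of RDF terms and the example statements can be modeled with plain data classes. This is a minimal sketch using only the standard library; the class names and `to_ntriples` are illustrative, `foaf:` is expanded with the standard FOAF namespace, the property spelling `givenname` follows the slide text, and literal escaping is deliberately left out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Triple:
    subject: Union[URI, BlankNode]
    predicate: URI
    obj: Union[URI, Literal, BlankNode]


def format_term(term):
    """Render one term in an N-Triples-like syntax (no escaping: sketch only)."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    if isinstance(term, Literal):
        return f'"{term.value}"'
    return f"_:{term.label}"


def to_ntriples(triple):
    """Render a triple as 'subject predicate object .'"""
    return " ".join(format_term(t) for t in (triple.subject, triple.predicate, triple.obj)) + " ."


FOAF = "http://xmlns.com/foaf/0.1/"
DOC = URI("http://www.di.unipi.it/~ceccarel/foaf.rdf")
X = BlankNode("x")

EXAMPLE_TRIPLES = [
    Triple(DOC, URI(FOAF + "givenname"), Literal("DIEGO")),  # literal object
    Triple(DOC, URI(FOAF + "knows"), X),                     # blank node object
    Triple(X, URI(FOAF + "givenname"), Literal("BOB")),      # blank node subject
]
```

Calling `to_ntriples` on each element of `EXAMPLE_TRIPLES` prints the three statements from the slide in a line-oriented form.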

An analysis of the Web of Data. Paper at SIGIR workshop. Map-Reduce framework for extracting statistics of the Web of Data. Public dataset released: http://data.sindice.com/trec2011/. First analysis of the Web of Data. Useful for ranking and hw tuning. W3C is interested to know how people use standard namespaces.
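As an illustration of the kind of statistic such a framework can extract, the sketch below counts how often each namespace is used in predicate position, written as a map phase and a reduce phase in plain Python. It is not the framework from the paper: the function names, the in-memory execution and the rule for cutting a URI into namespace and local name are all assumptions.

```python
from collections import defaultdict


def namespace_of(uri):
    """Assumption: the namespace is everything up to the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[:cut + 1] if cut >= 0 else uri


def map_phase(triples):
    """Emit (namespace, 1) for the predicate of every (s, p, o) triple."""
    for _subject, predicate, _obj in triples:
        yield namespace_of(predicate), 1


def reduce_phase(pairs):
    """Sum the counts emitted for each namespace."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def namespace_usage(triples):
    """Run both phases in memory; a real job would distribute them."""
    return reduce_phase(map_phase(triples))
```

Because the map step looks at one triple at a time and the reduce step only sums per key, the same logic can be split across many machines for a crawl too large for one.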

Data size > 1 Tera. Results

Results

Search Shortcuts (example query: Da Vinci). Broccolo, Marcon, Nardini, Perego, Silvestri. Generating suggestions for queries in the long tail with an inverted index, IP&M, 2011

From Queries to Virtual Documents. [Figure: queries from the log (Da Vinci, Leonardo Da Vinci, Da Vinci Code, Da Vinci Drawings, Mona Lisa, Michelangelo, Michelangelo Paints, Michelangelo's Life) arranged into virtual documents; field labels TITLE, FIRST NAME, LAST NAME.] Query: Da Vinci Paints
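One way to read this slide is sketched below. It assumes that the last query of each logged session is a candidate suggestion, that its virtual document is the bag of terms of all queries in sessions ending with it, and that suggestions are ranked by summed term frequency looked up in an inverted index. These choices, and all function names, are illustrative assumptions; the slide does not spell out the construction or the ranking.

```python
from collections import Counter, defaultdict


def build_virtual_documents(sessions):
    """Map each final query to the bag of terms seen in its sessions (assumption)."""
    docs = defaultdict(Counter)
    for session in sessions:
        if not session:
            continue
        final_query = session[-1]
        for query in session:
            docs[final_query].update(query.lower().split())
    return docs


def build_inverted_index(docs):
    """term -> {candidate suggestion: term frequency in its virtual document}."""
    index = defaultdict(dict)
    for suggestion, terms in docs.items():
        for term, freq in terms.items():
            index[term][suggestion] = freq
    return index


def suggest(index, query, k=3):
    """Rank candidate suggestions by summed term frequency (hypothetical score)."""
    scores = Counter()
    for term in query.lower().split():
        for suggestion, freq in index.get(term, {}).items():
            scores[suggestion] += freq
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [suggestion for suggestion, _ in ranked[:k]]
```

Since matching happens at the term level, a query that never appeared in the log, such as one in the long tail, can still retrieve virtual documents that share some of its terms.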

Suggestion Ranking. Tested on the Europeana query logs; the application is going to be deployed in the main portal. D. Ceccarelli, S. Gordea, C. Lucchese, F. M. Nardini and G. Tolomei. Improving Europeana Search Experience Using Query Logs. In Proceedings of «TPDL 11: International Conference on Theory and Practice of Digital Libraries», Berlin, Germany, September 2011.

Thanks for Your Attention! Any Questions?