Ph.D. Workshop, January 30th, 2012. Improving Search by Using Query Logs and a Bit of Semantics. Diego Ceccarelli, University of Pisa, Department of Computer Science; High Performance Computing Laboratory, ISTI-CNR. Supervisor: Dr. Raffaele Perego.
Summary: Efficiency. Supersnippets: mining query logs to improve search engine (SE) performance. D. Ceccarelli, C. Lucchese, S. Orlando, R. Perego, F. Silvestri. Caching query-biased snippets for efficient retrieval. In Proceedings of «EDBT 11: International Conference on Extending Database Technology», Uppsala, Sweden, March 2011.
Summary: Web of Data. Understanding the Web of Data: the first study on a real crawl of the Web of Data. S. Campinas, D. Ceccarelli, T. E. Perry, R. Delbru, K. Balog, and G. Tummarello. The Sindice-2011 Dataset for Entity-Oriented Search in the Web of Data. In Proceedings of «EOS 2011: SIGIR 2011 Workshop on Entity-Oriented Search», July 28, 2011, Beijing, China. Ivan Herman (W3C): Vocabulary Search on the Semantic Web for RDFa Default Profiles.
Summary: Effectiveness. Mining the Web of Data to improve SE effectiveness. D. Ceccarelli, S. Gordea, C. Lucchese, F. M. Nardini, and G. Tolomei. Improving Europeana Search Experience Using Query Logs. In Proceedings of «TPDL 11: International Conference on Theory and Practice of Digital Libraries», Berlin, Germany, September 2011. D. Ceccarelli, S. Gordea, C. Lucchese, F. M. Nardini, R. Perego, and G. Tolomei. Discovering Europeana users' search behavior. In «ERCIM News», No. 86, March 2011.
Query-Biased Snippets. Each of the top-k results relevant to the query is shown as TITLE, SNIPPET, and URL. The snippet is a short piece of text summarizing the content of the page; it is very important for letting users estimate the relevance of a result to their query, so it must be high-quality and query-biased.
Motivations. Different queries can return the same document: a) same title, same URL, same snippet; b) same title, same URL, but different snippets.
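Case b) arises because snippets are query-biased: the sentences shown depend on the query terms. The sketch below illustrates this with a deliberately naive scoring (count of query terms per sentence); it is an illustrative assumption, not the snippet generator of any particular search engine, and `query_biased_snippet` is a hypothetical name.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def query_biased_snippet(document: str, query: str, max_sentences: int = 2) -> str:
    """Pick the sentences sharing the most terms with the query."""
    terms = set(re.findall(r"\w+", query.lower()))
    sentences = split_sentences(document)

    def score(sentence: str) -> int:
        return len(terms & set(re.findall(r"\w+", sentence.lower())))

    # Best-scoring sentences first (stable sort keeps document order on ties).
    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    # Keep only sentences that match at least one term, in document order.
    chosen = sorted(i for i in ranked[:max_sentences] if score(sentences[i]) > 0)
    return " ".join(sentences[i] for i in chosen)


doc = ("Leonardo da Vinci was an Italian polymath. He painted the Mona Lisa. "
       "He also designed flying machines.")
print(query_biased_snippet(doc, "Mona Lisa"))
print(query_biased_snippet(doc, "flying machines"))
```

The same document (same title, same URL) yields "He painted the Mona Lisa." for one query and "He also designed flying machines." for the other.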
WSE common practices. [Figure: Web search engine (WSE) architecture with the WSE front end, the document repository, and the WSE back end with three query processors; numbered steps 1 to 6.]
WSE Front End caching. [Figure: WSE front end (FE) with an FE cache, document repository, and WSE back end with three query processors; numbered steps 1 to 6.] SERP caching does not help in cases a) and b) above, when the same or a similar snippet is generated for the same document but for different queries.
Supersnippets. Relevant documents are characterized by a few snippets; snippets are made of a few sentences; different snippets of the same document share common sentences. DEFINITION: Given the set $Q_d$ of all past queries for which document $d$ was returned, we define the supersnippet $ss_n$ of $d$ as the set of the $n$ most frequent sentences occurring in the snippets $S_{d,q}$ generated to answer the queries $q \in Q_d$.
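The definition can be computed directly by counting sentences across all logged snippets of a document. A minimal sketch, assuming each snippet is already split into sentences and that a sentence is counted at most once per snippet (the slide does not say how repeats inside one snippet are handled); `supersnippet` is a hypothetical name.

```python
from collections import Counter


def supersnippet(snippets_by_query: dict[str, list[str]], n: int) -> list[str]:
    """Return the n most frequent sentences across the snippets of one document d.

    snippets_by_query maps each past query q (for which d was returned) to the
    snippet S_{d,q} generated for it, given as a list of sentences.
    """
    counts = Counter()
    for sentences in snippets_by_query.values():
        # Count a sentence at most once per snippet; dict.fromkeys de-duplicates
        # while keeping order, so ties are broken by first occurrence.
        counts.update(list(dict.fromkeys(sentences)))
    return [sentence for sentence, _ in counts.most_common(n)]


logged = {
    "q1": ["A.", "B."],
    "q2": ["A.", "C."],
    "q3": ["B.", "A.", "A."],
}
print(supersnippet(logged, 2))
```

Here "A." occurs in three snippets and "B." in two, so the 2-sentence supersnippet is `["A.", "B."]`.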
Document Repository caching strategies. [Figure: WSE front end with an FE cache, document repository with a DR cache, and WSE back end with three query processors; numbered steps 1 to 6.]
Efficiency. SERP cache, filtering out the most recently submitted queries. Total hit ratio: ~62%.
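A hit ratio like the one above is simply hits divided by requests over a request stream. The sketch below measures it for a fixed-capacity cache with least-recently-used (LRU) eviction; LRU is an assumption chosen for illustration, since the slides do not state the eviction policy, and `LRUCache` and `compute` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUCache:
    """Fixed-capacity cache with least-recently-used eviction and hit counting."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()
        self.hits = 0
        self.requests = 0

    def get(self, key: Hashable, compute: Callable):
        self.requests += 1
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        value = compute(key)  # miss: e.g. fetch the document or build the snippet
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
        return value

    def hit_ratio(self) -> float:
        return self.hits / self.requests if self.requests else 0.0


cache = LRUCache(capacity=2)
for doc_id in ["a", "b", "a", "c", "b"]:
    cache.get(doc_id, lambda k: k.upper())
print(cache.hit_ratio())
```

With capacity 2, only the second request for "a" is a hit ("b" has been evicted by the time it is requested again), giving 1 hit out of 5 requests.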
The Web of Data. The Web of Data is composed of the web pages which have semantic markup: RDF, RDFa, Microformats, or Microdata. The semantic markup can be about any topic.
RDF in a nutshell. RDF is a general method for the conceptual description or modeling of information that is implemented in web resources; every statement is a (subject, predicate, object) triple. URI (Uniform Resource Identifier). Example: "Alice loves Bob" as it.wikipedia.org/wiki/alice_in_wonderland en.wikipedia.org/wiki/love it.wikipedia.org/wiki/bob_marley. Literal. Example: http://www.di.unipi.it/~ceccarel/foaf.rdf foaf:givenname "DIEGO". Blank node (anonymous resource). Example: http://www.di.unipi.it/~ceccarel/foaf.rdf foaf:knows _:x . _:x foaf:givenname "BOB".
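The FOAF examples above can be modeled as a small set of triples and queried by pattern matching. This is a minimal sketch with plain Python types, not an RDF library: URIs are plain strings, and the `BNode`/`Literal` classes and the helper functions are hypothetical names. It follows the blank node to find the name of the person Diego knows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    """Anonymous resource; only its local label identifies it."""
    label: str


@dataclass(frozen=True)
class Literal:
    """Plain string value."""
    value: str


FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://www.di.unipi.it/~ceccarel/foaf.rdf"

# Each statement is a (subject, predicate, object) triple.
graph = {
    (ME, FOAF + "givenname", Literal("DIEGO")),
    (ME, FOAF + "knows", BNode("x")),
    (BNode("x"), FOAF + "givenname", Literal("BOB")),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def names_of_known_people(graph, person):
    """Given names of everyone `person` foaf:knows, even anonymous ones."""
    names = []
    for friend in objects(graph, person, FOAF + "knows"):
        names.extend(lit.value for lit in objects(graph, friend, FOAF + "givenname"))
    return names


print(names_of_known_people(graph, ME))
```

Frozen dataclasses are used so that terms are hashable and a blank node never compares equal to a literal with the same text.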
An Analysis of the Web of Data. Paper at a SIGIR workshop. A MapReduce framework for extracting statistics about the Web of Data. Public dataset released: http://data.sindice.com/trec2011/. The first analysis of the Web of Data, useful for ranking and hardware tuning. The W3C is interested in knowing how people use standard namespaces.
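One statistic of the kind the W3C asked about is how often each namespace is used as a predicate. The sketch simulates the map, shuffle, and reduce phases in a single process; the namespace rule (cut the URI after its last '#' or '/') is a heuristic assumption, and all function names are hypothetical rather than taken from the framework described in the paper.

```python
from collections import defaultdict


def namespace(uri: str) -> str:
    """Heuristic: the namespace is the URI up to and including its last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[: cut + 1]


def map_phase(triple):
    """Emit (predicate namespace, 1) for every triple."""
    _subject, predicate, _obj = triple
    yield namespace(predicate), 1


def reduce_phase(key, values):
    """Sum the partial counts of one namespace."""
    return key, sum(values)


def run_mapreduce(triples):
    groups = defaultdict(list)  # shuffle: group intermediate values by key
    for triple in triples:
        for key, value in map_phase(triple):
            groups[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in groups.items())


triples = [
    ("http://a.org/p1", "http://xmlns.com/foaf/0.1/knows", "http://a.org/p2"),
    ("http://a.org/p1", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
    ("http://a.org/p1", "http://purl.org/dc/terms/title", '"Hi"'),
]
print(run_mapreduce(triples))
```

On a real cluster the same map and reduce functions would run in parallel over the crawl; only the grouping step is done here with an in-memory dictionary.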
Results. Data size: > 1 Tera.
Results
Search Shortcuts. Example query: Da Vinci. Broccolo, Marcon, Nardini, Perego, Silvestri. Generating suggestions for queries in the long tail with an inverted index. Information Processing & Management (IP&M), 2011.
From Queries to Virtual Documents. [Figure: logged queries (Da Vinci, Mona Lisa, Michelangelo, Leonardo Da Vinci, Da Vinci Code, Da Vinci Drawings, Michelangelo Paints, Michelangelo's Life) are grouped into virtual documents; example query: Da Vinci Paints.]
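The idea of answering a query against virtual documents built from past queries can be sketched with a tiny inverted index. This sketch assumes each logged session becomes one virtual document whose last query is the suggestion it offers, and it ranks by a simple count of matching query terms rather than a full ranking function such as BM25; the session data and all function names are hypothetical.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def build_index(sessions: list[list[str]]):
    """Turn each query session into a virtual document and index its terms.

    Assumption: the last query of a session is the suggestion it yields.
    """
    index: dict[str, set[int]] = defaultdict(set)  # term -> virtual doc ids
    suggestions: list[str] = []
    for doc_id, session in enumerate(sessions):
        suggestions.append(session[-1])
        for query in session:
            for term in tokenize(query):
                index[term].add(doc_id)
    return index, suggestions


def suggest(query: str, index, suggestions: list[str], k: int = 3) -> list[str]:
    """Rank virtual documents by how many distinct query terms they contain."""
    scores: dict[int, int] = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Higher score first; ties broken by virtual document id.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    unique = list(dict.fromkeys(suggestions[d] for d in ranked))
    return unique[:k]


sessions = [
    ["da vinci", "leonardo da vinci", "mona lisa"],
    ["da vinci", "da vinci code"],
    ["michelangelo", "michelangelo paints", "michelangelo's life"],
    ["da vinci drawings"],
]
index, suggestions = build_index(sessions)
print(suggest("Da Vinci Paints", index, suggestions))
```

Even though "Da Vinci Paints" never appears in the log, it matches virtual documents through their individual terms, which is what lets this kind of approach produce suggestions for long-tail queries.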
Suggestion Ranking. Tested on the Europeana query logs; the application is going to be deployed in the main portal. D. Ceccarelli, S. Gordea, C. Lucchese, F. M. Nardini, and G. Tolomei. Improving Europeana Search Experience Using Query Logs. In Proceedings of «TPDL 11: International Conference on Theory and Practice of Digital Libraries», Berlin, Germany, September 2011.
Thanks for Your Attention! Any Questions?