WHAT IS THE TEMPORAL VALUE OF WEB SNIPPETS?
1 WHAT IS THE TEMPORAL VALUE OF WEB SNIPPETS?
Ricardo Campos 1, 2, 4, Gaël Dias 2, Alípio Jorge 3, 4
1 Tomar Polytechnic Institute, Tomar, Portugal
2 Centre of Human Language Technology and Bioinformatics, University of Beira Interior, Covilhã, Portugal
3 Faculty of Sciences, University of Oporto, Oporto, Portugal
4 LIAAD-INESC Porto L.A., Oporto, Portugal
TWAW, International Temporal Web Analytics Workshop, in association with WWW 2011, Hyderabad, India, March 28, 2011
[www.ipt.pt] [www.liaad.up.pt] [hultig.di.ubi.pt]
2 INTRODUCTION
- Temporal Expressions
- Time in the WWW
- Queries
- Research
- Causes and Consequences
- Our Goal
3 INTRODUCTION: TEMPORAL EXPRESSIONS
What is Time? How is Time Expressed?
Time is expressed in a number of different forms, depending on how the temporal intent is defined:
- Explicit Expressions
- Implicit Expressions
- Relative Expressions
Examples shown on the slide: WWW 2011, SIGIR, Last Month
4 INTRODUCTION: TIME IN THE WWW
Time is everywhere in the WWW: Webpages, Blogs, Newswire Articles, Web Snippets, Web Logs, Digital Libraries
5 INTRODUCTION: QUERIES
And What About Queries?
Example queries: Football World Cup; Miss Universe
6 INTRODUCTION: TEMPORAL INFORMATION
Consequences of not using Temporal Information
Ignoring temporal information has several consequences:
- it prevents relevant documents from being distributed over time, and users from becoming aware of possible historical perspectives on a given subject;
- it prevents user queries from being modeled according to a specific period.
This is mainly because current retrieval models still represent documents and queries in a simplistic way that ignores the temporal semantics of the documents.
7 INTRODUCTION: TEMPORAL INFORMATION
Reasons for that
- From a query point of view, systems do not infer the temporal intents expressed by users in a query (e.g., Miss Universe);
- From a document point of view, although time clues may be found in the texts, they are usually not taken into account, most likely because of the difficulty of correlating temporal information with the corresponding topics.
8 INTRODUCTION: RESEARCH
Research in Temporal Information
- Temporal and Content Dynamics
- Crawling, Indexing and Ranking
- Detection of Topics and their Track of Changes over Time
- Web Archives Developing
- Web Search Improvement
9 INTRODUCTION: RESEARCH
Why Web Search Improvement?
As pointed out by (Dakka, Gravano and Ipeirotis, 2008), systems mainly rank query results by topic similarity and do not model time explicitly.
Nevertheless, as mentioned by (Alonso, Gertz and Baeza-Yates, 2009), few works have fully used temporal information for exploration and search purposes.
Directions for Web Search Improvement shown on the slide: Temporal Representation of the Documents; Timelines; Temporal Snippets; Temporal Web Search Engines + Temporal Clustering
10 INTRODUCTION: OUR GOAL
Understanding the Temporal Nature of Implicit User Queries
Understanding the temporal intent of documents and queries is of the utmost importance for producing high-quality information retrieval systems. The goals are to:
- return relevant results from several specific periods;
- position the query in time;
- define appropriate interfaces to explore the query.
11 INTRODUCTION: OUR GOAL
In this paper we have two objectives
This paper is the result of part of this research:
- study whether web snippets are a valuable source of data for inferring the temporal intent of queries, whether implicitly or explicitly formulated;
- study, in parallel with what has been done by (Nunes, Ribeiro and David, 2008), the temporal value of web query logs.
12 Summary: Web Logs
13 WEB SNIPPETS: FRAMEWORK
Twofold Approach
- Study the existence of temporal information within web snippets. We focus on extracting year dates, a kind of temporal information that often appears in this type of collection.
- Check whether such information can be used to temporally classify implicit queries.
[Framework diagram. Stages: Collection; Execution; Automatic Date Identification; Metrics; Classification of Implicit Temporal Queries. Labels: Q465, Q450; Q465R20, Q450R20, Q450R100; Rule based model; TSnippets, TTitle, TURL; Future Dates; Concept Classification (Ambiguous, Broad, Clear); Temporal Classification (ATemporal, Ambiguous, Unambiguous). Captions as transcribed: "Experiment One: Temporal Data in"; "Experiment Two: Use of to date Temporal Implicit Queries".]
14 WEB SNIPPETS: FRAMEWORK
Construction of the DataSet
The collection dataset comprises a series of snippets, titles and associated URLs.
20 queries * 27 categories = 540 queries
Period: January 2010 October
Example queries: World Cup 2010; Calendar 2011; Hair Styles 2010; Oil Spill; Oil Spill; BP Oil Spill; Waka Waka
Categories: 12.69% Internet; 9.89% Computer & Electronics; 7.96% Entertainment
15 WEB SNIPPETS: TEMPORAL VALUE
Execution: December 2010
Result sets: Q465R, Q450R, Q450R100
Figures shown on the slide: 16,648; 62,842; 16,
16 WEB SNIPPETS: TEMPORAL VALUE
Automatic Date Identification
Over the retrieved results, and specifically over each triple <snippet, title, url>, we ran our own rule-based model to mark dates expressed through numerical patterns, particularly year dates: yyyy, yyyy-yyyy, yyyy/yyyy, mm/dd/yyyy, mm.dd.yyyy, dd/mm/yyyy and dd.mm.yyyy.
Figures shown on the slide: 62,842; 16,129; 16,648
          Q465R20  Q450R20  Q450R100
Snippets  95.8%    94.3%    93.1%
Titles    97.9%    95.8%    95.3%
URLs      85.1%    75.0%    87.4%
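The numerical patterns listed on this slide can be approximated with regular expressions. The following is a minimal Python sketch, not the authors' rule-based model: the function name `find_year_dates`, the assumed year range (1000 to 2099) and the exact expressions are illustrative assumptions, and day/month values are not validated.

```python
import re

# Hypothetical approximation of the numerical date patterns named on the slide.
# Assumption: a "year" is a four-digit number between 1000 and 2099.
YEAR = r"(?:1\d{3}|20\d{2})"

# Longer patterns are listed first so that a full date is not split into a bare year.
DATE_PATTERN = re.compile(
    r"(?<!\d)(?:"
    rf"\d{{1,2}}[/.]\d{{1,2}}[/.]{YEAR}"  # mm/dd/yyyy, mm.dd.yyyy, dd/mm/yyyy, dd.mm.yyyy
    rf"|{YEAR}[-/]{YEAR}"                # yyyy-yyyy, yyyy/yyyy
    rf"|{YEAR}"                          # yyyy
    r")(?!\d)"
)


def find_year_dates(text: str) -> list[str]:
    """Return every substring of `text` that matches one of the date patterns."""
    return DATE_PATTERN.findall(text)


print(find_year_dates("World Cup 2010; Calendar 2011"))  # ['2010', '2011']
```

The lookarounds `(?<!\d)` and `(?!\d)` keep the sketch from picking four digits out of a longer number, such as a phone number or product code.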
17 WEB SNIPPETS: TEMPORAL VALUE
Evaluation Metrics
To better understand and determine the temporal value of each item, we defined three basic measures for a query q:
$$\mathrm{TSnippets}(q) = \frac{\#\,\text{Snippets Retrieved with Dates}}{\#\,\text{Snippets Retrieved}}$$
$$\mathrm{TTitles}(q) = \frac{\#\,\text{Titles Retrieved with Dates}}{\#\,\text{Titles Retrieved}}$$
$$\mathrm{TURLs}(q) = \frac{\#\,\text{URLs Retrieved with Dates}}{\#\,\text{URLs Retrieved}}$$
18 WEB SNIPPETS: TEMPORAL VALUE DISCUSSION
Comparison of the Different Values
          Q465R20  Q450R20  Q450R100
Snippets  12.40%   9.50%    9.19%
Titles    5.69%    2.98%    3.27%
URLs      4.26%    1.89%    5.59%
[Bar chart: TSnippets(.), TTitle(.) and TURL(.) for 20 and 100 results]
19 WEB SNIPPETS: TEMPORAL VALUE DISCUSSION
Items with More than One Date
[Chart: Relation between items with dates and with more than one date (Q450R20). Series: Items With Dates; Items with more than one date. Categories: Snippets, Title, Url.]
20 WEB SNIPPETS: TEMPORAL VALUE DISCUSSION
Distribution of Dates, Classification, Future Dates
[Chart: Distribution of dates in the Timeline (Q450R100). Series: Snippets, Title, URL.]
Categories: Dates (e.g., Calendar); Sports (e.g., Football); Automotive (e.g., Dacia Duster); Society (e.g., Baby); Politics (e.g., Barack Obama)
          Q465R20  Q450R20  Q450R100
Snippets  18.6%    6.5%     7.9%
Titles    13.8%    18.9%    19.7%
URLs      9.64%    8.8%     5.7%
21 WEB SNIPPETS: IMPLICIT TEMPORAL QUERIES
Understand the Temporal Nature of Implicit Queries
Given the temporal value of web snippets, we aim to understand whether this temporal information can be used to automatically disambiguate query terms, namely implicit temporal queries.
Example: Football World Cup
22 WEB SNIPPETS: IMPLICIT TEMPORAL QUERIES
Understand the Temporal Nature of Implicit Queries
This is a particularly hard task insofar as temporal information is not available, at least not directly (e.g., WWW).
23 WEB SNIPPETS: IMPLICIT TEMPORAL QUERIES
Is the Query Ambiguous in Concept?
We adopt the methodology proposed by (Song, Luo, Nie, Yu and Hon, 2009):
- Ambiguous: a query that has more than one meaning, e.g., Scorpions, which may refer to a rock band, the arachnid or the zodiac sign.
- Broad: a query that covers a variety of subtopics, e.g., quotes, which covers subtopics such as love quotes, historical quotes, etc. [slide annotation: 54]
- Clear: a query that has a specific meaning and covers a narrow topic. It is usually a successful search in which the user can find what they are looking for on the first page of results, e.g., Bank of America.
24 WEB SNIPPETS: IMPLICIT TEMPORAL QUERIES
Framework to Temporally Classify Implicit Queries
[Diagram: an implicit query is first checked for conceptual ambiguity (Ambiguous in Concept? Non-Ambiguous / Ambiguous) and then temporally classified. Class shown: Temporally Ambiguous. Plot axes: Frequency, Time.]
25 WEB SNIPPETS: IMPLICIT TEMPORAL QUERIES
Framework to Temporally Classify Implicit Queries
[Diagram, continued. Classes shown: Temporally Ambiguous; Temporally Unambiguous (example: BP Oil Spill, 2010). Plot axes: Frequency, Time.]
26 WEB SNIPPETS: IMPLICIT TEMPORAL QUERIES
Framework to Temporally Classify Implicit Queries
[Diagram, continued. Classes shown: Temporally Ambiguous; Temporally Unambiguous; ATemporal (example: Make my Trip). Plot axes: Frequency, Time.]
27 WEB SNIPPETS: IMPLICIT TEMPORAL QUERIES
Temporal Ambiguity
Each of the 176 queries is classified into one of these three categories based on the temporal value of the retrieved triples <snippet, title, url>. Because dates occur in different proportions in snippets, titles and URLs, we weight each item differently:
           Q450R20  Q450R100
TSnippets  66.10%   50.91%
TTitles    20.75%   18.14%
TURLs      13.75%   30.95%
Total      100%     100%
We call this temporal value temporal ambiguity, TA(q).
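The formula for TA(q) did not survive in the transcription. One plausible reading, shown here purely as an assumption, is a weighted sum of the three per-item temporal values using the weights from the table. The sketch uses the Q450R100 column; the function name `temporal_ambiguity` and the input values are hypothetical.

```python
# Weights reported on the slide for the Q450R100 result set (as fractions).
WEIGHTS_Q450R100 = {"snippets": 0.5091, "titles": 0.1814, "urls": 0.3095}


def temporal_ambiguity(t_snippets: float, t_titles: float, t_urls: float,
                       weights: dict[str, float] = WEIGHTS_Q450R100) -> float:
    """ASSUMPTION: TA(q) as a weighted sum of the three temporal values.

    The original formula is not preserved; this linear combination is only
    one reading of "we weight each item differently".
    """
    return (weights["snippets"] * t_snippets
            + weights["titles"] * t_titles
            + weights["urls"] * t_urls)


# Hypothetical query: 20% of snippets and 10% of titles carry a date, no URLs do.
ta = temporal_ambiguity(0.20, 0.10, 0.0)
print(f"{ta:.2%}")
```

Under this reading, a query whose snippets, titles and URLs all carry dates would reach a TA(q) of 100%.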
28 WEB SNIPPETS: IMPLICIT TEMPORAL QUERIES DISCUSSION
Temporal Classification of the 176 Clear Concept Queries
If TA(q) < 10% then q is ATemporal
Else (TA(q) >= 10%):
    If the query results are associated with more than one year then q is Temporal Ambiguous
    Else q is Temporal Unambiguous
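The rule on this slide translates directly into a small function. The following Python sketch assumes TA(q) is passed as a fraction and that the distinct years found in the query's results are available as a set; the names `classify_query`, `ta` and `years` are hypothetical.

```python
def classify_query(ta: float, years: set[int], threshold: float = 0.10) -> str:
    """Apply the slide's rule to one query.

    ta        -- temporal ambiguity TA(q) as a fraction (0.10 means 10%)
    years     -- distinct years associated with the query's results (assumed input)
    threshold -- the 10% cut-off stated on the slide
    """
    if ta < threshold:
        return "ATemporal"
    if len(years) > 1:
        return "Temporal Ambiguous"
    # TA(q) >= 10% and at most one year: the rule's final branch.
    return "Temporal Unambiguous"


print(classify_query(0.05, {2010}))        # ATemporal
print(classify_query(0.30, {2006, 2010}))  # Temporal Ambiguous
print(classify_query(0.30, {2010}))        # Temporal Unambiguous
```

The input values in the usage lines are made up for illustration and are not taken from the experiments.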
29 WEB SNIPPETS: IMPLICIT TEMPORAL QUERIES DISCUSSION
Temporal Classification of the 176 Clear Concept Queries
Conceptual Classification  Number of Queries
Ambiguous                  220
Clear                      176
Broad
Temporal Classification  Number of Queries  %
ATemporal                132                75%
Ambiguous                40                 23%
Unambiguous              4                  2%
30 WEB QUERY LOGS: FRAMEWORK
Goal
We complement our knowledge about temporal information by studying the explicit relationships between queries and dates.
We look for explicit temporal queries, e.g., Iraq War 1991 or World Cup
We want to understand two phenomena:
(1) Are users interested in future dates when looking for a given subject?
(2) What type of information are they seeking when they issue a query together with a date?
31 WEB QUERY LOGS: FRAMEWORK
Construction of the DataSet
The Web Logs dataset comprises a series of queries:
21,011,240 queries; 10,154,742 queries; 143,590 queries; 601 queries
Period: March 2006 May
Example queries: German Coins; American Flag in 1943; Ford 2009; Epson p2000
Categories: 21.96% Automotive; 9.48% Entertainment; 8.15% Sports
32 WEB QUERY LOGS: TEMPORAL VALUE
1.21% of Temporal Explicit Queries
$$\text{Temporal Value} = \frac{143{,}590}{10{,}154{,}742} = 1.41\%$$
We may end up with an even lower value. We classify each query of the 601-query sample into one of two categories: real date or false date.
We conclude that 87 queries (14.14% of the sample) were false positives.
143, ,286 Temporal Explicit Queries, i.e., 1.21%.
We can conclude that users seldom use dates to express their intents.
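The two percentages on this slide can be checked with a few lines of arithmetic. This Python sketch assumes that the 14.14% false-positive rate measured on the sample is extrapolated to all 143,590 queries containing a date; the variable names are hypothetical.

```python
# Figures taken from the slide.
total_queries = 10_154_742    # queries in the filtered log
with_dates = 143_590          # queries containing a date-like token
false_positive_rate = 0.1414  # share of the sample judged to be false dates

# Raw temporal value: share of queries that contain a date-like token.
raw_value = with_dates / total_queries

# Assumption: the sample's false-positive rate holds for the whole set.
corrected_value = raw_value * (1 - false_positive_rate)

print(f"{raw_value:.2%}")        # 1.41%
print(f"{corrected_value:.2%}")  # 1.21%
```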
33 WEB QUERY LOGS: TEMPORAL VALUE DISCUSSION
Distribution of Dates and Future Dates
[Chart: Distribution of Dates in the Timeline]
Despite a decrease from 2006 onwards (recall that this collection is from 2006), future dates still represent 3.49% of the sample collection.
34 WEB QUERY LOGS: TEMPORAL VALUE DISCUSSION
Web Logs Drawbacks
Web logs are extremely hard to access outside the big industrial labs and are highly dependent on the users' own intents:
- queries that have never been typed do not exist in the web search log, e.g., Blaise Pascal 1623 (his year of birth);
- less frequent year-qualified queries may be as relevant as the most frequent ones.
35 CONCLUSIONS: EXTRACTION OF TEMPORAL INFORMATION
Metadata Approach vs Content-based Approach
- Time has been gaining increasing importance in a large number of IR subareas;
- Documents are full of temporal expressions, which are not always exploited;
- Inferring the user's intentions and the period the user has in mind may therefore play an extremely important role.
36 CONCLUSIONS: WEB SNIPPETS VS WEB QUERY LOGS
- In this paper, we showed that query understanding is possible through the use of web snippets (25% of the queries have an implicit temporal nature);
- Besides being faster to process, web snippets are a very rich data source, containing a broad range of temporal information (namely years) that can help infer the temporal intent of queries;
- In contrast, web query logs have a very small temporal value (about 1%).
37 CONCLUSIONS: WEB SNIPPETS VS WEB QUERY LOGS
- It is also interesting to note that future dates are very common in web snippets but seldom used in queries, and that some web snippet items even contain more than one date;
- Temporal value in web snippets mostly appears in the automotive, sports and politics categories, which is precisely what users are looking for when they issue explicit queries.
38 CONCLUSIONS: DRAWBACKS
Drawbacks and Future Work
- Drawback: web snippets are computed by search engines, which we do not control. Basing our system on results generated by a black box may prevent us from obtaining a clear picture of the temporal value of web snippets.
- Future work: we need to evaluate the feasibility of developing a search engine, albeit on a small scale, which would also enable us to compare a full-text analysis approach with a web-snippet-based one.
39 Thanks for your attention!
Both experiments are available for download at
VipAccess is online at
HULTIG is online at
LIAAD is online at
Polytechnic Institute of Tomar is online at
Gaël Dias is online at
Alípio Jorge is online at
Data Warehouses in the Path from Databases to Archives Gabriel David FEUP / INESC-Porto This position paper describes a research idea submitted for funding at the Portuguese Research Agency. Introduction
More informationWeb-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy
The Deep Web: Surfacing Hidden Value Michael K. Bergman Web-Scale Extraction of Structured Data Michael J. Cafarella, Jayant Madhavan & Alon Halevy Presented by Mat Kelly CS895 Web-based Information Retrieval
More informationBig Data The Next Phase Lessons from a Decade+ Experiment in Big Data
Big Data The Next Phase Lessons from a Decade+ Experiment in Big Data David Belanger PhD Senior Research Fellow Stevens Institute of Technology dbelange@stevens.edu 1 Outline Big Data Overview Thinking
More informationBest Practices for Hadoop Data Analysis with Tableau
Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks
More informationToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database
ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database Dina Vishnyakova 1,2, 4, *, Julien Gobeill 1,3,4, Emilie Pasche 1,2,3,4 and Patrick Ruch
More informationWHAT'S NEW IN SHAREPOINT 2013 WEB CONTENT MANAGEMENT
CHAPTER 1 WHAT'S NEW IN SHAREPOINT 2013 WEB CONTENT MANAGEMENT SharePoint 2013 introduces new and improved features for web content management that simplify how we design Internet sites and enhance the
More informationReport on the Dagstuhl Seminar Data Quality on the Web
Report on the Dagstuhl Seminar Data Quality on the Web Michael Gertz M. Tamer Özsu Gunter Saake Kai-Uwe Sattler U of California at Davis, U.S.A. U of Waterloo, Canada U of Magdeburg, Germany TU Ilmenau,
More informationMEASURING GLOBAL ATTENTION: HOW THE APPINIONS PATENTED ALGORITHMS ARE REVOLUTIONIZING INFLUENCE ANALYTICS
WHITE PAPER MEASURING GLOBAL ATTENTION: HOW THE APPINIONS PATENTED ALGORITHMS ARE REVOLUTIONIZING INFLUENCE ANALYTICS Overview There are many associations that come to mind when people hear the word, influence.
More information1 o Semestre 2007/2008
Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction
More informationKaspersky Whitelisting Database Test
Kaspersky Whitelisting Database Test A test commissioned by Kaspersky Lab and performed by AV-Test GmbH Date of the report: February 14 th, 2013, last update: April 4 th, 2013 Summary During November 2012
More informationTaxonomies in Practice Welcome to the second decade of online taxonomy construction
Building a Taxonomy for Auto-classification by Wendi Pohs EDITOR S SUMMARY Taxonomies have expanded from browsing aids to the foundation for automatic classification. Early auto-classification methods
More informationAutomatic Text Processing: Cross-Lingual. Text Categorization
Automatic Text Processing: Cross-Lingual Text Categorization Dipartimento di Ingegneria dell Informazione Università degli Studi di Siena Dottorato di Ricerca in Ingegneria dell Informazone XVII ciclo
More informationRecommender Systems: Content-based, Knowledge-based, Hybrid. Radek Pelánek
Recommender Systems: Content-based, Knowledge-based, Hybrid Radek Pelánek 2015 Today lecture, basic principles: content-based knowledge-based hybrid, choice of approach,... critiquing, explanations,...
More informationRecommendations in Mobile Environments. Professor Hui Xiong Rutgers Business School Rutgers University. Rutgers, the State University of New Jersey
1 Recommendations in Mobile Environments Professor Hui Xiong Rutgers Business School Rutgers University ADMA-2014 Rutgers, the State University of New Jersey Big Data 3 Big Data Application Requirements
More informationCYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION
CYBER SCIENCE 2015 AN ANALYSIS OF NETWORK TRAFFIC CLASSIFICATION FOR BOTNET DETECTION MATIJA STEVANOVIC PhD Student JENS MYRUP PEDERSEN Associate Professor Department of Electronic Systems Aalborg University,
More informationHOW TO SAVE AND FILE LOTUS NOTES EMAILS
Email messages that are university records should be filed and retained with other records to which they relate. Saving emails to a unit s shared drive is an effective way to extract them from the email
More informationData Integration and ETL Process
Data Integration and ETL Process Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, second
More informationImproving Web Page Retrieval using Search Context from Clicked Domain Names
Improving Web Page Retrieval using Search Context from Clicked Domain Names Rongmei Li School of Electrical, Mathematics, and Computer Science University of Twente P.O.Box 217, 7500 AE, Enschede, the Netherlands
More informationSearch Engine Optimization
Module Presenter s Manual Search Engine Optimization Effective from: April 2015 Ver. 1.0 Presenter s Manual Aptech Limited Page 1 Amendment Record Version No. Effective Date Change Replaced Pages 1.0 April
More informationBig Data Challenges for Information Retrieval
UNIVERSITY OF COPENHAGEN DEPARTMENT OF COMPUTER SCIENCE Faculty of Science Big Data Challenges for Information Retrieval Christina Lioma Department of Computer Science c.lioma@diku.dk Slide 1/8 Information
More informationMining Text Data: An Introduction
Bölüm 10. Metin ve WEB Madenciliği http://ceng.gazi.edu.tr/~ozdemir Mining Text Data: An Introduction Data Mining / Knowledge Discovery Structured Data Multimedia Free Text Hypertext HomeLoan ( Frank Rizzo
More informationKnowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
More informationIPTV Recommender Systems. Paolo Cremonesi
IPTV Recommender Systems Paolo Cremonesi Agenda 2 IPTV architecture Recommender algorithms Evaluation of different algorithms Multi-model systems Valentino Rossi 3 IPTV architecture 4 Live TV Set-top-box
More informationText Analytics with Ambiverse. Text to Knowledge. www.ambiverse.com
Text Analytics with Ambiverse Text to Knowledge www.ambiverse.com Version 1.0, February 2016 WWW.AMBIVERSE.COM Contents 1 Ambiverse: Text to Knowledge............................... 5 1.1 Text is all Around
More informationVisualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
More informationA MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS
A MACHINE LEARNING APPROACH TO FILTER UNWANTED MESSAGES FROM ONLINE SOCIAL NETWORKS Charanma.P 1, P. Ganesh Kumar 2, 1 PG Scholar, 2 Assistant Professor,Department of Information Technology, Anna University
More informationLinear programming approach for online advertising
Linear programming approach for online advertising Igor Trajkovski Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, Rugjer Boshkovikj 16, P.O. Box 393, 1000 Skopje,
More informationTask 3 Web Community Sensing & Task 6 Query and Visualization
Task 3 Web Community Sensing & Task 6 Query and Visualization REACTION Workshop January 31 th, 2013 Summary of on-going activities Team update WP3 & WP6 progress reports Resources & publications Team update
More informationExploiting Online Discussions As a Tool For Database Development Project
Exploiting Online Discussions in Collaborative Distributed Requirements Engineering Itzel Morales-Ramirez, Matthieu Vergne, Mirko Morandini, Anna Perini, and Angelo Susi Fondazione Bruno Kessler Via Sommarive
More informationRake: Semantics Assisted Networkbased Tracing Framework
Rake: Semantics Assisted Networkbased Tracing Framework Yan Chen Lab for Internet and Security Technology (LIST) Northwestern Univ. Joint work with Yao Zhao, Yinzhi Cao, Anup Goyal (NU), and Ming Zhang
More informationBUILDING A HOLISTIC MARKETING STRATEGY
Introduction To Integrated Marketing: BUILDING A HOLISTIC MARKETING STRATEGY Email Social Media Online Events Blogs Web S ite Intelligence Landing Pages Integrated Analytics Many B2B marketers invest fortunes
More informationAn Introduction to Machine Learning and Natural Language Processing Tools
An Introduction to Machine Learning and Natural Language Processing Tools Presented by: Mark Sammons, Vivek Srikumar (Many slides courtesy of Nick Rizzolo) 8/24/2010-8/26/2010 Some reasonably reliable
More informationHow To Write An Inspire Directive
INSPIRE Infrastructure for Spatial Information in Europe Detailed definitions on the INSPIRE Network Services Title Detailed definitions on the INSPIRE Network Services Creator Date 2005-07-22 Subject
More informationFOR IMMEDIATE RELEASE
FOR IMMEDIATE RELEASE Hitachi Developed Basic Artificial Intelligence Technology that Enables Logical Dialogue Analyzes huge volumes of text data on issues under debate, and presents reasons and grounds
More informationChapter 8. Final Results on Dutch Senseval-2 Test Data
Chapter 8 Final Results on Dutch Senseval-2 Test Data The general idea of testing is to assess how well a given model works and that can only be done properly on data that has not been seen before. Supervised
More informationONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU
ONTOLOGIES p. 1/40 ONTOLOGIES A short tutorial with references to YAGO Cosmina CROITORU Unlocking the Secrets of the Past: Text Mining for Historical Documents Blockseminar, 21.2.-11.3.2011 ONTOLOGIES
More informationUTILIZING COMPOUND TERM PROCESSING TO ADDRESS RECORDS MANAGEMENT CHALLENGES
UTILIZING COMPOUND TERM PROCESSING TO ADDRESS RECORDS MANAGEMENT CHALLENGES CONCEPT SEARCHING This document discusses some of the inherent challenges in implementing and maintaining a sound records management
More informationStudy Guide #2 for MKTG 469 Advertising Types of online advertising:
Study Guide #2 for MKTG 469 Advertising Types of online advertising: Display (banner) ads, Search ads Paid search, Ads on social networks, Mobile ads Direct response is growing faster, Not all ads are
More informationMedical Information-Retrieval Systems. Dong Peng Medical Informatics Group
Medical Information-Retrieval Systems Dong Peng Medical Informatics Group Outline Evolution of medical Information-Retrieval (IR). The information retrieval process. The trend of medical information retrieval
More informationOn the Fly Query Segmentation Using Snippets
On the Fly Query Segmentation Using Snippets David J. Brenes 1, Daniel Gayo-Avello 2 and Rodrigo Garcia 3 1 Simplelogica S.L. david.brenes@simplelogica.net 2 University of Oviedo dani@uniovi.es 3 University
More informationWikipedia and Web document based Query Translation and Expansion for Cross-language IR
Wikipedia and Web document based Query Translation and Expansion for Cross-language IR Ling-Xiang Tang 1, Andrew Trotman 2, Shlomo Geva 1, Yue Xu 1 1Faculty of Science and Technology, Queensland University
More informationPersonalization of Web Search With Protected Privacy
Personalization of Web Search With Protected Privacy S.S DIVYA, R.RUBINI,P.EZHIL Final year, Information Technology,KarpagaVinayaga College Engineering and Technology, Kanchipuram [D.t] Final year, Information
More informationClustering Technique in Data Mining for Text Documents
Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor
More informationPerformance Analysis, Data Sharing, Tools Integration: New Approach based on Ontology
Performance Analysis, Data Sharing, Tools Integration: New Approach based on Ontology Hong-Linh Truong Institute for Software Science, University of Vienna, Austria truong@par.univie.ac.at Thomas Fahringer