Archiving Ephemeral Web Content Using Web Feeds *
Marilena Oita
Pierre Senellart

Abstract. This demonstration proposal concerns an application for archiving Web content with the help of Web feeds. Starting from a domain specified by the user, specialized services are used to acquire relevant feeds. For each of these feeds, we exploit the semantic clues attached to a dynamic object to extract, from the associated Web page, the data that matches its description. We attach additional metadata and a timestamp to this object, extract the page template, and store these components independently, so as to be ready to answer temporal and semantic queries and, on demand, to reconstruct the Web page referenced by the feed. The methods for detecting change in a Web page are also useful in the context of an incremental crawl of the versions of the same dynamic object.

Keywords: Web archiving, Web feeds, dynamic object.

1 Introduction

Innovation is constantly created on the Web through the collective intelligence of its contributors. Beyond this abundance of information, we now face the difficulty of keeping track of updated information, as the Web changes faster than our ability to perceive it. For this reason, new tools have been made available on the Internet to help users stay current with real-time or frequently changing data. To save the time spent gathering information, one influential idea was to bring content from different sources together in one place, using streaming techniques. Publishers can make content available far more easily than before, and users need only subscribe to it, thereby stating their interest. Syndicated content has one special characteristic: it is intensely dynamic, meaning that new information is added or removed at a rapid rate.
From news articles, blog posts, wiki entries, dictionary terms, and forum entries to any kind of event information that can be versioned, timely content is expanding on the Web. The lifespan of the corresponding Web pages becomes much smaller than the average [1], ranging from minutes to days. The reason why we call this kind of content ephemeral is the following: when a crawler is not aware of the fresh content published, and its crawl frequency is lower than the content's rate of change, the respective data is lost and therefore not indexable, resulting in missing snapshots and incoherent archives.

* This work has been partially funded by the European Research Council under the European Community's Seventh Framework Programme (FP7/ ) / ERC grant Webdam, agreement
INRIA Saclay Île-de-France & Télécom ParisTech; CNRS LTCI. 46 rue Barrault, Paris Cedex 13, [email protected]
Institut Télécom; Télécom ParisTech; CNRS LTCI. 46 rue Barrault, Paris Cedex 13, [email protected]

In the Web 2.0 era, the Internet user can easily become a creator of Web content, but publishing it relies on services provided by particular sites. Moving beyond the present and into the future, the user might need to archive the published content for independent use in offline settings. Provided with a system that archives this kind of data continuously over a period of time, the user should be able to query the content temporally and semantically, and retrieve useful responses. Because of all the challenges involved, ephemeral data is crawled sporadically, in an incomplete and discontinuous manner. Timely content will be lost if not correctly archived, and even when some of the information can be found and regrouped, it will be lossy, inaccessible, or fee-based. In the current status quo, the information is stored in collections of Web pages and explored in a navigational way. Hence, there is no natural way for the user to easily observe different aspects of the data and query it uniformly.

In this demonstration, we try to address these problems. The presentation is organized as follows: the next section gives some insights into related work, while Section 3 explains how we acquire and use Web feeds to extract timely information; Section 4 shows the importance of change detection when dealing with ephemeral content, and how we can leverage the temporal and semantic aspects of feeds to make the crawl process more coherent. A demonstration scenario and the general architecture of our system are presented in Section 5.

2 Related Work

Web archiving, seen as a necessity or a duty, has acquired much importance lately due to the volatile nature of the Web, and more specifically due to the value that the lost information could have for future generations.
ArchivePress, a blog-archiving project [2], uses Web feeds to archive pages that contain data about particular institutional activity. The principal drawback of this approach is that the WordPress plug-in it builds captures only the content that the Web feed itself delivers. A feed can indeed give full coverage of the articles and their media files, but this case is quite rare: a feed is mainly just a way to advertise content. In contrast, we harness the feed's clues with the sole goal of making a better distinction between Web page components, and of extracting the resulting data objects. Additionally, we do not limit ourselves to a particular blogging platform, but aim to create a more general type of personal crawler.

The preferred method for crawling is to take snapshots periodically. While this principle is effective for relatively static pages, applying it to pages whose content changes frequently and heterogeneously leads to many problems of interpretability and temporal coherence [3]. Instead of recrawling an entire website at distant intervals of time, an incremental crawl [1] adapts its crawl frequency to the dynamics of Web pages. This adaptation is possible only if we can determine when individual Web pages of a target site are added, updated, or deleted. In the absence of prior knowledge about such change events, heuristics are commonly used [4]. A more formal model of change prediction is studied in [1], where the authors examine whether changes to a Web page follow a Poisson process. However, both approaches state that accurately identifying the change rate of Web pages remains an open problem. It is very difficult to estimate the change rate fairly when one considers the Web page in its totality.
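As a hedged illustration of the Poisson model studied in [1], the change rate of a page polled at regular intervals can be estimated from how often a change was observed between polls; the function name and the sample numbers below are ours, not taken from the cited work.

```python
import math

def estimate_change_rate(n_polls, n_changed, interval_days):
    """Poisson change-rate estimate for a page polled n_polls times,
    interval_days apart, and found changed on n_changed of those polls.
    Under a Poisson model, P(no change during one interval) is
    exp(-rate * interval), hence rate = -ln(1 - n_changed/n_polls) / interval."""
    if not 0 <= n_changed < n_polls:
        raise ValueError("estimator needs at least one poll without a change")
    return -math.log(1.0 - n_changed / n_polls) / interval_days

# A page polled daily for 30 days and seen changed on 10 of those polls:
rate = estimate_change_rate(30, 10, 1.0)  # about 0.405 changes per day
```

Note that this estimator is only as good as the change detector feeding it, which is exactly why the text argues for detecting change at the data-object level rather than on the whole page.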
Typically, the boilerplate that surrounds the article (dynamic ads or widgets in particular) changes the state of the DOM tree of the Web page: even when the content of interest remains the same, a naïve archiving crawler sees the page as changed. Trying to filter this boilerplate can be risky, because we might not manage to cover all kinds of structuring techniques used. Instead, we extract only the element of interest (generally called the data object) and detect content change at this finer-grained level.

Extracting structured data from Web pages has been intensively studied. However, none of the existing works takes into account the semantics of the content in order to extract it; instead, they harness patterns or code regularities at the DOM level. Our approach is somewhat similar to [5] in the way we detect the parent that has the largest number of children satisfying a certain condition. In contrast, the condition to be satisfied in our case is not similarity of encoding at the data-record level, but a semantic measure at the level of conceptual nodes. This semantic measure is evaluated by matching concepts (from an initial bag of semantic clues given by the feed) against the DOM node elements of the Web page in which the identified data object resides.

We have compared the article text returned by our technique with the output of Boilerpipe [6], which extracts articles from Web pages using the text density ratios of subsequent blocks. There are situations in which this technique fails, either because the text of the article is too small and a denser portion of the page is identified instead, or because there are several articles in the page and no semantic distinction can be made between them: all of them are taken as one. By exploiting the semantics of a Web feed, we can obtain more precise results. However, our method works only for Web pages that have an associated feed.

3 Capturing Data Objects

To help readers keep current with dynamic content, publishers add feed capabilities to their dynamic resources. We use the concept of data object to refer to a news article, a blog post, a dictionary entry, a forum message, a status update, or any other kind of resource that is uniquely referenced by a Web feed item.
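A minimal sketch of the "parent with the most semantically matching children" condition described above, on a toy XHTML fragment; the page content and clue terms are invented for illustration, and a real page would first go through an HTML cleaner.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed XHTML fragment standing in for a cleaned Web page.
PAGE = """<html><body>
  <div id="sidebar"><p>advertising widgets</p></div>
  <div id="main">
    <h1>Glacier retreat accelerates</h1>
    <p>New measurements show glacier retreat in the Alps.</p>
  </div>
</body></html>"""

def best_wrapper(root, clues):
    """Return the element with the most direct children whose subtree text
    matches at least one feed clue: a crude stand-in for the semantic
    condition on conceptual nodes described in the text."""
    def matches(elem):
        text = " ".join(elem.itertext()).lower()
        return any(clue in text for clue in clues)

    best, best_count = root, -1
    for elem in root.iter():
        count = sum(1 for child in elem if matches(child))
        if count > best_count:
            best, best_count = elem, count
    return best

clues = ["glacier", "retreat", "alps"]
wrapper = best_wrapper(ET.fromstring(PAGE), clues)
# wrapper.get("id") == "main": the article container, not the sidebar
```

Scoring children rather than whole subtrees is what keeps the result at the wrapper level instead of drilling down to the single densest paragraph.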
To address the problem of the abundance of feed sources, we go through a discovery phase that also performs semantic filtering. We use a specialized service, the Search4RSS metasearch engine, to acquire a number of Web feeds in a specific domain, whose scope is specified by the user with keywords. Search4RSS covers all kinds of Web pages that have an associated feed, in RSS, Atom, or RDF formats. We begin by identifying the data object, which corresponds at the feed level to an item of a channel. The item's URL references the Web page, which is cleaned for further processing using HtmlCleaner. We use the hints provided by the feed item's title and description to construct a bag of descriptive terms and n-grams. This semantic information about the data object we are looking to extract is then used to identify the DOM node element that acts as a wrapper for the text and references of the Web article. Finally, the data object is set with the extracted text and references, and timestamped with the item's publication date.

The concepts retrieved from the feed item by means of data mining are associated with the object so that it can be semantically queried. Similarly, the object's timestamp is exploited for temporal queries. We separately crawl other semantic components of the page and the remaining template. References to these are retained as extrinsic attributes of the data object, with the goal of reconstructing the Web page on demand, or of performing some kind of relational analysis. We also enrich the current semantic context of the data object with a related ontological one. The detection of tag clouds and categories of the Web article can be done using special heuristics, but when this metadata is missing we make use of the YAGO ontology [7] to retrieve instances and concepts related to the article's semantics.
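As a sketch of the first step above, parsing one RSS item into a bag of clue terms and a timestamp might look as follows; the feed content is invented, the sketch keeps only unigrams (the text also uses n-grams), and a real implementation would handle Atom and RDF variants as well.

```python
import re
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Invented RSS channel with a single item, standing in for a fetched feed.
FEED = """<rss version="2.0"><channel>
  <title>Eco News</title>
  <item>
    <title>Glacier retreat accelerates</title>
    <link>http://example.org/articles/42</link>
    <description>New measurements show glacier retreat in the Alps.</description>
    <pubDate>Mon, 07 Mar 2011 10:00:00 GMT</pubDate>
  </item>
</channel></rss>"""

def data_object_stub(item):
    """Turn one feed item into the skeleton of a data object: the URL of
    the referenced page, its publication timestamp, and the bag of
    descriptive terms later matched against the page's DOM nodes."""
    title = item.findtext("title", "")
    description = item.findtext("description", "")
    clues = set(re.findall(r"[a-z]+", (title + " " + description).lower()))
    return {
        "url": item.findtext("link"),
        "timestamp": parsedate_to_datetime(item.findtext("pubDate")),
        "clues": clues,
    }

obj = data_object_stub(ET.fromstring(FEED).find("channel/item"))
```

The resulting record carries exactly the two handles the system needs: the clue bag for the semantic extraction step, and the timestamp for temporal queries.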
4 Change Detection

While the main aim of this demonstration is to add exploitable semantics to Web archiving collections, the feeds, and in particular the data object extraction technique, can also be used to improve Web page change detection, an essential step in the context of an incremental crawl. Assuming that the website to be crawled has an associated feed, a temporal analysis of the feed can be performed periodically. The website's resulting publishing strategy can be leveraged to automatically fetch the feed files with an adequate frequency and use them as an instrument for change detection. Useful timestamps can be extracted from a Web feed, such as the pubDate RSS element or the updated Atom element. These temporal hints give the publication dates of the referenced Web articles and, in our experiments, are omnipresent in a feed with very few exceptions. An incremental crawl can therefore use them as change detection signals in order to crawl new pages of a particular site. Basically, the feed's channel exposes a dynamic part of the website it refers to. Hence, in theory and excluding boilerplate, the other pages of the site are relatively static, and a periodic snapshot can capture their content without losing information. For pages discovered through the feed, which correspond to feed items, we can empirically set the frequency of change by detecting modifications at the data object level.

5 Scenario

To better illustrate the motivation, consider the example of an ecologist who wants to study the evolution of a natural process and its parameters of influence. She knows that there exist services on the Web that can provide data of interest in a timely manner. Her interest persists over time: she wants not only to stay informed, over a period of time, about the states of the same process and to gather different points of view on the same events, but also to be able to perform some analysis at the end of a specific cycle.
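The timestamp-based change signal of Section 4 can be sketched as follows; the item records are hypothetical, standing in for parsed pubDate/updated values.

```python
from datetime import datetime, timezone

def items_to_recrawl(feed_items, last_crawl):
    """Feed timestamps as change-detection signals: any item published
    after the last crawl references a page the incremental crawler
    should fetch (a new page, or a new version of a known data object)."""
    return [item["url"] for item in feed_items if item["published"] > last_crawl]

# Hypothetical poll of a channel that was last crawled on 1 March 2011:
last_crawl = datetime(2011, 3, 1, tzinfo=timezone.utc)
items = [
    {"url": "http://example.org/a",
     "published": datetime(2011, 2, 20, tzinfo=timezone.utc)},
    {"url": "http://example.org/b",
     "published": datetime(2011, 3, 5, tzinfo=timezone.utc)},
]
fresh = items_to_recrawl(items, last_crawl)  # only /b needs a crawl
```

This is deliberately cheap: the crawler decides what to fetch from the feed alone, without touching the pages themselves.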
The ecologist also wants to browse the information of interest offline, and to query it using temporal attributes and keywords. The actor in this scenario could simply use the feeds provided by webmasters to be informed of new Web pages of interest, but she would then have no way to manage and query particular data from those Web pages in a uniform manner. Our system provides the environment and the atomic data necessary for this process, as shown in the following figure.

[Figure: system architecture. An RSS metasearch engine supplies feeds to a feed analyzer, which extracts timestamps, metadata annotations, and partial data for the objects; Web pages are split into components and stored over time in a collection. Through the user interface, the user enters dates and keywords; the system renders the relevant items or reconstructs the Web page of a selected item.]
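The kind of combined temporal and semantic query the interface answers can be sketched over a toy in-memory collection; the field names and sample objects are ours, invented for illustration.

```python
from datetime import date

def query(collection, keywords=None, start=None, end=None):
    """Combined temporal and semantic query over archived data objects:
    keywords are matched against each object's concept annotations,
    start/end bound its timestamp."""
    hits = []
    for obj in collection:
        if start is not None and obj["date"] < start:
            continue
        if end is not None and obj["date"] > end:
            continue
        if keywords and not set(keywords) & obj["concepts"]:
            continue
        hits.append(obj)
    return sorted(hits, key=lambda o: o["date"])

# Invented archive contents:
archive = [
    {"url": "u1", "date": date(2011, 2, 1), "concepts": {"glacier", "climate"}},
    {"url": "u2", "date": date(2011, 3, 5), "concepts": {"economy"}},
    {"url": "u3", "date": date(2011, 3, 9), "concepts": {"glacier"}},
]
hits = query(archive, keywords=["glacier"], start=date(2011, 3, 1))
```

Each hit is a reference back to the stored components, from which the snapshot page can be reconstructed on demand.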
Given the user's domain of interest, we query the Search4RSS metasearch engine and accumulate a number of feeds as references to Web pages that provide timely content. Partial data in the figure above represents the resource's title and description. Having a good clue about the content advertised through the feed's items (considered, therefore, content of interest), we are able to identify the data objects in the Web page. The user has the possibility to follow a channel, and the potentially many versions of the same data object, or to search for other sources on the same topic (reiterating the request to Search4RSS). The user can also run semantic and temporal queries, and the system will provide her with the relevant data objects in the form of references to their authentic, initial form: the snapshot Web page (or a cleaned version of it). As a future improvement, the system could use user preferences to render the information, more specifically to mix the objects or present them in different contexts.

6 Conclusions and Observations

Ephemeral content is the object of many timely sources on the Web today. Its value is attested by the very existence of Web feeds. An incremental crawl is adapted to this kind of content, but when the Web page changes too rapidly for the crawler to detect the change, some pieces of the puzzle that represents content in time will be missing. The demonstrated system presents a new way of archiving dynamic data, an approach based on feed technology. In essence, we exploit the hints provided by Web feeds to encapsulate timely content together with metadata into a self-contained timely unit. This unit represents one instance in the chain of updates of the same Web resource, and also the diff element needed in the context of an incremental crawl. The goal of our application is to add exploitable meaning to Web archiving collections, created in a more complete and coherent manner.

References

[1] Junghoo Cho and Hector Garcia-Molina. The evolution of the Web and implications for an incremental crawler. In Proc. VLDB, Cairo, Egypt, September 2000.
[2] Maureen Pennock and Richard Davis. ArchivePress: A really simple solution to archiving blog content. In Proc. iPres, San Francisco, USA, 2009.
[3] Marc Spaniol, Dimitar Denev, Arturas Mazeika, and Gerhard Weikum. Catch me if you can: Temporal coherence of Web archives. In Proc. IWAW, Aarhus, Denmark, September 2008.
[4] Kristinn Sigurðsson. Incremental crawling with Heritrix. In Proc. IWAW, Vienna, Austria, September 2005.
[5] Bing Liu, Robert Grossman, and Yanhong Zhai. Mining data records in Web pages. In Proc. KDD, Washington, USA, August 2003.
[6] Christian Kohlschütter, Peter Fankhauser, and Wolfgang Nejdl. Boilerplate detection using shallow text features. In Proc. WSDM, New York, USA, February 2010.
[7] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. YAGO: A core of semantic knowledge unifying WordNet and Wikipedia. In Proc. WWW, Banff, Canada, May 2007.
Making a Home for a Family of Online Journals The Living Reviews Publishing Platform Robert Forkel Heinz Nixdorf Center for Information Management in the Max Planck Society Overview The Family The Concept
Swiss TPH Project Collaboration Platform
Informatics Unit Swiss TPH Project Collaboration Platform Alfresco Document Management System User s Manual September 2011 Version 1.0 Swiss TPH, Socinstrasse 57, P.O. Box, 4002 Basel, Switzerland, T +41
A Survey Study on Monitoring Service for Grid
A Survey Study on Monitoring Service for Grid Erkang You [email protected] ABSTRACT Grid is a distributed system that integrates heterogeneous systems into a single transparent computer, aiming to provide
Introduction au BIM. ESEB 38170 Seyssinet-Pariset Economie de la construction email : [email protected]
Quel est l objectif? 1 La France n est pas le seul pays impliqué 2 Une démarche obligatoire 3 Une organisation plus efficace 4 Le contexte 5 Risque d erreur INTERVENANTS : - Architecte - Économiste - Contrôleur
Building Multilingual Search Index using open source framework
Building Multilingual Search Index using open source framework ABSTRACT Arjun Atreya V 1 Swapnil Chaudhari 1 Pushpak Bhattacharyya 1 Ganesh Ramakrishnan 1 (1) Deptartment of CSE, IIT Bombay {arjun, swapnil,
Web Archiving and Scholarly Use of Web Archives
Web Archiving and Scholarly Use of Web Archives Helen Hockx-Yu Head of Web Archiving British Library 15 April 2013 Overview 1. Introduction 2. Access and usage: UK Web Archive 3. Scholarly feedback on
Collaboration. Michael McCabe Information Architect [email protected]. black and white solutions for a grey world
Collaboration Michael McCabe Information Architect [email protected] black and white solutions for a grey world Slide Deck & Webcast Recording links Questions and Answers We will answer questions at
Email Subscription vs. RSS:
Email Subscription vs. RSS: You Decide Cathy Miller Business Writer/Consultant INTRODUCTION We all have our own preferences in how we do things. That s what makes us unique. When you find a blog that you
SharePoint 2013: Key feature improvements
E-001 SharePoint 2013: Key feature improvements nonlinear enterprise is the enterprise collaboration and communication technology division of nonlinear creations. Relying on our Enterprise Performance
A Semantic Marketplace of Peers Hosting Negotiating Intelligent Agents
A Semantic Marketplace of Peers Hosting Negotiating Intelligent Agents Theodore Patkos and Dimitris Plexousakis Institute of Computer Science, FO.R.T.H. Vassilika Vouton, P.O. Box 1385, GR 71110 Heraklion,
TopBraid Insight for Life Sciences
TopBraid Insight for Life Sciences In the Life Sciences industries, making critical business decisions depends on having relevant information. However, queries often have to span multiple sources of information.
The Integration of Agent Technology and Data Warehouse into Executive Banking Information System (EBIS) Architecture
The Integration of Agent Technology and Data Warehouse into Executive Banking System (EBIS) Architecture Ismail Faculty of Technology and Communication (FTMK) Technical University of Malaysia Melaka (UTeM),75450
Essential New Media Terms
Affiliate Marketing: A popular marketing technique that partners merchant with website in which the merchant compensates the website based on performance (e.g. referrals). Aggregator: Also referred to
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications
Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &
HiTech. White Paper. A Next Generation Search System for Today's Digital Enterprises
HiTech White Paper A Next Generation Search System for Today's Digital Enterprises About the Author Ajay Parashar Ajay Parashar is a Solution Architect with the HiTech business unit at Tata Consultancy
Content Management Systems: Drupal Vs Jahia
Content Management Systems: Drupal Vs Jahia Mrudula Talloju Department of Computing and Information Sciences Kansas State University Manhattan, KS 66502. [email protected] Abstract Content Management Systems
Report on the Dagstuhl Seminar Data Quality on the Web
Report on the Dagstuhl Seminar Data Quality on the Web Michael Gertz M. Tamer Özsu Gunter Saake Kai-Uwe Sattler U of California at Davis, U.S.A. U of Waterloo, Canada U of Magdeburg, Germany TU Ilmenau,
Annotea and Semantic Web Supported Collaboration
Annotea and Semantic Web Supported Collaboration Marja-Riitta Koivunen, Ph.D. Annotea project Abstract Like any other technology, the Semantic Web cannot succeed if the applications using it do not serve
TopBraid Life Sciences Insight
TopBraid Life Sciences Insight In the Life Sciences industries, making critical business decisions depends on having relevant information. However, queries often have to span multiple sources of information.
