Service-Based Infrastructure for User-Oriented Environmental Information Delivery



Similar documents
Service Based Infrastructure for User Oriented. The PESCaDO Consortium

Distributed Database for Environmental Data Integration

How To Write A Blog Post On The Web (For Free)

Thomas Usländer Fraunhofer IITB

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November ISSN

CHAPTER 1 INTRODUCTION

Rotorcraft Health Management System (RHMS)

The Scientific Data Mining Process

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

SPATIAL DATA CLASSIFICATION AND DATA MINING

ONTOLOGY-BASED APPROACH TO DEVELOPMENT OF ADJUSTABLE KNOWLEDGE INTERNET PORTAL FOR SUPPORT OF RESEARCH ACTIVITIY

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials

European Network on New Sensing Technologies for Air Pollution Control and Environmental Sustainability - EuNetAir COST Action TD1105

MOBILITY DATA MODELING AND REPRESENTATION

IFS-8000 V2.0 INFORMATION FUSION SYSTEM

Reusable Knowledge-based Components for Building Software. Applications: A Knowledge Modelling Approach

Automatic Timeline Construction For Computer Forensics Purposes

Knowledge-based service architecture for multi-risk environmental decision support applications Stuart E. Middleton and Zoheir A.

UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications

The ebbits project: from the Internet of Things to Food Traceability

An Object Model for Business Applications

The Application Method of CRM as Big Data: Focused on the Car Maintenance Industry

Data Discovery on the Information Highway

ORCHESTRA Aktueller Stand und Entwicklungen. Ulrich Bügel Fraunhofer IITB

Business Intelligence and Decision Support Systems

Standardization Requirements Analysis on Big Data in Public Sector based on Potential Business Models

KEY KNOWLEDGE MANAGEMENT TECHNOLOGIES IN THE INTELLIGENCE ENTERPRISE

OntoPIM: How to Rely on a Personal Ontology for Personal Information Management

A Big Picture for Big Data

IBM Unstructured Data Identification and Management

Training Management System for Aircraft Engineering: indexing and retrieval of Corporate Learning Object

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

ACE GIS Project Overview: Adaptable and Composable E-commerce and Geographic Information Services

Web services to allow access for all in dotlrn

Meta-Model specification V2 D

Information Services for Smart Grids

LDIF - Linked Data Integration Framework

Semantic Search in Portals using Ontologies

Smart Cities Data Streams Integration: experimenting with Internet of Things and social data flows

Performance Analysis, Data Sharing, Tools Integration: New Approach based on Ontology

The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets

Information Visualization WS 2013/14 11 Visual Analytics

Remote support for lab activities in educational institutions

move move us Newsletter 2014 Content MoveUs has successfully finished the first year of the project!

The SEEMP project Single European Employment Market-Place An e-government case study

Web-Based Genomic Information Integration with Gene Ontology

Configuring Firewalls An XML-based Approach to Modelling and Implementing Firewall Configurations

HOPS Project presentation

Healthcare Measurement Analysis Using Data mining Techniques

M2M Communications and Internet of Things for Smart Cities. Soumya Kanti Datta Mobile Communications Dept.

A Review of Data Mining Techniques

Building a virtual marketplace for software development tasks

SEVENTH FRAMEWORK PROGRAMME THEME ICT Digital libraries and technology-enhanced learning

Cybersecurity Analytics for a Smarter Planet

Decision Support in Structural Health Monitoring

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

Disributed Query Processing KGRAM - Search Engine TOP 10

Knowledge based energy management for public buildings through holistic information modeling and 3D visualization. Ing. Antonio Sacchetti TERA SRL

City Data Pipeline. A System for Making Open Data Useful for Cities. stefan.bischof@tuwien.ac.at

Unifi Technology Group & Software Toolbox, Inc. Executive Summary. Building the Infrastructure for emanufacturing

A Service Modeling Approach with Business-Level Reusability and Extensibility

Doctor of Philosophy in Computer Science

Use of a Web-Based GIS for Real-Time Traffic Information Fusion and Presentation over the Internet

IBM ediscovery Identification and Collection

Chapter 1 - Web Server Management and Cluster Topology

SemWeB Semantic Web Browser Improving Browsing Experience with Semantic and Personalized Information and Hyperlinks

IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper

APPLYING CASE BASED REASONING IN AGILE SOFTWARE DEVELOPMENT

Outcomes of the CDS Technical Infrastructure Workshop

Delivering Heterogeneous Hydrologic Data services with an Enterprise Service Bus Application

Using DeployR to Solve the R Integration Problem

Framework for the Development of Environmental Risk Management Services According to the ORCHESTRA Architecture

Building a Database to Predict Customer Needs

A Knowledge Management Framework Using Business Intelligence Solutions

One LAR Course Credits: 3. Page 4

8. KNOWLEDGE BASED SYSTEMS IN MANUFACTURING SIMULATION

Component visualization methods for large legacy software in C/C++

Big Data and Semantic Web in Manufacturing. Nitesh Khilwani, PhD Chief Engineer, Samsung Research Institute Noida, India

PESCDO - A Model For Practical Application

Telecommunication (120 ЕCTS)

Active Automation of the DITSCAP

The Human Element in Cyber Security and Critical Infrastructure Protection: Lessons Learned

A Time Efficient Algorithm for Web Log Analysis

EXPLOITING FOLKSONOMIES AND ONTOLOGIES IN AN E-BUSINESS APPLICATION

M LTO Multilingual On-Line Translation

Development of Enterprise Architecture of PPDR Organisations W. Müller, F. Reinert

White paper: Information Management. Information is a strategic business asset are you managing it as such?

OpenAIRE Research Data Management Briefing paper

Visualizing FRBR. Tanja Merčun 1 Maja Žumer 2

Knowledge-based Expressive Technologies within Cloud Computing Environments

Context Model Based on Ontology in Mobile Cloud Computing

Design and Implementation of an Automatic Semantic Annotation Service

Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence

The SIEM Evaluator s Guide

Vertical Integration of Enterprise Industrial Systems Utilizing Web Services

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts

LinkZoo: A linked data platform for collaborative management of heterogeneous resources

The Purview Solution Integration With Splunk

Transcription:

Service-Based Infrastructure for User-Oriented Environmental Information Delivery Leo Wanner 1,2, Harald Bosch 3, Nadjet Bouayad-Agha 2, Ulrich Bügel 4, Gerard Casamayor 2, Thomas Ertl 3, Ari Karppinen 5, Ioannis Kompatsiaris 6, Tarja Koskentalo 7, Simon Mille 2, Jürgen Moßgraber 4, Anastasia Moumtzidou 6, Maria Myllynen 7, Emanuele Pianta 8, Marco Rospocher 8, Horacio Saggion 2, Luciano Serafini 8, Virpi Tarvainen 5, Sara Tonelli 8, Thomas Usländer 4, Stefanos Vrochidis 6 1 Catalan Institute for Research and Advanced Studies, 2 Dept. of Information and Communication Technologies, Pompeu Fabra University, 3 Visualization Institute, University of Stuttgart, 4 Fraunhofer Institute for Optronics, System Technologies and Image Exploitation, 5 Finnish Meteorological Institute, 6 Informatics and Telematics Institute, Centre for Research and Technology Hellas, 7 Helsinki Region Environmental Services Authority, 8 Fondazione Bruno Kessler pescado@upf.edu; http://www.pescado-project.eu Abstract. Citizens are increasingly aware of the influence of environmental and meteorological conditions on the quality of their life. The consequence of this awareness is the demand for personalized environmental information, i.e., information that is tailored to their specific context and background. The EU-funded project PESCaDO addresses this demand in its full complexity. It aims to develop a service that supports the user in questions related to environmental conditions in that it searches for reliable data in the web, processeses these data to deduce the relevant information and communicates this information to the user in the language of their preference. In this paper, we describe the requirements and the working service-based realization of the infrastructure of the service. Keywords: environmental information service, personalization, infrastructure, decision support 1 Introduction Citizens are increasingly aware of the influence of environmental and meteorological conditions on the quality of their life. One of the consequences of this awareness is the demand for high quality environmental information that is tailored to one s specific context and background, i.e., which is personalized. Personalized environmental information may need to cover a variety of themes (such as meteorology, air quality, pollen, and traffic) and take into account a number of specific personal features (health, age, etc.) of the addressee, as well as the intended use of the information. So far, only a few proposals have been made how this information can be facilitated in technical terms. All of these proposals

2 focus on one theme and only very few of them address the problem of information personalization [Peinel et al., 2000,Karatzas, 2007,Wanner et al., 2010]. PESCaDO (Personalized Environmental Service Configuration and Delivery Orchestration) addresses the above task in its full complexity. It takes advantage of the fact that nowadays, the World Wide Web already hosts a great range of services that address each of the above themes, such that, in principle, the required basic data are available. The challenge thus consists, on the one hand, in the discovery of these services and their orchestration, and, on the other hand, in the processing of the obtained data in accordance with the needs of the addressee and the delivery of the gained information in the mode of preference of the addressee. This challenge requires the involvement of an elevated number of rather heterogeneous applications and thus an infrastructure that is flexible and stable enough to support a potentially distributed architecture. In what follows, we first outline the requirements towards the infrastructure of a platform such as PESCaDO, which attempts to integrate all these applications. Then, we present the working infrastructure that has been designed to meet these requirements. 2 The requirements towards PESCaDO s infrastructure The requirements towards the infrastructure obviously depend on the tasks that are to be addressed. In the case of PESCaDO, the principal tasks are: 1. Discovery of the environmental service nodes in the web: As already pointed out above, the web hosts a large amount of environmental (meteorological, air quality, traffic, pollen, etc.) services, which include both the numerous (static or dynamic) public webpages that offer environmental information worldwide, as well as any dedicated environmental web services with free access. Especially, the number of meteorological services that cover each major location is impressive. In order to be able to offer citizens targeted information, these services must be exploited, which means that these services must be searched for and indexed such that their data can be accessed when needed. 2. Distillation of the data from webpages: The vast majority of the environmental services offer their data via publicly accessible webpages rather than via web services. To access these data, webpage parsing, information extraction, and text mining techniques are needed. Although these techniques can be tuned to the idiosyncratic ways of presenting environmental (i.e., air quality, meteorological, traffic,... ) data and information, the task of webpage scraping remains a very challenging task. 3. Orchestration of the environmental service nodes: Environmental service nodes encountered in the web may require data provided by other service nodes as input data. In order to obtain all necessary data, the environmental service nodes must thus be orchestrated, i.e., selected and chained. This presupposes the selection of appropriate protocols and the use of appropriate data interchange formats. To decide which nodes are to be selected over which other nodes, or which nodes fit best together, such criteria as quality of the individual nodes measured by data uncertainty and service confidence metrics derived using machine learning and visual analytics techniques must be taken into account.

3 4. Fusion of environmental data: Environmental service nodes may provide competing or complementary data on the same or related theme for the same or the neighbour location. To ensure the availability of a most reliable and comprehensive data set as basis for further processing stages, the data from these nodes must be fused. As already in the case of node orchestration, this implies an assessment of the quality of the contributing services and data. 5. Assessment of the data with respect to the needs of the addressee: Once the raw data are obtained, they need to be evaluated and reasoned about in order to infer how they affect the addressee, given his/her personal health and life circumstances and the purpose of the request of the information. Thus, a citizen may request information because he needs it to decide upon a planned action, because he wants to be aware of extreme episodes or because he monitors the environmental conditions in a location. The assessment task obviously presupposes the existence of sufficiently comprehensive domain-specific ontologies and a knowledge base. 6. Selection of user-relevant content and its delivery to the addressee: Not all content deduced from the data by inferences and reasoning is apt to be communicated to the addressee: some of this content would sound trivial or irrelevant. Intelligent content selection strategies that take into account the background of the addressee and the intended use of the information are thus needed to decide which elements of the content are worth and meaningful to be communicated. To deliver the selected content, techniques are required that present the content in a suitable mode (text, graphic and/or table)in the language of the preference of the addressee. 7. Interaction with the user: One should not forget the interaction of the system with the user. The user must be able to formulate his problem in a simple and intuitive format be it based on natural language or on graphical building blocks. He should equally be delivered the generated information in a suitable form and, as already mentioned above, in the language of his preference. We are aware of the complexity of each of these tasks. However, given the expertise and the experience of the partners of the PESCaDO Consortium in the corresponding research areas, we are confident to be able to offer an operational PESCaDO service at the end of the lifetime of the project. 3 The PESCaDO infrastructure In order to meet the above requirements, PESCaDO has opted for a servicebased architecture. This architecture is based on a methodology which has been developed for the definition of an open architecture for risk management as provided in the EU FP6 IP project ORCHESTRA [Usländer, 2007] and which has been extended in the FP6 IP project SANY [SANY, 2009] to cover the domain of sensor networks and standard-based sensor web enablement. The focus of this methodology is on a platform neutral specification. In other words, it aims to provide the basic concepts and their interrelationships (conceptual models) as abstract specifications. The design is guided by the methodology developed in the

4 ISO/IEC Reference Model for Open Distributed Processing (RM-ODP), which explicitely foresees an engineering step that maps solution types, such as information models, services and interfaces specified in information and service viewpoints, respectively, to distributed system technologies. This section illustrates the outcome of this engineering step for the service viewpoint in PESCaDO. Application specific major tasks and actions have been defined as abstract service specifications and can be implemented as service instances on a specific plattform. Web service instances for these services are currently beeing developed. They can be redefinied and substituted as needed in the course of the project. Figure 1 displays a sample, somewhat simplified, workflow with the major application services in action. Two services are not cited in Figure 1 since they are consulted by nearly all other services: the Knowledge Base Access Service and the User Profile Management Service. Furthermore, the figure does not include the services related to data and information distillation from webpages. Fig. 1. Sequence diagram for the execution of the services in PESCaDO

5 A main dispatcher service (called Answer Service, AS) controls the workflow and the execution of the services. The user interacts with PESCaDO via the User Interaction Service (UIS). If unsure about the types of information he can ask for, the user can inquire this information by requesting it from the Problem Description Service (PDS). To ensure a full comprehension of the problem or question of the user, PESCaDO decided to operate with controlled graphical and natural language input formats. Once the user has decided what kind of question he wants to submit to the system, the UIS provides the user the corresponding formats. Thereupon, the user can formulate his query, which is translated by the PDS into a formal ontology-based representation understood by the system. Once this is done, the problem description is passed by the UIS to the AS as a Request Answer inquiry. Then, the AS assesses what kinds of data beyond environmental data are needed to answer the query of the user and solicit these data from the Auxiliary Services (AuxS). For instance, if the user s query concerns the environmental conditions for a bicycle tour from A to B, the route from A to B must be calculated by a Route Calculation Service. With the complementary data at hand, the AS can request from the Data Retrieval Service (DRS) the environmental data needed to answer the user query. The DRS solicits these data from the environmental nodes that identified by the Data Node Retrieval Service (DNRS) as relevant to the user s query and the complementary data. To speed up retrieval, an off-line indexing is performed. During the indexing procedure, a domain specific search engine accesses the web, discovers, and indexes the environmental service nodes in a local repository. In addition, the retrieved webpages are processed in order to extract environmentally relevant information (e.g. location, environmental measurements, etc.) with the aid of document parsing, web scraping and content distillery techniques, so that each service can be indexed according to this information. As already pointed out in Section 2, the retrieved nodes may deliver complementary or competing data of varying quality. 1 The Fusion Service (FS) applies uncertainty metrics to obtain the optimal and maximally complete data set, which is passed by the AS to the Decision Service (DS). The DS converts the data set into knowledge, or content, in that it relates it to the knowledge in PESCaDO s knowledge base, reasons about it, and assesses it from the perspective of its relevance to the user. From this content, the Content Selection Service (CSS) compiles a content plan, which contains the knowledge to be communicated to the user as answer. The Information Production Service (IPS) takes the content plan as input and generates information in the language and mode (text, table, or graphic) of the preference of the user, which then is passed to the user. 1 For simplicity, we dispense with the illustration of the chaining of service nodes.

6 4 PESCaDO as part of ICT for Environmental Services The service-based infrastructure as illustrated above in a sample workflow allows for a maximally flexible realization of the PESCaDO platform: each service (and thus each module of the platform) can be implemented nearly entirely independent from the other services and be run either on a separate machine or on the same machine as the other services. As a matter of fact, many of the services could be used as plug-in modules by other environmental application platforms. How this can be achieved best, needs to be discussed. In any case, a standardization of the communication protocols across the initiatives seems highly desirable. 5 PESCaDO and its consortium Running from January 2010 to December 2012, PESCaDO is partially funded by the European Commission in its 7th Framework Programme under the contract number ICT-259486. PESCaDO s consortium consists of seven partners: 1. Pompeu Fabra University (UPF), 2. Fraunhofer Institute for Optronics, System Technologies and Image Exploitation (IOSB), 3. Finnish Meteorological Institute (FMI), 4. University of Stuttgart (USTUTT), 5. Foundation Bruno Kessler (FBK), 6. Centre for Research and Technology Hellas (ITI-CERTH), and 7. Helsinki Region Environmental Services Authority (HSY). The information technologies aspects of the project are covered by CERTH (web-based search), FBK (semantic representation, reasoning strategies, content distillation), UPF (multilingual information generation and human-computer interaction), and USTUTT (visualization and human-computer interaction). The architectural and infrastructure issues are addressed by IOSB. Problems related to uncertainty and confidence metrics of environmental data and information are dealt with by FMI, which, together with HSY also provides its environmental expertise and assumes the validation of the outcome of PESCaDO. References [Usländer, 2007] Usländer, T. (ed.), 2007. Reference Model for the ORCHES- TRA Architecture Version 2.1. OGC Best Practices Document 07-097, http://portal.opengeospatial.org/files/?artifact id=23286. [Karatzas, 2007] Karatzas, K. State-of-the-art in the dissemination of AQ information to the general public. Proceedings of EnviroInfo, Vol. 2. 41 47. Warsaw, 2007. [Peinel et al., 2000] Peinel, G., T. Rose and R. San José. 2000. Customized Information Services for Environmental Awareness in Urban Areas. Proceedings of the 7th World Congress on Intel ligent Transport Systems. Turin, 2000. [SANY, 2009] SANY SensorSA (Sensor Service Architecture of the project SANY). Public OGC Discussion Paper OGC 09-132r1. http://portal.opengeospatial.org/files/?artifact id=35888&version=1. [Wanner et al., 2010] Wanner, L., B. Bohnet, N. Bouayad-Agha, F. Lareau and D. Nicklaß: MARQUIS: Generation of User-Tailored Multilingual Air Quality Bulletins. Applied Artificial Intelligence, 24(10), 2010.