MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management

Similar documents
LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model

Linked Statistical Data Analysis

Publishing Linked Data Requires More than Just Using a Tool

Gradient An EII Solution From Infosys

PONTE Presentation CETIC. EU Open Day, Cambridge, 31/01/2012. Philippe Massonet

Network Graph Databases, RDF, SPARQL, and SNA

We have big data, but we need big knowledge

Revealing Trends and Insights in Online Hiring Market Using Linking Open Data Cloud: Active Hiring a Use Case Study

E6895 Advanced Big Data Analytics Lecture 4:! Data Store

Comparison of Triple Stores

Preparing Your Data For Cloud

Taming Big Data Variety with Semantic Graph Databases. Evren Sirin CTO Complexible

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

Data Grids. Lidan Wang April 5, 2007

Data Integration Checklist

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Tier Architectures. Kathleen Durant CS 3200

Semantic Interoperability

SQL Server 2012 Business Intelligence Boot Camp

A generic approach for data integration using RDF, OWL and XML

JOURNAL OF COMPUTER SCIENCE AND ENGINEERING

FROM RELATIONAL TO OBJECT DATABASE MANAGEMENT SYSTEMS

Efficient SPARQL-to-SQL Translation using R2RML to manage

Department of Defense. Enterprise Information Warehouse/Web (EIW) Using standards to Federate and Integrate Domains at DOD

GeoKettle: A powerful open source spatial ETL tool

Ganzheitliches Datenmanagement

De la Business Intelligence aux Big Data. Marie- Aude AUFAURE Head of the Business Intelligence team Ecole Centrale Paris. 22/01/14 Séminaire Big Data

Cloud3DView: Gamifying Data Center Management

The University of Jordan

Importance of Data Abstraction, Data Virtualization, and Data Services Page 1

Building Semantic Content Management Framework

Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014

A Grid Architecture for Manufacturing Database System

Data Quality in Information Integration and Business Intelligence

DBpedia German: Extensions and Applications

Linked Data Interface, Semantics and a T-Box Triple Store for Microsoft SharePoint

IMAN: DATA INTEGRATION MADE SIMPLE YOUR SOLUTION FOR SEAMLESS, AGILE DATA INTEGRATION IMAN TECHNICAL SHEET

The Ontological Approach for SIEM Data Repository

UIMA and WebContent: Complementary Frameworks for Building Semantic Web Applications

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

A Multidatabase System as 4-Tiered Client-Server Distributed Heterogeneous Database System

Zhenping Liu *, Yao Liang * Virginia Polytechnic Institute and State University. Xu Liang ** University of California, Berkeley

Lecture 2: Storing and querying RDF data

Federated Query Processing over Linked Data

Efficient Data Access and Data Integration Using Information Objects Mica J. Block

Ontology based ranking of documents using Graph Databases: a Big Data Approach

UltraQuest Cloud Server. White Paper Version 1.0

HETEROGENEOUS DATA INTEGRATION FOR CLINICAL DECISION SUPPORT SYSTEM. Aniket Bochare - aniketb1@umbc.edu. CMSC Presentation

THE OPEN UNIVERSITY OF TANZANIA FACULTY OF SCIENCE TECHNOLOGY AND ENVIRONMENTAL STUDIES BACHELOR OF SIENCE IN DATA MANAGEMENT

A Framework for Developing the Web-based Data Integration Tool for Web-Oriented Data Warehousing

Creating an RDF Graph from a Relational Database Using SPARQL

Language Interface for an XML. Constructing a Generic Natural. Database. Rohit Paravastu

Secure and Semantic Web of Automation

Core Enterprise Services, SOA, and Semantic Technologies: Supporting Semantic Interoperability

Semantic Web Technologies and Data Management

esanté et quelques projets Henning Müller

An Ontology Based Method to Solve Query Identifier Heterogeneity in Post- Genomic Clinical Trials

Outline. Mariposa: A wide-area distributed database. Outline. Motivation. Outline. (wrong) Assumptions in Distributed DBMS

Business Intelligence, Analytics & Reporting: Glossary of Terms

CHAPTER 6 EXTRACTION OF METHOD SIGNATURES FROM UML CLASS DIAGRAM

A MEDIATION LAYER FOR HETEROGENEOUS XML SCHEMAS

Information integration platform for CIMS. Chan, FTS; Zhang, J; Lau, HCW; Ning, A

Simplifying e Business Collaboration by providing a Semantic Mapping Platform

Outlines. Business Intelligence. What Is Business Intelligence? Data mining life cycle

BUILDING OLAP TOOLS OVER LARGE DATABASES

CSCI-UA: Database Design & Web Implementation. Professor Evan Sandhaus sandhaus@cs.nyu.edu evan@nytimes.com

JBoss Enterprise Data Services Platform in the Enterprise

DATA INTEGRATION CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Database preservation toolkit:

REST vs. SOAP: Making the Right Architectural Decision

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

Integrating Open Sources and Relational Data with SPARQL

Introduction to Ontologies

Architecting for the Internet of Things & Big Data

Real-time Streaming Analysis for Hadoop and Flume. Aaron Kimball odiago, inc. OSCON Data 2011

MOBILE ARCHITECTURE FOR DYNAMIC GENERATION AND SCALABLE DISTRIBUTION OF SENSOR-BASED APPLICATIONS

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

A collaborative platform for knowledge management

Lift your data hands on session

Workflow/Business Process Management

MySQL for Beginners Ed 3

MDM and Data Warehousing Complement Each Other

Perl & NoSQL Focus on MongoDB. Jean-Marie Gouarné jmgdoc@cpan.org

Salesforce.com to SAP Integration

Transcription:

MUSYOP: Towards a Query Optimization for Heterogeneous Distributed Database System in Energy Data Management Zhan Liu, Fabian Cretton, Anne Le Calvé, Nicole Glassey, Alexandre Cotting, Fabrice Chapuis Institute of Business Information Systems University of Applied Sciences and Arts Western Switzerland HES-SO Valais, Sierre, Switzerland

OVERVIEW Who Are We? Introduction Objectives Research methodologies Architecture of Heterogeneous Distributed Database System Implementation and Use Cases Conclusion and Future Research

WHO ARE WE?

WHO ARE WE? 72 staff members 20 Professors 14 Research scientists 29 Research assistants 9 Administrative collaborators and interns 92 Scientific publications 291 Projects (2013) CHF 8,85 millions (Turnover in 2013)

CORE COMPETENCIES eenergy egov ehealth ERP eservices BPM Cloud Computing Data Intelligence Ergonomy, Usability Intelligent Agents Internet of Things Medical Information Processing Semantic Web Software Engineering

INTRODUCTION Databases in the context of an energy database management system (EDMS) are non-integrated, distributed and heterogeneous As a result, information integration is becoming increasing important BUT, It consumes a great deal of time and money

OBJECTIVES To propose a uniform approach by using Semantic Web technologies for SPARQL querying a heterogeneous distributed database system named MUSYOP for energy data management To provide transparent query access to multiple heterogeneous data sources Query optimization to speed-up search executions

RESEARCH METHODOLOGIES (1) Warehouse approach (centralized system) High efficiency and the capability of extracting deeper information for decision making Must set up data cleansing and data standardization before actual use Could take months of planning (2) Federated approach (decentralized system) High adaptability to frequent changes of data sources support of large numbers of data sources and data sources with high heterogeneity MUSYOP focused on the federated approach

ARCHITECTURE OF HETEROGENEOUS DISTRIBUTED DATABASE SYSTEM

COMPONENTS OF ARCHITECTURE (1) Distributed Query Decomposer Generates a number of transaction to match remote data sources Assembles transaction results and returns to user Query Optimizer Data source optimization: data source selection and building sub-queries Join order optimization: determine the numbers of intermediate return results Distributed Transaction Coordinator Detects and handles persistent records and manages the communications with databases Parsed query Query Parser 2 Distributed Query Decomposer Distributed transactions 3 Query Optimizer Optimal sub-queries 4 Distributed Transaction Coordinator

COMPONENTS OF ARCHITECTURE (2) D2RQ Query relational database using SPARQL Custom D2RQ mapping file for describing the relation between an ontology and an relational data model SPARQL Endpoint Query the knowledge bases, such as TripleStore via SPARQL protocole services SPARQL2XQUERY Framework provides a generic method for SPARQL to XQuery translation ALLEGROGRAPH Query MongoDB databases using SPARQL Manual transform datasets from MongoDB to TripleStore SPARQLverse D2RQ SPARQL ENDPOINT SPARQL2XQUERY ALLEGROGRAPH

ARCHITECTURE IMPLEMENTATION Aggregation Distributed Database Server Distributed Database Server Distributed Database Server Distributed Database Server D2RQ Local SPARQL Query Convertor ENDPOINT Local Query SPARQL2XQUERY Convertor Local Query ALLEGROGRAPH Convertor Local Local Local Local 6 7 Result 6 7 Result 6 7 Result 6 7 query query query query Local Schema Local Schema Local Schema Local Schema Local DBMS Schema Local DBMS Schema Local DBMS Schema Network Network Network DBMS DBMS DBMS DBMS Result Relational DB MYSQL MYSQL MYSQL Triple Store XML NOSQL Daily, monthly and yearly consumption data and user alert management data Building information, cities location information and weather information User s notification settings Matching Frequent data of consumption Link

ENERGY USE CASE

SPARQL QUERY PREFIX db: <http://localhost:2020/resource/> PREFIX vocab: <http://localhost:2020/resource/vocab/> prefix intbatxtra: <http://websemantique.ch/onto/intbatxtra#> prefix musyoponto: <http://www.websematique.ch/voc/musyop#> prefix dbp-ont: <http://dbpedia.org/ontology/> prefix xsd: <http://www.w3.org/2001/xmlschema#> select?userid?buildingid?buildinglabel?cpid?cons1?cons2?consdiff?city?elevation { SERVICE <http://localhost:2020/sparql> {?config_port a vocab:config_ports; vocab:config_ports_affectation "Gaz"; vocab:config_ports_batiment_id?buildingid ; vocab:config_ports_config_port_id?cpid.?donneemens2011 vocab:donnees_mensuelles_config_port?config_port; vocab:donnees_mensuelles_timestamp "2011-01- 01T00:00:00"^^xsd:dateTime; vocab:donnees_mensuelles_cons?cons1.?donneemens2012 vocab:donnees_mensuelles_config_port?config_port; vocab:donnees_mensuelles_timestamp "2012-01- 01T00:00:00"^^xsd:dateTime; vocab:donnees_mensuelles_cons?cons2.?buildinguser rdf:type vocab:building_user; vocab:building_user_building_id?buildingid; vocab:building_user_utilisateur_id?userid; }?build a intbatxtra:object; musyoponto:hasid?buildingid ; rdfs:label?buildinglabel; musyoponto:locatedin?city.?city dbp-ont:elevation?elevation. Filter(?elevation <?paramelevationmax) BIND((xsd:double(?cons2)/xsd:double(?cons1)) as?consdiff). Filter(?consDiff >?paramconsdiffmax) } ORDER BY?buildingID?userID limit 100 VALUES (?userid) { USERID_VALUES }

GRAPHICAL USER INTERFACES

GRAPHICAL USER INTERFACES

GRAPHICAL USER INTERFACES

CONCLUSION AND FUTURE RESEARCH Presented an architecture of heterogeneous distributed database system for energy data management Integration of a mediator server to access and coordinate with four different databases: relational database, NOSQL, Triple Store and XML Proposed an approach for query optimization based on our architecture Future work required on the evaluation our system with a large amount of energy data in Switzerland