Simplifying Big Data Integration A Software as a Service Approach ~ Preliminary Analysis and Design ~




The BigIaS Platform
Simplifying Big Data Integration: A Software as a Service Approach
~ Preliminary Analysis and Design ~
September 5th, 2013, Sogndal, Norway
Dumitru Roman, Claudia Daniela Pop, Roxana Ioana Roman, Bjørn Magnus Mathisen
Contact: dumitru.roman@sintef.no

Context: Big Data and Our Primary Focus
- Addresses things that can be done at a large scale but cannot be done at a smaller one
- Extract new insights or create new forms of value in ways that change stakeholders and relationships between them
- Causality vs. correlations: not knowing why but only what
- Challenges the basic traditional understanding of how to make decisions
- Heterogeneity
- Change

Overview
The problem
- Data integration is a complex, unsolved problem
- Tools addressing various aspects of the data integration process can hardly be used together for more complex, interesting integration tasks
=> High cost of data integration at large scale; a rather complicated and time-consuming process
The goal: simplify data integration at large scale!
- Enable users with limited technical data integration skills to get from raw data to insightful data with minimal effort
The approach
- Semantic-based data integration
- Flexible and customizable workflows of data integration tools (application integration)
=> A Software-as-a-Service for data integration at large scale

The BigIaS Platform (architecture diagram)
UX: Portal and widgets
Platform Layer
- Functions: Data operations, Entity extraction, Data crawling, Data linking, Visualization, Streaming
- Tools: Talend Data Integration, C-SPARQL, Karma, Grok/NuPIC, Open Refine, Trident ML, Tabula, UIMA/GATE, Talend ESB, Silk, Map4RDF, LDSpider, LodLive, Other
Data Layer
- Storage services: Sesame, OWLIM
- Streaming services
Data Sets: Streaming, 3rd party, Open Data, LOD

Evaluated tools/approaches
1. Application Integration: Talend ESB
2. Data Processing: Talend Data Integration, Tabula, Karma, Open Refine, UIMA, GATE, Silk, LDSpider
3. Storage: Sesame, OWLIM
4. Visualization: Map4RDF, LodLive
5. Real-time Machine Learning: Trident ML, Grok/NuPIC
6. Streaming: C-SPARQL

Proposed Integration Workflows
1. Data Contextualization
2. Entity Discovery
3. Data Linking
4. Data Visualization
5. Real-time Machine Learning
6. RDF Streaming

Data Contextualization (workflow diagram)
Inputs: PDF, Excel, Relational DB (Excel, CSV, XML, etc.)
Workflow components:
- X2CSV (Tabula PDF, Open Refine, Karma)
- Data Cleaning (Open Refine)
- Integration with external data (Karma, Open Refine)
- Export RDF (Karma, Open Refine), driven by an OWL file*
- Store RDF (Sesame + OWLIM)
- SPARQL endpoint
Supported sources per tool:
- Open Refine: files (TSV, CSV, *SV, Excel (.xls and .xlsx), JSON, XML, RDF as XML, and Google Docs); Web Services
- Karma: databases (MySQL, SQL Server, Oracle, PostGIS); files (Excel, CSV, XML, JSON, KML); Web Services
- Tabula: PDF
* The OWL file can be imported or configured
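To make the "Export RDF" step of the Data Contextualization workflow concrete, here is a minimal sketch in plain Python of how tabular rows might be turned into RDF triples in N-Triples syntax. It does not use Karma or Open Refine and is not the platform's implementation; the `http://example.org/` namespace, the function names, and the sample data are all assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace; the slides do not name one.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, id_column: str, base: str = BASE) -> list[str]:
    """Emit one subject per CSV row and one triple per non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}resource/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the identifier column itself and empty cells.
            if column == id_column or not value:
                continue
            predicate = f"<{base}property/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Made-up sample data: two rows, the second with a missing measurement.
sample = "station,city,no2\nS1,Oslo,41\nS2,Bergen,\n"
for line in csv_to_ntriples(sample, id_column="station"):
    print(line)
```

In the workflow on the slide, the predicates would come from the imported or configured OWL file rather than from column names, and the resulting triples would be loaded into Sesame + OWLIM.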

Entity Discovery (workflow diagram)
Inputs: CSV, Excel
Workflow components:
- Entity Extraction (UIMA, GATE)
- Entity Contextualization: Open Refine (Reconcile Plug-in)
- Export RDF (Karma, Open Refine), driven by an OWL file*
- Store RDF (Sesame + OWLIM)
- SPARQL endpoint
Supported sources per tool:
- GATE: text files
- UIMA: text, audio, video files
Reconcile against:
- SPARQL endpoint
- RDF file
- Sindice service (Twitter, Freebase, DBpedia, and other datasets relevant for the data)
* The OWL file can be imported or configured
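The two central steps of this workflow, entity extraction and reconciliation, can be illustrated with a deliberately naive sketch. A regular expression stands in for UIMA/GATE, and a small in-memory dictionary stands in for a reconciliation service such as a SPARQL endpoint; the pattern, the gazetteer contents, and the `example.org` identifiers are hypothetical.

```python
import re

# Hypothetical gazetteer mapping lower-cased labels to placeholder identifiers.
GAZETTEER = {
    "oslo": "http://example.org/entity/Oslo",
    "sintef": "http://example.org/entity/SINTEF",
}

# Naive candidate pattern: one or more consecutive words starting with a capital letter.
CANDIDATE = re.compile(r"\b[A-Z][A-Za-z]+(?:\s+[A-Z][A-Za-z]+)*\b")


def extract_candidates(text):
    """Return capitalised word runs as entity candidates, in order of appearance."""
    return CANDIDATE.findall(text)


def reconcile(candidates, gazetteer=GAZETTEER):
    """Map each candidate to a known identifier, or None when nothing matches."""
    return {c: gazetteer.get(c.lower()) for c in candidates}


text = "Researchers at SINTEF met in Oslo."
print(reconcile(extract_candidates(text)))
```

Candidates that reconcile to `None` (here the sentence-initial "Researchers") show why a real extraction engine and a richer reference dataset are needed in practice.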

Data Linking (workflow diagram)
Inputs: SPARQL endpoint, CSV, Excel, Relational DB
Workflow components:
- X2RDF (Open Refine, Karma)
- Store RDF (Sesame + OWLIM)
- Link Discovery (Silk)
- Store RDF (Sesame + OWLIM)
- SPARQL endpoint
Supported sources per tool:
- Open Refine: files (TSV, CSV, *SV, Excel (.xls and .xlsx), JSON, XML, RDF as XML, and Google Docs); Web Services
- Karma: databases (MySQL, SQL Server, Oracle, PostGIS); files (Excel, CSV, XML, JSON, KML); Web Services
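The idea behind the Link Discovery step can be sketched as follows: compare the labels of resources in two datasets and emit an `owl:sameAs` link when the best match is similar enough. This sketch uses Python's standard `difflib` instead of Silk and its Link Specification Language; the threshold value, the function names, and the sample URIs are assumptions.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def discover_links(source, target, threshold=0.9):
    """Link each source URI to its best-matching target URI, if similar enough.

    `source` and `target` map URIs to labels. The threshold is an arbitrary
    choice for this sketch, not a recommended setting.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri))
    return links


# Hypothetical datasets with placeholder URIs.
source = {"http://example.org/a/1": "Oslo", "http://example.org/a/2": "Trondheim"}
target = {"http://example.org/b/9": "oslo ", "http://example.org/b/8": "Bergen"}
for link in discover_links(source, target):
    print(link)
```

The pairwise comparison is quadratic in the dataset sizes, which is one reason a dedicated link discovery tool is used for large-scale data.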

Data Visualization (workflow diagram)
Inputs: SPARQL endpoint, CSV, Excel, Relational DB
Workflow components:
- Data Contextualization
- Entity Discovery
- Data Linking
- Data Visualization (Map4RDF, LodLive)
- GUI

Real-time Machine Learning (workflow diagram)
Inputs: SPARQL endpoint, CSV, Excel, Relational DB
Workflow components:
- Data Contextualization
- Entity Discovery
- Data Linking
- Real-time Machine Learning (Grok/NuPIC, Trident ML)
- Patterns
- C-SPARQL Query Generator
- C-SPARQL queries
- Streaming Service (DB, e.g. JSON)
- Real-Time RDFization Service

RDF Streaming (workflow diagram)
Workflow components:
- Streaming Service (DB, e.g. JSON)
- Real-Time RDFization Service
- Real-time Machine Learning (Grok/NuPIC, Trident ML)
- C-SPARQL queries
- Streaming Engine (C-SPARQL)
- RDF
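A rough sketch of what real-time RDFization followed by a windowed query could look like is given below. It converts JSON events into timestamped triples and keeps a time-based sliding window, loosely analogous to the time windows a continuous query language such as C-SPARQL evaluates over. It is not the C-SPARQL engine; the event schema (`timestamp`, `sensor`), the namespace, and the class and function names are assumptions, and events are assumed to arrive in timestamp order.

```python
import json
from collections import deque

# Hypothetical namespace; the slides do not name one.
BASE = "http://example.org/"


def rdfize(event_json):
    """Convert one JSON event into (timestamp, subject, predicate, object) tuples.

    Assumes the event carries 'timestamp' and 'sensor' keys; every other key
    becomes a property of the sensor.
    """
    event = json.loads(event_json)
    ts = event["timestamp"]
    subject = f"{BASE}sensor/{event['sensor']}"
    return [(ts, subject, f"{BASE}property/{key}", value)
            for key, value in event.items()
            if key not in ("timestamp", "sensor")]


class SlidingWindow:
    """Keep only the tuples from the last `range_seconds` seconds."""

    def __init__(self, range_seconds):
        self.range_seconds = range_seconds
        self.items = deque()

    def add(self, quads):
        self.items.extend(quads)
        if not self.items:
            return
        # Events are assumed to arrive in order, so the last item is the newest.
        newest = self.items[-1][0]
        while self.items and self.items[0][0] <= newest - self.range_seconds:
            self.items.popleft()

    def average(self, predicate):
        """Average the numeric objects of one predicate inside the window."""
        values = [o for _, _, p, o in self.items if p == predicate]
        return sum(values) / len(values) if values else None


window = SlidingWindow(range_seconds=10)
# Made-up sensor events.
for raw in ('{"timestamp": 0, "sensor": "S1", "no2": 40}',
            '{"timestamp": 5, "sensor": "S1", "no2": 50}',
            '{"timestamp": 12, "sensor": "S1", "no2": 60}'):
    window.add(rdfize(raw))
print(window.average(BASE + "property/no2"))
```

In the slide's workflow the windowed evaluation would be expressed as C-SPARQL queries run by the streaming engine, with the queries themselves produced by the C-SPARQL Query Generator.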

Core Technological Prerequisites
Tool: Prerequisites
- Open Refine: GREL Functions, GREL Examples
- Karma: Create ontologies; Ontologies from the Karma Web Interface
- UIMA: Regular Expressions
- Silk: Silk Link Specification Language (Silk LSL)
- C-SPARQL: C-SPARQL language

Relevant upcoming research projects (currently under negotiations)
Data Publishing through the Cloud: A Data- and Platform-as-a-Service Approach for Efficient Data Publication and Consumption (DaPaaS)
- The DaPaaS project aims to deliver an integrated DaaS and PaaS environment for open data (the DaPaaS platform), together with supporting activities for effective and efficient publication and consumption of data and creation of applications using the data.
- Expected to start Nov 2013
- Budget ~2.1M (~1.5M) for 2 years (EC funded)


Relevant upcoming research projects (currently under negotiations, cont.)
ProaSense: The Proactive Sensing Enterprise
- The goal is to provide a very scalable, distributed architecture for the management and processing of big data that will enable continuous monitoring of the need for service adaptation and propose corresponding changes in a (semi-)automatic way.
- Expected to start Nov 2013
- Budget ~4.2M (~3.2M) for 3 years (EC funded)


Relevant upcoming research projects (currently under negotiations, cont.)
SmartOpenData: Open Linked Data for environment protection in Smart Regions
- SmartOpenData aims to define mechanisms for acquiring, adapting and using Open Data provided by existing sources for environment protection in European protected areas.
- Expected to start Nov 2013
- Budget ~3.4M (~2.5M) for 2 years (EC funded)

Relevant upcoming research projects (currently under negotiations, cont.)
INFRARISK: Novel Indicators for identifying critical INFRAstructure at RISK from natural Hazards
- Develop reliable stress tests on European critical infrastructure using integrated modelling tools for decision support. It will lead to higher resilience of infrastructure networks to rare and low-probability extreme events, known as "black swans".
- Expected to start Oct 2013
- Budget ~3.6M (~2.8M) for 2 years (EC funded)

Relevant ongoing research projects
BigFut: Analyzing Big Data: Preparing for the Future of Intelligent Information Management
- SINTEF internal project
- Goals:
  - Analyze and integrate a suite of technological approaches and techniques
  - Advance a demo/prototype implementation to penetrate the market in the short term
- Jan-Dec 2013

Relevant ongoing research projects (cont.)
CITI-SENSE: develop Citizens' Observatories to empower citizens to:
- Contribute to and participate in environmental governance
- Support and influence community and policy priorities and associated decision making
- Contribute to the Global Earth Observation System of Systems (GEOSS)
27 partners (EC funded)
http://www.citi-sense.eu/

Summary and Outlook
What's new here
- The platform itself, implementing flexible data integration workflows
- A set of components (e.g. real-time RDFization of streams, C-SPARQL Query Generator)
What's challenging
- Application integration
- Consistent scalability throughout workflows
- Platform deployment on cloud environments
- Use of new, unproven technologies
- ... and many others
Short-term plan (end of September)
- Get the first prototype implementing the proposed workflows
- Experiment with some data / simple use cases (e.g. CITI-SENSE data)

Thank you! Q&A